From jay at jays.net  Thu Jun  1 00:58:29 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 23:58:29 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000001c68528$d1b6ec10$15327e82@pyrimidine>
References: <000001c68528$d1b6ec10$15327e82@pyrimidine>
Message-ID: <447E73F5.40403@jays.net>

Chris Fields wrote:
>> Is the doc/ tree being abandoned?
> 
> Most docs have been moved over to the wiki, which generates nicely formatted
> docs for printing.

Oh. Well, if we've already jumped off that cliff I say we just go for it. Move everything to the wiki, nuke the empty CVS dirs, and call it good.

I hereby volunteer to strip the code out of bptutorial.pl and put it wherever. Where should I put it when I'm done? (examples/tutorial.pl?)

>> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> I believe the reasoning is along these lines: HOWTOs focus on specific
> areas (graphics, trees, BLAST report parsing, etc.) and thus usually have
> greater detail. The tutorials are more broadly based (sort of a general
> bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
> has additional information over the tutorial (at least it did the last time
> I looked at the tutorial, which has been a while).

Huh. Sounds like a subtle line. I might suggest picking one name or the other and shuffling everything into one list on the wiki. 

>> It's hard for me to dive into a wiki lifestyle for the huge documentation
>> pillars since it can't ever get back into the distro... (can it?)  Small,
>> throw away stuff is great for the wiki, but huge, established, thoughtful,
>> long documents should be left in the distro? Present (and searchable) on
>> the wiki but static?
> 
> Hence the problem we face now.  It is something we need to really look into
> before adding too much more to the wiki.  IMHO, I think we should have very
> little information directly in the distribution itself since it's already
> quite large.  It's almost as easy to have a bare-bones INSTALL file, which
> would point to the wiki for additional information.  But I may be very much
> alone in that train of thought ; >

If the doc/ tree has already moved then I guess I just joined the all-wiki camp. I assume it stores full revision history and we have backups in case somebody blows something up. Any system is better than multiple systems breeding inconsistencies. Keep the spammers/clueless out and/or quickly remove their nonsense and I'm pro-wiki. Revisions email reviewers?

>> Sick of my endless questions yet? -grin-
> 
> Not really.

Give it a few more posts. It'll come. :)

j
Current toy: http://openlab.jays.net/


From ULNJUJERYDIX at spammotel.com  Thu Jun  1 02:53:46 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 1 Jun 2006 14:53:46 +0800
Subject: [Bioperl-l] **Fwd: Re: SOLVED ver2 Bio::Graphics::Panel make
	ruler have neg values
Message-ID: <5b6410e0605312353l1fbf8256hc8a2b85d0f0ac199@mail.gmail.com>

Thanks Lincoln! Your code worked in ver 1.4 as well.
I think the problem I had was due to adapting the code from the BLAST output
tutorial, so I had something like

my $feature = Bio::SeqFeature::Generic->new(-display_name => $name,
                                            -score        => $score,
                                            -start        => $start,
                                            -end          => $end,
                                            -source       => $source);

and maybe also because I didn't have the + sign for the numbers

on a side note, I think that the ability to offset the ruler might prove
useful for some applications. Will spend more time to understand the
$relative_coords_offset option in the arrow.pm when i can afford to, and
perhaps help contribute an offset option to arrow.pm

cheers
kevin

>
> Hi Kevin,
>
> Since you are modifying the Panel.pm source code, why don't you just go ahead
> and use the current Bio::Graphics development tree? Since 1.5.1 it supports
> negative coordinates. Here's an illustration:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
>
> my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
> my $feature =
> Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
> my $panel   = Bio::Graphics::Panel->new(-start=> -200,
>                                          -end  => +200,
>                                          -width=>800,
>                                          -pad_left=>10,
>                                          -pad_right=>10);
> $panel->add_track($whole,
>                    -glyph=>'arrow',
>                    -double=>1,
>                    -tick=>2);
> $panel->add_track($feature,
>                   -glyph=>'box',
>                    -stranded=>1);
> print $panel->png;
>
> exit 0;
>
> The resulting image is attached.
>
> Lincoln
>
> On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> > I am so sorry for the truncated email accidentally hit reply.
> > if anyone is interested i have opted to change
> >
> > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> > in linux its
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
> >
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
> >
> > to
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
> >
> > just  for this one-off use.
> >
> >
> >
> > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> > option for coords offset?
> >     my $relative_coords_offset = $self->option('relative_coords_offset');
> >     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> > but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> > didn't do anything...
> >
> >
> >
> > Hi!
> >
> > > oh it was in a slightly different header asking about the create image
> > > map feature.
> > > I am using the stable version 1.4 of bioperl now. In any case I have not
> > > added the sequence as a feature annotated seq. as I already have the bp
> > > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > > graded segments based on the position.
> > > I saw that there is a scale function for the arrow glyph; however, it is a
> > > multiply function. Can it be hacked to take in an offset value (i.e. minus
> > > the scale by 1000)?
> > >
> > > cheers
> > > kevin
> > >
> > >
> > > Hi,
> > >
> > > > For some reason I didn't see the first posting on this. In current
> > > > bioperl live, the ruler can have negative numberings - I use this
> > > > routinely. You need to create a feature that starts in negative
> > > > coordinates. What is happening to you when you try this?
> > > >
> > > > Lincoln
> > > >
> > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > > Hi
> > > > > thanks for the help offered thus far!
> > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq
> > > > > using bioperl. therefore i was asked to make the numberings as such
> > > > > (-1000). is there any way at all to do this in bioperl without
> > > > > changing the .pm file?
> > > > >
> > > > > thanks guys..
> > > > > kevin
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > Lincoln D. Stein
> > > > Cold Spring Harbor Laboratory
> > > > 1 Bungtown Road
> > > > Cold Spring Harbor, NY 11724
> > > > (516) 367-8380 (voice)
> > > > (516) 367-8389 (fax)
> > > > FOR URGENT MESSAGES & SCHEDULING,
> > > > PLEASE CONTACT MY ASSISTANT,
> > > > SANDRA MICHELSEN, AT michelse at cshl.edu
>

From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 03:59:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 08:59:21 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <447E9E59.6090709@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
>> Just looking for all return undef;s isn't enough. It's entirely possible
>> to do something like:
>>
>> my $return_value;
>> {
>>    # do something that assigns to return_value on success
>>    # on failure, just do nothing
>> }
>> return $return_value;
> 
> Agreed, though looking for these is obviously much harder.  
> 
> The way to get around those is:
> 
> return $return_value if $return_value;
> return;
> 
> which I've seen used in a number of get/set methods. 

Though if anyone is using that cookie-cutter/macro style, that's much 
worse because now you can't return 0.

return $return_value if defined($return_value);
return;
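
A minimal sketch of the difference, in case it helps. The sub names (get_truthy, get_defined, get_undef) are hypothetical and for illustration only; they are not BioPerl methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical accessors, for illustration only (not BioPerl methods).
sub get_truthy  { my ($v) = @_; return $v if $v;          return; }
sub get_defined { my ($v) = @_; return $v if defined($v); return; }
sub get_undef   { return undef; }

# A legitimate value of 0 is lost by the truthiness test,
# but survives the defined() test.
my $lost = get_truthy(0);
my $kept = get_defined(0);
print 'truthy test:  ', (defined $lost ? $lost : 'undef'), "\n";
print 'defined test: ', (defined $kept ? $kept : 'undef'), "\n";

# In list context "return undef" yields a one-element list (which is
# true in a boolean test), whereas a bare "return" yields the empty list.
my @from_undef = get_undef();
my @from_bare  = get_defined(undef);
print scalar(@from_undef), " element(s) from 'return undef'\n";
print scalar(@from_bare),  " element(s) from bare 'return'\n";
```

The last two lines are the original pitfall: a caller writing `if (my @results = $obj->method)` sees a true value when the method ends in `return undef;`.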

In any case, it burns the eyes. I share Lincoln's POV. I also fully 
understand your point about not being able to trust the docs 
(Bio::Map::Marker...). But the solution is to change the code so they 
match the docs when the docs make sense, not change the code so that it 
no longer matches the docs[*]. In a massive OO project like bioperl the 
users need to be able to rely on the docs. You can't turn around and say 
"you've used this method for years, but now I'm changing how it works 
because you might have used the method incorrectly". Ideally any code 
changes add functionality or improve its working without affecting code 
that uses the method correctly according to its old docs.


* though if there isn't time/interest in changing the code, and the 
method never worked as per the docs, then by all means change the docs 
to avoid confusion - just don't change the docs on a method that worked 
according to the docs, because then you can assume people use the method 
and will be affected by the change

From lstein at cshl.edu  Thu Jun  1 11:40:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Jun 2006 11:40:38 -0400
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
Message-ID: <200606011140.38726.lstein@cshl.edu>

Hi,

The border is coming from the HTML <img> tag. To get rid of it, set -border=>0 in 
the img() call.
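
A rough sketch of what that looks like with CGI.pm, assuming the same img() call as in the code quoted below. The URL and map name here are placeholders standing in for what $panel->image_and_map() returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $cgi = CGI->new;

# Placeholder values; in the original script these come from
# $panel->image_and_map(-url => '/tmpimages').
my ($url, $mapname) = ('/tmpimages/example.png', 'example_map');

# -border => 0 becomes a border="0" attribute on the generated <img> tag.
my $tag = $cgi->img({ -src    => $url,
                      -usemap => "#$mapname",
                      -border => 0 });
print $tag, "\n";
```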

Lincoln


On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> Hello everybody,
>
> does anybody know how to remove the background color of the Panel?
> Currently, I am not adding anything to it, so I can troubleshoot the
> problem, and I have tried setting
> all color attributes I could find on the panel, but no luck. Whatever I do,
> I get the BLUE border of the panel.
>
> Has anybody faced the same problem?
>
> Thanks in advance,
>
> Jelena
>
> And here is the code I am currently using:
>
> ---------------------------------------------------------------------------
> my $panel =
>     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
>                               -width => 800,
>                               -pad_left => 10,
>                               -pad_right => 10,
>                               -key_color => 'white',
>                               -bgcolor => 'white',
>                               -gridcolor=>'black',
>                               -fgcolor => 'black',
>                               -grid => 0,
>                               );
>    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
>      -url  => '/tmpimages');
>    #make clickable image
>    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
>    print $map;
>
> ---------------------------------------------------------------------------

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:13:05 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:13:05 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A3BD0B.8A2C%osborne1@optonline.net>
References: <C0A3BD0B.8A2C%osborne1@optonline.net>
Message-ID: <447F1211.2010705@campus.iztacala.unam.mx>

You're right Brian. I also think that the text/POD part is more 
important than the script. Since we're more into moving everything to 
the Wiki, I believe this would be the right approach.

Moving the script part of the tutorial into the examples/ directory is 
also a nice idea.

Mauricio.

Brian Osborne wrote:
> Mauricio,
> 
> Bernd didn't say he wanted the _script_ in the package, he said he wanted
> bptutorial.pl in the package, not indicating whether it was the
> documentation or the script that was important. It's my suspicion that the
> documentation is more important than the script, and this is what my last
> letter was asking, in part: is the script important? Or can we focus on the
> text/POD part?
> 
> Brian O.
> 
> 
> On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> <arareko at campus.iztacala.unam.mx> wrote:
> 
>> I agree with what Bernd Web said in another reply. For some people it will
>> be nice to still be able to run the script from the codebase and
>> interact with it.
> 
> 
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 12:20:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:20:34 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1211.2010705@campus.iztacala.unam.mx>
Message-ID: <000b01c68597$5026bdf0$15327e82@pyrimidine>

Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
/scripts or /examples then?

Also, what do we do about similar situation with other docs moved to the
wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
distribution pointing out the wiki docs instead?

Chris

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Thursday, June 01, 2006 11:13 AM
> To: Brian Osborne
> Cc: Chris Fields; bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> You're right Brian. I also think that the text/POD part is more
> important than the script. Since we're more into moving everything to
> the Wiki, I believe this would be the right approach.
> 
> Moving the script part of the tutorial into the examples/ directory is
> also a nice idea.
> 
> Mauricio.
> 
> Brian Osborne wrote:
> > Mauricio,
> >
> > Bernd didn't say he wanted the _script_ in the package, he said he wanted
> > bptutorial.pl in the package, not indicating whether it was the
> > documentation or the script that was important. It's my suspicion that the
> > documentation is more important than the script, and this is what my last
> > letter was asking, in part: is the script important? Or can we focus on the
> > text/POD part?
> >
> > Brian O.
> >
> >
> > On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> > <arareko at campus.iztacala.unam.mx> wrote:
> >
> >> I agree with what Bernd Web said in another reply. For some people it will
> >> be nice to still be able to run the script from the codebase and
> >> interact with it.
> >
> >
> >
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 12:28:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:28:38 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447E9E59.6090709@mrc-dunn.cam.ac.uk>
Message-ID: <000c01c68598$704b15d0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 2:59 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfall with
> "return undef"
> 
> Chris Fields wrote:
> >
> > Sendu Bala wrote:
> >> Just looking for all return undef;s isn't enough. It's entirely possible
> >> to do something like:
> >>
> >> my $return_value;
> >> {
> >>    # do something that assigns to return_value on success
> >>    # on failure, just do nothing
> >> }
> >> return $return_value;
> >
> > Agreed, though looking for these is obviously much harder.
> >
> > The way to get around those is:
> >
> > return $return_value if $return_value;
> > return;
> >
> > which I've seen used in a number of get/set methods.
> 
> Though if anyone is using that cookie-cutter/macro style, that's much
> worse because now you can't return 0.
> 
> return $return_value if defined($return_value);
> return;

Makes sense.  Really, this all comes down to semantics and the context of
how the method is called and what is expected as a return value.  I suppose
it also depends on what one considers 'best practice,' which can be
subjective.  I don't want us getting into a situation in which we come
across as critiquing someone else's code w/o some valid points, i.e.
Lincoln's point about complaining.  I think that's why this thread is pretty
important, in that we're getting a broad range of opinions on the issue.

> In any case, it burns the eyes. 

Yep, I agree. 

> I share Lincoln's POV. I also fully
> understand your point about not being able to trust the docs
> (Bio::Map::Marker...). But the solution is to change the code so they
> match the docs when the docs make sense, not change the code so that it
> no longer matches the docs[*]. In a massive OO project like bioperl the

So you know, Lincoln and I both support the idea of an audit.  He also notes
(and I agree) that people will likely complain.  

Anyway, changing the code to match the docs makes sense theoretically, but in
practice that doesn't always work.  Any situation where code does not behave
as expected (i.e. as described in the docs) is a bug and can be reported as
such.  The problem arises when the docs are completely wrong, as
Bio::Restriction::IO was before I made changes to it.  In many cases simple
small code changes won't work, such as when methods inherit from an
interface but don't implement all methods (so essentially are incomplete).

Hilmar made the point that we should change the docs to reflect
inconsistencies in particular plugin modules for IO classes (AlignIO has a
few modules with unimplemented write methods, and so on).  When the code
radically varies, such as in the Restriction::IO case (where none of the
write methods worked), the docs should be changed in the IO class to reflect
this.  Of course, you should also add a bit to the TO DO section of POD and
add a bit to the Project Priority List on the wiki to point this out, both
of which I did.  It comes down to 'truth in advertising': does it do what's
expected?

> users need to be able to rely on the docs. You can't turn around and say
> "you've used this method for years, but now I'm changing how it works
> because you might have used the method incorrectly". Ideally any code

Not what I did, BTW.  The API is intact; you can still use the write methods
if you want (they throw errors just fine).  In fact, I didn't change any
methods except in one module (Restriction::IO::bairoch), where I added a
warning to the read method b/c it didn't work as expected, and I filed a bug
report.  Essentially, the only thing I changed was the docs to reflect what
the code currently can accomplish (at least until you read the TO DO).  We
already had one person email the group asking why code in the synopsis
didn't work.
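
To make "they throw errors just fine" concrete, here is a standalone sketch of the pattern. The package My::Hypothetical::IO is invented for illustration and is not the actual Bio::Restriction::IO code; I believe BioPerl's root class offers throw_not_implemented for this purpose, but plain Carp is used here so the example runs without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Hypothetical::IO;   # illustration only, not a BioPerl class
use Carp qw(croak);

sub new { my ($class) = @_; return bless {}, $class; }

# read() is implemented; here it just returns a fixed record.
sub read { return 'record'; }

# write() stays in the API but says plainly that it is unimplemented,
# rather than silently doing nothing.
sub write {
    my ($self) = @_;
    croak ref($self) . '::write is not implemented yet (see TO DO)';
}

package main;

my $io = My::Hypothetical::IO->new;
print $io->read, "\n";

# Callers can still invoke write(); they get an exception they can trap.
my $ok = eval { $io->write; 1 };
print "write failed: $@" unless $ok;
```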

Adding read and write methods to most of these modules (making the code do
what the docs reflect, in your words) is a lot of work, esp. for someone
like me unfamiliar with the class architecture and methods for those
modules.  IMHO, contributions to bioperl should accomplish what is reflected
in their docs once added to the core; if a write method hasn't been written,
then add it to the docs in a TO DO section or add a warning to the synopsis.
Don't put in the docs what you intend the code to accomplish down the road
but what it does currently.  Is that unreasonable?

Anyway, when something doesn't perform as expected (produces invalid output
or contains errors), it's considered a bug.  That includes misrepresenting
what a module does in the docs.  When we try to fix bugs we have to decipher
what the intent of the original author was from the docs and code, then try
to get it to work by modifying the code.  In extreme cases (such as
unimplemented methods) that may mean writing up entire methods from scratch.
The read and write methods for IO modules are normally the longest methods
in a class.  That's a heck of a lot of effort for something that a large
majority of us aren't interested in taking up, esp. when the submitting
author should have had everything up to spec (i.e. what's in the docs) when
adding it to the core.

> changes add functionality or improve its working without affecting code
> that uses the method correctly according to its old docs.
> 
> 
> * though if there isn't time/interest in changing the code, and the
> method never worked as per the docs, then by all means change the docs
> to avoid confusion - just don't change the docs on a method that worked
> according to the docs, because then you can assume people use the method
> and will be affected by the change

Again, didn't do that.  The methods in the docs either didn't exist (not
implemented) or didn't work (contained bugs).  The docs were changed b/c
they were misleading.

-chris


From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:36:07 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:36:07 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
References: <C0A31929.89F9%osborne1@optonline.net> <447E48B9.4080503@jays.net>
Message-ID: <447F1777.3070906@campus.iztacala.unam.mx>

Jay Hannah wrote:
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should 
>> point to the Tutorials page instead of the Bptutorial.pl page please let 
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Nice idea, I'll check with Jason if it's possible (in mediawiki) to 
create a new Documentation sidebar to hold these 4 sections.

> (What's the conceptual difference between a HOWTO and a tutorial?)

My concept is that Tutorials cover a wider aspect of BioPerl, contrary 
to the HOWTO's which focus on a certain topic.

> Why isn't the short "Current events" just listed on the top of the "News" page?

I don't know, maybe because it was important when Jason started the Wiki 
a couple of months ago. Do you think it should be erased from the sidebar?

> Sick of my endless questions yet? -grin-
> 
> j
> 

Of course not! :)

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:46:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:46:03 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000c01c68598$704b15d0$15327e82@pyrimidine>
References: <000c01c68598$704b15d0$15327e82@pyrimidine>
Message-ID: <447F19CB.4090607@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
[snip]
>> users need to be able to rely on the docs. You can't turn around and say
>> "you've used this method for years, but now I'm changing how it works
>> because you might have used the method incorrectly". Ideally any code
> 
> Not what I did, BTW.
[snip]
>> * though if there isn't time/interest in changing the code, and the
>> method never worked as per the docs, then by all means change the docs
>> to avoid confusion - just don't change the docs on a method that worked
>> according to the docs, because then you can assume people use the method
>> and will be affected by the change
> 
> Again, didn't do that.

I'm very sorry that I allowed the ambiguity, but my comments were 
certainly not directed at your recent changes to Bio::Restriction::IO. 
In fact, I put in the above * comment to exclude your changes from my 
discussion; you changed the docs because the code never did what they 
said they did (the docs were bad). That's fine (good!). My comments were 
a general point, slightly directed at the idea of changing all the 
return undef;s - changing the code so that it no longer matches the docs 
of a previously working method. That's what I think is bad. Though in 
this particular case it shouldn't make any difference at all.

From osborne1 at optonline.net  Thu Jun  1 12:46:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 12:46:02 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <C0A4920A.8A5B%osborne1@optonline.net>

Chris,

I think the INSTALL* files should be in the package, this is the de facto
convention for 99% of the packages I've ever seen. Then any Wiki page just
links to the file in CVS.

Personally I don't like the idea of maintaining a Wiki page and a file that
both say essentially the same thing (this is what has happened with the
INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
text and removing files that contained these redundancies so it's
unfortunate to see them appear anew, sooner or later they'll get out of sync
despite best intentions. The most likely cause will be someone other than
the person who created the initial duplication (and promised to maintain
both) making a change in one of the two files.

Brian O.


On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:57:27 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:57:27 +0100
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
References: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <447F1C77.5040403@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
> /scripts or /examples then?
> 
> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?

Imho, something like an installation document should be there in full so 
once you've downloaded you can install without reference to anything 
else. Also, an installation document could be considered specific to the 
release version. Which is to say, it never goes out of date even if new 
versions of bioperl are released with new installation instructions - it 
applies to the installation directory it is found in.

The wiki can have the latest installation instructions, and you don't 
have to worry about keeping things synced.

From cjfields at uiuc.edu  Thu Jun  1 13:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:13:30 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1C77.5040403@mrc-dunn.cam.ac.uk>
Message-ID: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>

So basically have a minimal set of installation instructions in CVS and a
more detailed installation instructions on the wiki.  Sounds reasonable
enough but bioperl is a pretty complex distribution (lots of additional
modules required, platform-specific issues, so on).  Maybe we can come up
with a pared-down INSTALL file which combines the basic elements for
installing on UNIX/Windows/Mac/FreeBSD and points out dependencies.  

I still like the idea of just having a simple conversion from wiki->txt
direct from the web page (i.e. best of both worlds).
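
As a very rough idea of what such a conversion might involve, here is a sketch that flattens a handful of MediaWiki constructs to plain text. The helper name wiki_to_text and the sample markup are made up for illustration; real wiki pages use far more markup (templates, tables, nested lists) than this handles.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: flattens a few common MediaWiki constructs to text.
sub wiki_to_text {
    my ($wiki) = @_;
    my @out;
    for my $line (split /\n/, $wiki) {
        # == Heading == becomes an upper-case line
        if ($line =~ /^(=+)\s*(.*?)\s*\1\s*$/) {
            push @out, uc $2;
            next;
        }
        $line =~ s/'''(.*?)'''/$1/g;                     # bold
        $line =~ s/''(.*?)''/$1/g;                       # italics
        $line =~ s/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/$1/g;  # [[Page|label]], [[Page]]
        $line =~ s/\[(\S+)\s+([^\]]+)\]/$2 <$1>/g;       # [http://url label]
        $line =~ s/^\*+\s*/  - /;                        # bullet items
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

# Made-up sample page, for illustration only.
my $sample = <<'END';
== Installing ==
See the '''INSTALL''' file or the [[Installation|wiki page]].
* First step
* [http://www.bioperl.org BioPerl site]
END

print wiki_to_text($sample);
```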

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 11:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris Fields wrote:
> > Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
> > /scripts or /examples then?
> >
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> > distribution pointing out the wiki docs instead?
> 
> Imho, something like an installation document should be there in full so
> once you've downloaded you can install without reference to anything
> else. Also, an installation document could be considered specific to the
> release version. Which is to say, it never goes out of date even if new
> versions of bioperl are released with new installation instructions - it
> applies to the installation directory it is found in.
> 
> The wiki can have the latest installation instructions, and you don't
> have to worry about keeping things synced.


From s-merchant at northwestern.edu  Thu Jun  1 13:17:32 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Thu, 1 Jun 2006 12:17:32 -0500
Subject: [Bioperl-l] Bio::OntologyIO
Message-ID: <000001c6859f$446f7fd0$c2987ca5@pc13>

Hi Everyone,

    I would like to announce the availability of an obo format parser
which can parse GO, PO, PATO and other ontology files in obo format. The
parser can be used through the Bio::OntologyIO module. Thanks to Hilmar
Lapp and Chris Mungall for their invaluable contributions.

 
Thanks,

Sohel Merchant.

Sohel Merchant
dictyBase
Bioinformatics Software Engineer
Center for Genetic Medicine
Northwestern University
676 St. Clair Street, Suite 1206
Chicago IL 60611
 
From cjfields at uiuc.edu  Thu Jun  1 13:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:35 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4920A.8A5B%osborne1@optonline.net>
Message-ID: <001101c685a3$53f4bf70$15327e82@pyrimidine>

I understand your point, though I think the wiki gives us an opportunity to
add helpful links and use markup to help clarify things a bit more.  I have
seen several distributions which don't have INSTALL files, just a simple
README with very basic instructions (Bio::ASN1::EntrezGene is one).

I've been reluctant to mess around with the wiki Install pages too much more
b/c of syncing problems, just as you mentioned.  I will look into things a
bit more to see if there's an easier way to go about converting wiki->text.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 11:46 AM
> To: Chris Fields; 'Mauricio Herrera Cuadra'
> Cc: bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> I think the INSTALL* files should be in the package, this is the de facto
> convention for 99% of the packages I've ever seen. Then any Wiki page just
> links to the file in CVS.
> 
> Personally I don't like the idea of maintaining a Wiki page and a file
> that both say essentially the same thing (this is what has happened with
> the INSTALL and INSTALL.WIN files). I've spent plenty of time merging
> redundant text and removing files that contained these redundancies so
> it's unfortunate to see them appear anew, sooner or later they'll get out
> of sync despite best intentions. The most likely cause will be someone
> other than the person who created the initial duplication (and promised to
> maintain both) making a change in one of the two files.
> 
> Brian O.
> 
> 
> On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > Also, what do we do about the similar situation with other docs moved to
> > the wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file
> > in the distribution pointing out the wiki docs instead?


From cjfields at uiuc.edu  Thu Jun  1 13:46:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:45 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447F19CB.4090607@mrc-dunn.cam.ac.uk>
Message-ID: <001201c685a3$59d78da0$15327e82@pyrimidine>


....

> > Again, didn't do that.
> 
> I'm very sorry that I allowed the ambiguity, but my comments were
> certainly not directed at your recent changes to Bio::Restriction::IO.
> In fact, I put in the above * comment to exclude your changes from my
> discussion; you changed the docs because the code never did what they
> said they did (the docs were bad). That's fine (good!). My comments were
> a general point, slightly directed at the idea of changing all the
> return undef;s - changing the code so that it no longer matches the docs
> of a previously working method. That's what I think is bad. Though in
> this particular case it shouldn't make any difference at all.

Agreed.  In any case, if tests have been properly set up then they should
catch problems.  This is, of course, if they are properly set up.  

Chris




From gad14 at cornell.edu  Thu Jun  1 15:10:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Thu, 01 Jun 2006 15:10:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447D5668.7070500@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu>
	<447BFB20.40501@mrc-dunn.cam.ac.uk>	<447C7985.9000404@cornell.edu>
	<447D5668.7070500@mrc-dunn.cam.ac.uk>
Message-ID: <447F3BA7.9030500@cornell.edu>

Problem solved, albeit in a slightly hacky way.

I tried to make seek() work for a good long while with the SearchIO 
blast results object, but I just couldn't get it to work. (Probably b/c 
seek wants to see a genuine file handle-- not a SearchIO filehandle.) I 
used SearchIO's fh() to get the handle and could while(<$fh>) through 
the data but when I used seek($fh,0,0) to reset the cursor position in 
the handle in prep for another loop, I got an error complaining about my 
use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
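
The error above is what Perl gives when seek() is called on a tied handle whose class has no SEEK method. Presumably the handle returned by fh() is tied in this way, though that is an assumption and not something checked against the SearchIO source. A minimal sketch using a made-up class (OneWay is hypothetical and not part of bioperl) that implements READLINE only:

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical tied-handle class: hands out its items one at a time
# through READLINE and deliberately has no SEEK method.
package OneWay;
sub TIEHANDLE { my ($class, @items) = @_; return bless [@items], $class }
sub READLINE  { my $self = shift; return shift @$self }

package main;

my $fh = gensym();
tie *$fh, 'OneWay', qw(result1 result2);

# The first pass works, and consumes the items as it goes.
while (my $item = <$fh>) {
    print "got $item\n";
}

# seek() on a tied handle is dispatched to the class's SEEK method,
# which does not exist here, so the call dies.
my $ok = eval { seek($fh, 0, 0); 1 };
print "seek failed: $@" unless $ok;
```

A class like this can only be re-read if it also provides SEEK, or if the caller keeps the items from the first pass.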

I concluded that it was not going to be possible and instead made an 
array of SeqFeature objects which contain all the relevant blast output 
data (i.e. the m8/hit table stuff).

It still seems unfortunate that one can't reuse the SearchIO object for 
cases when the SearchIO blast report needs to be accessed multiple times.

Thanks for your help,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>>Thanks for your comment Sendu, it was very helpful. I think this must be 
>>what's going on.. I am using $blast_report->next_result in both 
>>subroutines. It appears that analyzing the blast results first w/ my 
>>sort subroutine empties (?) the $blast_result object so that when I try 
>>to print, there is nothing left to print. (and vice versa when I print 
>>first then try to sort).
>>So, from the looks of things, using next_result has the effect of 
>>popping the Bio::Search::Result::ResultI objects off of the SearchIO 
>>blast report object??
> 
> 
> Not quite. It's more or less exactly like opening a file and then trying 
> to read it all twice like this:
> open(FILE, "file");
> while (<FILE>) {
>      print # prints each line in the file
> }
> while (<FILE>) {
>      print # never happens, we never enter this while loop
> }
> 
> To get the second while loop to print anything we need to say seek(FILE, 
> 0, 0) before it. Or in the first while loop store each line in an array, 
> and then make the second loop a foreach through that array.
> 
> 
> 
>>It seems I could get around this by making a copy of the blast report by 
>>setting it to another new variable...(not the most elegant solution) but 
>>I'm having trouble with this...
>>
>>If I do:
>>
>>    my $blast_report_copy = $blast_report;
>>
>>I'm just copying the reference to the SearchIO blast result, so it 
>>doesn't help me. How can I make another physical copy of this blast 
>>result object? Seems like a simple thing but how to do it is escaping me.
> 
> 
> Not really a good idea, and it may not work anyway if the object 
> contains a filehandle. But for a simple object you might recursively 
> loop through the data structure and copy each element out into a similar 
> data structure.
> 
> 
> 
>>But better yet, the way to go is to 'reset the counter,' or to find a 
>>way to look at/print/sort the results without removing data from the 
>>blast result object. How is this done though??
> 
> 
> It would be rather nice if this worked:
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);
> 
> Finally, the most sensible way (assuming bioperl provides no methods of 
> its own for this) of solving the problem is, the first time you go 
> through each next_result, next_hit and next_hsp, just store the returned 
> objects in an array of arrays of arrays. Then the second time get the 
> objects from your array structure instead of with the method calls.


From jelenaob at gmail.com  Thu Jun  1 11:45:49 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Thu, 1 Jun 2006 08:45:49 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <200606011140.38726.lstein@cshl.edu>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
	<200606011140.38726.lstein@cshl.edu>
Message-ID: <5042a62b0606010845u79a5d5b3h131c4ed54f90fee3@mail.gmail.com>

Thanks Lincoln.

I figured out the solution just after I posted the question, Murphy's law ...
but my post was left hanging in my email ... :(

The problem is in CGI->img method.

Instead of:  print $cgi->img({-src=>$url,-usemap=>"#$mapname"});

I should have used:  print $cgi->img({-src=>$url,-usemap=>"#$mapname",
-border=>undef});

Thanks anyways for your help.

Cheers,

Jelena

On 6/1/06, Lincoln Stein <lstein at cshl.edu> wrote:
>
> Hi,
>
> The border is coming from the HTML <img> tag. To get rid of it, set
> -border=>0 in the img() call.
>
> Lincoln
>
>
>
> On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> > Hello everybody,
> >
> > does anybody know how to remove the background color of the Panel?
> > Currently, I am not adding anything to it, so I can troubleshoot the
> > problem, and I have tried setting all the color attributes I could find
> > on the panel, but no luck. Whatever I do, I get the BLUE border of the
> > panel.
> >
> > Has anybody faced the same problem?
> >
> > Thanks in advance,
> >
> > Jelena
> >
> > And here is the code I am currently using:
> >
> >
> > -----------------------------------------------------------------------
> >    my $panel =
> >     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
> >                               -width => 800,
> >                               -pad_left => 10,
> >                               -pad_right => 10,
> >                               -key_color => 'white',
> >                               -bgcolor => 'white',
> >                               -gridcolor=>'black',
> >                               -fgcolor => 'black',
> >                               -grid => 0,
> >                               );
> >    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
> >      -url  => '/tmpimages');
> >    #make clickable image
> >    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
> >    print $map;
> >
> > -----------------------------------------------------------------------
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>

From osborne1 at optonline.net  Thu Jun  1 15:36:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:36:27 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>
Message-ID: <C0A4B9FB.8A71%osborne1@optonline.net>

Chris,

Right - how would this be done?

Brian O.


On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> I still like the idea of just having a simple conversion from wiki->txt
> direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 15:44:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:44:13 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <C0A4BBCD.8A74%osborne1@optonline.net>

Jay,

You asked about the doc/ directory. The only directory I see in my
bioperl-live/doc directory is examples/, the reason this remains is that it
contains scripts and images related to the Graphics HOWTO, in theory these
could be moved to the Wiki and the examples/ directory deleted. One
explanation for why you see doc/html and all those other dirs is that you
aren't using the 'cvs -d' option (there are other explanations) when you
update.

If examples/ is removed then presumably the README can be removed and
makedoc.pl moved elsewhere.

Brian O.


On 5/31/06 9:54 PM, "Jay Hannah" <jay at jays.net> wrote:

> Brian Osborne wrote:
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
> 
> We certainly wouldn't want to try to maintain two copies, one POD one in wiki.
> That would be the worst of all options. One option that hasn't been mentioned
> yet is to keep maintenance of that in POD in the distro (leaving the cool
> runability alone), and then flag that document as unchangeable in the wiki
> with a note on top "Maintenance of this document is done in POD in the distro.
> Submit POD patches to bioperl-l and we'll re-post an updated copy to this
> wiki."
> 
> Just a thought.
> 
>> - What do we do with the script part of bptutorial.pl? It certainly could be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?
> 
> doc/faq        (empty?)
> doc/howto      
> doc/howto/examples
> doc/howto/figs (empty?)
> doc/howto/html (empty?)
> doc/howto/pdf  (empty?)
> doc/howto/sgml (empty?)
> doc/howto/txt  (empty?)
> doc/howto/xml  (empty?)
> 
> Does all that stuff officially live in and is being changed in the wiki, never
> to return to the distro?
> 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
>> Jay, looks like there are still some weird formatting issues with the
>> bptutorial wiki page, something which I ran into before when getting the
>> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
>> spaces preceding a line denotes code for some reason).  Not much you can do
>> in these cases except remove the extra spaces in those spots.  Looking good
>> though!  
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know nothing
> about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran
> off to work. :)
> 
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should
>> point to the Tutorials page instead of the Bptutorial.pl page please let
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials
> 
> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on the
> wiki but static?
> 
> Why isn't the short "Current events" just listed on the top of the "News"
> page?
> 
> Sick of my endless questions yet? -grin-
> 
> j
> 


From cjfields at uiuc.edu  Thu Jun  1 15:47:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 14:47:40 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4B9FB.8A71%osborne1@optonline.net>
Message-ID: <001301c685b4$3dbfb820$15327e82@pyrimidine>

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 2:36 PM
> To: Chris Fields; 'Sendu Bala'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> Right - how would this be done?

I'll look into a few of the wiki converters, there are a few things that
claim to convert wiki to other formats (and vice versa).  It may not be
direct, though.  I'll post anything if I figure something out.
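
As a rough idea of what a direct wiki->txt pass might look like, here is a minimal sketch. It is not one of the existing converters and it makes several assumptions: it only handles MediaWiki headings, bold/italic quotes, and internal and external links, and the name wiki_to_text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: converts a small subset of MediaWiki markup to text.
sub wiki_to_text {
    my ($wiki) = @_;
    my @out;
    for my $line (split /\n/, $wiki) {
        # "== Heading ==" becomes the heading text underlined with dashes
        if ($line =~ /^(=+)\s*(.*?)\s*\1\s*$/) {
            push @out, $2, '-' x length($2);
            next;
        }
        $line =~ s/'{2,5}//g;                            # drop ''italic'' and '''bold''' quotes
        $line =~ s/\[\[(?:[^\]|]*\|)?([^\]]+)\]\]/$1/g;  # [[Page|label]] -> label, [[Page]] -> Page
        $line =~ s/\[(\S+)\s+([^\]]+)\]/$2 <$1>/g;       # [url label] -> label <url>
        push @out, $line;
    }
    return join "\n", @out;
}

my $sample = join "\n",
    "== Installing ==",
    "See the '''INSTALL''' page and [[FAQ|the FAQ]].",
    "Get it from [http://www.bioperl.org/ the BioPerl site].";

print wiki_to_text($sample), "\n";
```

Tables, templates, and nested lists are not covered, so real pages would still need checking by hand.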

Chris
 
> Brian O.
> 
> 
> On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > I still like the idea of just having a simple conversion from wiki->txt
> > direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 15:45:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:45:39 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E73F5.40403@jays.net>
Message-ID: <C0A4BC23.8A75%osborne1@optonline.net>

Jay,

Yes, good idea, thank you for volunteering.

Brian O.


On 6/1/06 12:58 AM, "Jay Hannah" <jay at jays.net> wrote:

> I hereby volunteer to strip the code out of bptutorial.pl and put it wherever.
> Where should I put it when I'm done? (examples/tutorial.pl?)


From hubert.prielinger at gmx.at  Thu Jun  1 16:33:45 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 01 Jun 2006 14:33:45 -0600
Subject: [Bioperl-l] remoteblast xml problem
Message-ID: <447F4F29.9070600@gmx.at>

hi,
I have the following program and it worked quite well for retrieving 
remoteblast results in a text file.
Now I have altered it to request XML, and it doesn't work anymore.
It takes all the parameters at the command line and submits the query, but I 
don't get any results file back.

It seems that it hangs in an endless loop.
The only output I get is:  $rc is not a ref! over and over. It 
never enters the else branch anymore.

Any help is appreciated, thanks in advance


#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use IO::String;
use Bio::SearchIO;


#use lib qw(/usr/local/bioperl/bioperl-1.5.1);

print "Please insert database:\t";
my $db_STD = <STDIN>;
chomp $db_STD;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;

print "Please insert gapcosts:\t";
my $gapcosts_STD = <STDIN>;
chomp $gapcosts_STD;

my $prog   = 'blastp';
my $db     = $db_STD;           
my $e_val  = '20000';
my $matrix = $matrix_STD;               
my $wordSize = '2';


my @data;
my $line_dataArray;
my $rid;
my $count = $count_STD;           
my @params = (
  '-prog'   => $prog,
  '-data'   => $db,
  '-expect' => $e_val,
  '-MATRIX_NAME' => $matrix,
  '-readmethod' => 'xml',
  '-WORD_SIZE' => $wordSize,
);

my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

print "entering blast....";

my $xmlFactory = Bio::Tools::Run::RemoteBlast->new(@params);


$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = $gapcosts_STD;
$Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'ALIGNMENTS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';
   

print "Blast entered successfully \n";

while ( my $query = $seqio_obj->next_seq ) {
  print "submit Sequence...just do it....\n";
 
  my $r = $xmlFactory->submit_blast($query);
  print $query->seq;
  print "\n";
 
 
#    sleep 30;

  # Wait for the reply and save the output file
  print "entering while loop for saving Output.... \n";
 
  while ( my @rids = $xmlFactory->each_rid ) {
      foreach my $rid (@rids) {
           
          my $rc = $xmlFactory->retrieve_blast($rid);
          if ( !ref($rc) ) {
              print '$rc is not a ref!', "\n";
              if ( $rc < 0 ) {
                  print "Remove rid ...\n";
                  $xmlFactory->remove_rid($rid);
              }
              # sleep 5;
          }
          else {

              print "retrieved Results successfully \n";
              print $rid;
              print "\n";
              my $filename = "comp80swiss$count.xml";
              $xmlFactory->save_output($filename);
              print "File saved successfully \n";
              my $checkinput = $xmlFactory->file;
              open(my $fh,"<$checkinput") or die $!;
              while(<$fh>){
                print;
              }
              close $fh;
              $count++;
              $xmlFactory->remove_rid($rid);
          }
      }
      print "\n";
      print "\n";

  }
}
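
This is not a diagnosis of the XML problem, but the loop above retries immediately whenever the return value is not a reference, so it can print the same message many times a second while the job is still running. Below is a minimal sketch of a polling loop that waits between attempts and gives up after a fixed number of tries. It assumes the return convention used in the script above (a reference when finished, a negative number on error, anything else meaning "not ready"). FakeBlast and poll_for_result are hypothetical stand-ins so the sketch runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the remote service: reports "not ready" (0)
# on the first two polls, then returns a result reference.
package FakeBlast;
sub new { return bless { polls => 0 }, shift }
sub retrieve_blast {
    my ($self, $rid) = @_;
    return ++$self->{polls} < 3 ? 0 : +{ rid => $rid };
}

package main;

# Poll until a reference comes back, an error (< 0) is reported,
# or the maximum number of attempts is reached.
sub poll_for_result {
    my ($factory, $rid, $max_polls, $delay) = @_;
    for my $attempt (1 .. $max_polls) {
        my $rc = $factory->retrieve_blast($rid);
        return $rc if ref $rc;                    # finished: got a result
        die "request $rid failed\n" if $rc < 0;   # service reported an error
        sleep $delay;                             # not ready: wait, then retry
    }
    die "gave up on $rid after $max_polls polls\n";
}

my $result = poll_for_result(FakeBlast->new, 'RID123', 10, 1);
print "got result for $result->{rid}\n";
```

With a real factory the delay would likely need to be several seconds, in line with the commented-out sleep calls in the script.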


From emmanuel.quevillon at versailles.inra.fr  Thu Jun  1 17:15:42 2006
From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon)
Date: Thu, 01 Jun 2006 23:15:42 +0200
Subject: [Bioperl-l] How to submit new module?
Message-ID: <447F58FE.7020603@versailles.inra.fr>

Hi,

I just created some new parsers for TargetP, TandemRepeatFinder and
RepeatMasker. These parsers inherit from Bio::Tools. I'd just like to
know the procedure for submitting them to BioPerl so that they can be
integrated in the next release (I hope).
Is there any documentation about it?

Thanks

-- 
Emmanuel

---------------------------------------------------------------------
Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>

INRA-URGI / Bayer CropScience
523 Place des Terrasses             http://www.infobiogen.fr
91000 EVRY                          http://urgi.infobiogen.fr
Tel : 01 60 87 37 42                http://www.bayercropscience.com

PGP public key server : http://pgp.mit.edu/
Key ID : 0x0B84357F
---------------------------------------------------------------------

From cjfields at uiuc.edu  Thu Jun  1 17:36:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:36:05 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447F3BA7.9030500@cornell.edu>
Message-ID: <001b01c685c3$63840070$15327e82@pyrimidine>

Genevieve, 

seek() won't work here; all the file IO is handled through Bio::Root::IO
methods.  The SearchIO system is set up like an XML SAX parser so if you
want to save objects as they come you'll have to store the object refs in an
array, like so:

my @hsps;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         push @hsps, $hsp;
      }
   }
}

Or similarly with hits: 

my @hits;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      push @hits, $hit;
   }
}

Or you could use more complex data structures (array of arrays) as Sendu
suggested.  You should be able to sort like anything else by calling methods
within the sort:

# total number of hsps
my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;

# if you really like your accessions in alphabetical order
my @sorted = sort {$a->accession cmp $b->accession} @hits;

Then if you wanted to print later you could sort based on something else,
like the score:

my @sort_score = sort {$a->score <=> $b->score} @hits;

So you would end up with something like the following subroutines:

my @hits;   # filled once by sort_results, reused by print_blast_results

sub sort_results{
   my $report = shift;
   while(my $result = $report->next_result()){
      while(my $hit = $result->next_hit()){
         push @hits, $hit;
      }
   }
   my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;
   print $_->accession,"\t",$_->num_hsps,"\n" for @sorted;
}

sub print_blast_results{
   my $report = shift;   # unused here: the hits were already collected above
   my @sort_score = sort {$a->score <=> $b->score} @hits;
   for my $h (@sort_score) {
      while (my $hsp = $h->next_hsp) {
         # might use something else here like hit->name or accession,
         # not sure what you want
         my $q_name = $hsp->seq_id;
         print join(", ",$q_name,$h->name,$hsp->bits)."\n";
      }
   }
}


Just so you know, I couldn't get display_id or display_name to work when
using the Bio::Search::HSP::GenericHSP object.  Your results may vary.

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Genevieve DeClerck
> Sent: Thursday, June 01, 2006 2:11 PM
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Problem solved, albeit, in a slightly hacky way.
> 
> I tried to make seek() work for a good long while with the SearchIO
> blast results object, but I just couldn't get it to work. (Probably b/c
> seek wants to see a genuine file handle-- not a SearchIO filehandle.) I
> used SearchIO's fh() to get the handle and could while(<$fh>) through
> the data but when I used seek($fh,0,0) to reset the cursor position in
> the handle in prep for another loop, i got an error complaining about my
> use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
> 
> I concluded that it was not going to be possible and instead made an
> array if SeqFeature objects which contain all the relevant blast output
> data (i.e. the m8/hit table stuff).
> 
> It still seems unfortunate that one can't reuse the SearchIO object for
> cases when the SearchIO blast report needs to be accessed mltiple times.
> 
> Thanks for your help,
> Genevieve
> 


From arareko at campus.iztacala.unam.mx  Thu Jun  1 17:49:30 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 16:49:30 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
References: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <447F60EA.1050608@campus.iztacala.unam.mx>

Hi Emmanuel,

Take a look into the BioPerl FAQ:

http://bioperl.org/wiki/FAQ

It contains some info that will guide you through the appropriate steps 
depending on your situation.

Regards,
Mauricio.

Emmanuel Quevillon wrote:
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
> like to know the differents steps procedure to submit them to BioPerl
> and to be integrated in the next release (I hope)?
> Is there any documentation about it?
> 
> Thanks
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 17:47:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:47:11 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <001c01c685c4$f01e7550$15327e82@pyrimidine>

The Bioperl FAQ on the wiki answers this:

http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F

Basically, you've already done the first step, but you might want to
resubmit the email in a different form, with something about "New parsers
for TargetP, TandemRepeatFinder and RepeatMasker" in the Subject line to get
more input about those from the users-at-large.  

BTW, there is already a Bio::Tools::RepeatMasker, so you should check it out
to make sure there isn't any redundancy between your version and the
bioperl-live version.  The developers may be reluctant to replace the
bioperl-live version with yours to prevent API problems with end users,
unless you provide some serious justification (like the current one is
broken, not complete, etc).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Emmanuel Quevillon
> Sent: Thursday, June 01, 2006 4:16 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] How to submit new module?
> 
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
> like to know the differents steps procedure to submit them to BioPerl
> and to be integrated in the next release (I hope)?
> Is there any documentation about it?
> 
> Thanks
> 
> --
> Emmanuel
> 
> ---------------------------------------------------------------------
> Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>
> 
> INRA-URGI / Bayer CropScience
> 523 Place des Terrasses             http://www.infobiogen.fr
> 91000 EVRY                          http://urgi.infobiogen.fr
> Tel : 01 60 87 37 42                http://www.bayercropscience.com
> 
> PGP public key server : http://pgp.mit.edu/
> Key ID : 0x0B84357F
> ---------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From heikki at sanbi.ac.za  Fri Jun  2 03:52:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 2 Jun 2006 09:52:07 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001201c685a3$59d78da0$15327e82@pyrimidine>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
Message-ID: <200606020952.08034.heikki@sanbi.ac.za>

I've started going through the files that have 'return undef' lines.
I'll report back later.

My initial impression is that there are a few cases where the context indicates 
that a list is returned but failure returns an explicit undef. I'll fix those.

Most of the cases are much more ambiguous. Even when the documentation says 
that failure returns undef, it clearly just means false. In most cases the 
documentation does not comment on the return value at all. Luckily the context 
is almost always scalar, so it does not matter too much.

I seem to be changing 'return undef' to plain 'return' a bit overzealously, so 
do not take it personally.
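
For anyone who hasn't run into the difference, here is a minimal sketch of the 
pitfall. The two subs are made up for illustration and are not taken from any 
BioPerl module. In list context an explicit undef comes back as a one-element 
list, which tests as true, while a plain return gives an empty list:

```perl
use strict;
use warnings;

# Hypothetical lookup routines, for illustration only.
# On failure this one returns an explicit undef ...
sub find_explicit {
    my ($ok) = @_;
    return undef unless $ok;
    return ('a', 'b');
}

# ... and this one uses a plain return.
sub find_bare {
    my ($ok) = @_;
    return unless $ok;
    return ('a', 'b');
}

my @explicit = find_explicit(0);    # (undef): a list with one element
my @bare     = find_bare(0);        # ():      an empty list

print "explicit: ", scalar(@explicit), " element(s)\n";
print "bare:     ", scalar(@bare),     " element(s)\n";

# A caller testing the list for success is fooled by the explicit undef.
print "explicit failure looks like success\n" if @explicit;
print "bare failure is correctly false\n" unless @bare;

# In scalar context both forms give undef, so scalar callers see no change.
my $scalar = find_bare(0);
print "scalar result is undef\n" unless defined $scalar;
```

That is why the change matters for the list-context cases and is harmless for 
the scalar ones.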

	-Heikki

On Thursday 01 June 2006 19:46, Chris Fields wrote:
> ....
>
> > > Again, didn't do that.
> >
> > I'm very sorry that I allowed the ambiguity, but my comments were
> > certainly not directed at your recent changes to Bio::Restriction::IO.
> > In fact, I put in the above * comment to exclude your changes from my
> > discussion; you changed the docs because the code never did what they
> > said they did (the docs were bad). That's fine (good!). My comments were
> > a general point, slightly directed at the idea of changing all the
> > return undef;s - changing the code so that it no longer matches the docs
> > of a previously working method. That's what I think is bad. Though in
> > this particular case it shouldn't make any difference at all.
>
> Agreed.  In any case, if tests have been properly set up then they should
> catch problems.  This is, of course, if they are properly set up.
>
> Chris
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From sb at mrc-dunn.cam.ac.uk  Fri Jun  2 05:04:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 02 Jun 2006 10:04:18 +0100
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447F4F29.9070600@gmx.at>
References: <447F4F29.9070600@gmx.at>
Message-ID: <447FFF12.506@mrc-dunn.cam.ac.uk>

Hubert Prielinger wrote:
> hi,
> I have the following program and it worked quite well, for retrieving 
> remoteblast results in a textfile,
> now I have altered it to to xml, and it didn't work anymore.....
> it takes all the parameter at the commandline, submits the query, but I 
> don't retrieve any results file anymore.....
> 
> it seems that it hangs in a endless loop......
> the only output I get is:  $rc is not a ref! over and over..... it 
> doesn't enter the else term anymore....

There is no problem with your code. The problem is with the NCBI server 
and should be reported to them. You can visit the site and do a blast, 
requesting xml format, and you will typically get one normal 'waiting' 
message and the promise that it will be updated in x seconds, but 
subsequent attempts to get progress information result in an xml error 
page because the NCBI server doesn't actually send any data.

Unfortunately the way that the bioperl code is written, it treats no 
data as 'waiting' instead of an error. I've offered a patch to fix this 
at this bug page:
http://bugzilla.bioperl.org/show_bug.cgi?id=2015
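
For illustration only, here is a rough sketch of the general idea of telling 
'waiting' apart from 'error' and capping the number of attempts. It is not the 
patch attached to the bug report. The name poll_with_limit and the return-code 
convention (reference = result, negative = error, 0 or empty = still waiting) 
are assumptions made up for this example, and the server is replaced by a stub:

```perl
use strict;
use warnings;

# Call $fetch until it yields a result, reports an error, or the attempt
# budget runs out. Assumed convention for this sketch:
#   reference   -> finished result
#   negative    -> error reported by the server
#   0/empty/undef -> still waiting
sub poll_with_limit {
    my ($fetch, $max_attempts, $delay) = @_;
    for my $attempt (1 .. $max_attempts) {
        my $rc = $fetch->();
        return $rc if ref $rc;

        my $code = $rc || 0;    # treat undef and '' as "waiting"
        die "server reported an error on attempt $attempt\n" if $code < 0;

        sleep $delay if $delay;
    }
    # No data after the whole budget is an error, not more waiting.
    die "gave up after $max_attempts attempts without a result\n";
}

# Stub standing in for the remote server: two "waiting" replies, then data.
my @replies = (0, 0, { hits => 3 });
my $result  = poll_with_limit(sub { shift @replies }, 5, 0);
print "got $result->{hits} hits\n";
```

The point is only that a loop which never sees data has to end in an error at 
some stage rather than spin forever.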

From cjfields at uiuc.edu  Fri Jun  2 10:30:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 09:30:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447FFF12.506@mrc-dunn.cam.ac.uk>
Message-ID: <001a01c68651$12925250$15327e82@pyrimidine>

Sendu, Hubert,


Hubert, your code looks fine so Sendu's patch should fix the problem (break
out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
works.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, June 02, 2006 4:04 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> Hubert Prielinger wrote:
> > hi,
> > I have the following program and it worked quite well, for retrieving
> > remoteblast results in a textfile,
> > now I have altered it to to xml, and it didn't work anymore.....
> > it takes all the parameter at the commandline, submits the query, but I
> > don't retrieve any results file anymore.....
> >
> > it seems that it hangs in a endless loop......
> > the only output I get is:  $rc is not a ref! over and over..... it
> > doesn't enter the else term anymore....
> 
> There is no problem with your code. The problem is with the NCBI server
> and should be reported to them. You can visit the site and do a blast,
> requesting xml format, and you will typically get one normal 'waiting'
> message and the promise that it will be updated in x seconds, but
> subsequent attempts to get progress information result in an xml error
> page because the NCBI server doesn't actually send any data.
> 
> Unfortunately the way that the bioperl code is written, it treats no
> data as 'waiting' instead of an error. I've offered a patch to fix this
> at this bug page:
> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Jun  2 15:13:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 14:13:31 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
Message-ID: <000301c68678$a3cdaa40$15327e82@pyrimidine>

Heikki,

I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
when running AlignIO.t (I was fixing bug 2000):

http://bugzilla.open-bio.org/show_bug.cgi?id=2016

Not sure what's going on there, but using read_aln and write_aln seems to work
normally.  It may have something to do with Bio::SimpleAlign but I'm not
absolutely sure.

Any ideas what may be going on here?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Fri Jun  2 17:11:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 15:11:41 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <001a01c68651$12925250$15327e82@pyrimidine>
References: <001a01c68651$12925250$15327e82@pyrimidine>
Message-ID: <4480A98D.6010501@gmx.at>

hi,
sorry, but I have updated the remoteblast module and I have run several 
attempts with the same results as before. It didn't work.
I didn't get any results.

regards
Hubert


Chris Fields wrote:
> Sendu, Hubert,
>
>
> Hubert, your code looks fine so Sendu's patch should fix the problem (break
> out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
> it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
> works.  
>
> Chris


From cjfields at uiuc.edu  Fri Jun  2 17:54:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 16:54:20 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480A98D.6010501@gmx.at>
Message-ID: <000001c6868f$1b68dbe0$15327e82@pyrimidine>

Hubert, 

Could you post this on bugzilla with your script and test data so I can try
to replicate your error?  I may not get to it until Monday.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, June 02, 2006 4:12 PM
> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu Bala'
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> sorry, but I have updated the remoteblast module and I have run several
> attempts with the same results as before. It didn't work.
> I didn't get any results.
> 
> regards
> Hubert


From hubert.prielinger at gmx.at  Fri Jun  2 19:19:40 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 17:19:40 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <000001c68691$8c4eeb40$15327e82@pyrimidine>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
Message-ID: <4480C78C.1000701@gmx.at>

hi,
I have submitted the bug -> Bug 2017
with the script and input file, just start it from command line

thank you very much
greetings

Hubert

Chris Fields wrote:
> Hubert,
>
> I have a script that's using blastxml and XML output which seems to work.
> I'll try looking at it to get a better idea this weekend.
>
> Chris


From cjfields at uiuc.edu  Fri Jun  2 20:33:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:33:48 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480C78C.1000701@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
Message-ID: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>

You need to add the input conditions as well (you have several  
<STDIN> lines which may play a role; I would like to know what you  
normally enter for those).

How long did you let the script run?  I ran a quick check on your  
sequences; you have almost 1600, so you have to expect that you'll  
run into some problems here!  Most here (including me) would suggest  
you try installing a local blast setup for something like this.

Chris

On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:

> hi,
> I have submitted the bug -> Bug 2017
> with the script and input file, just start it from command line
>
> thank you very much
> greetings
>
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Fri Jun  2 20:49:15 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 18:49:15 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
Message-ID: <4480DC8B.7070005@gmx.at>

hi,
input database: swissprot
         matrix: pam30
         count: 1
         gapcosts: 9 1

I know that there are a lot of sequences, but that doesn't matter; you 
can delete all of them except one. The number of sequences is not 
the problem: the script reads one line and submits it, then the 
second line, and so on. I have also tried it with only one sequence 
and got the same result. That time the script ran for more than 
20 minutes, and that should be enough time to retrieve the 
results for ONE sequence, I guess.

regards
Hubert


Chris Fields wrote:
> You need to add the input conditions as well (you have several <STDIN> 
> lines which may play a role; I would like to know what you normally 
> enter for those).
>
> How long did you let the script run?  I ran a quick check on your 
> sequences; you have almost 1600, so you have to expect that you'll run 
> into some problems here!  Most here (including me) would suggest you 
> try installing a local blast setup for something like this.
>
> Chris


From cjfields at uiuc.edu  Fri Jun  2 20:57:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:57:37 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480DC8B.7070005@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
Message-ID: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>

Yes, I see the same error you do.  But I have a similar script  
(blastp, XML blast report, XML parsing, similar loop structure) that  
works fine.  I'm trying to dissect the problem but I think it may be  
something logically wrong here (something not so obvious) and not a  
bug...

What I'm trying to say is, when you send sequences using remoteblast
like this, you are essentially spamming the NCBI BLAST server with
~1600 requests.  This script wasn't set up with that intent in mind;
you should really try to set up your own local blast database if
possible.  If you can't, try running this script in off-hours
(10pm-6am EST or something like that).


Chris

On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:

> hi,
> input database: swissprot
>         matrix: pam30
>         count: 1
>         gapcosts: 9 1
>
> I know that there are  a lot of sequences, but that doesn't matter,  
> you can delete all of them except one, the amount of the sequences  
> is not the problem, the script reads one line and submits  
> it.....then the second line and so on.....I have tried it with only  
> one sequence either and I got the same result.... the script run at  
> that time for more than 20 minutes!!!!!! .....and that should be  
> enough time to retrieve the results for ONE sequence, I guess
>
> regards
> Hubert
>
>
>
> Chris Fields wrote:
>> You need to add the input conditions as well (you have several  
>> <STDIN> lines which may play a role; I would like to know what you  
>> normally enter for those).
>>
>> How long did you let the script run?  I ran a quick check on your  
>> sequences; you have almost 1600, so you have to expect that you'll  
>> run into some problems here!  Most here (including me) would  
>> suggest you try installing a local blast setup for something like  
>> this.
>>
>> Chris
>>
>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>
>>> hi,
>>> I have submitted the bug -> Bug 2017
>>> with the script and input file, just start it from command line
>>>
>>> thank you very much
>>> greetings
>>>
>>> Hubert
>>>
>>> Chris Fields wrote:
>>>> Hubert,
>>>>
>>>> I have a script that's using blastxml and XML output which seems  
>>>> to work.
>>>> I'll try looking at it to get a better idea this weekend.
>>>>
>>>> Chris
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu  
>>>>> Bala'
>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>
>>>>> hi,
>>>>> sorry, but I have updated the remoteblast module and I have run  
>>>>> several
>>>>> attempts with the same results as before. It didn't work.
>>>>> I didn't get any results.
>>>>>
>>>>> regards
>>>>> Hubert
>>>>>
>>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>> Sendu, Hubert,
>>>>>>
>>>>>>
>>>>>> Hubert, your code looks fine so Sendu's patch should fix the  
>>>>>> problem
>>>>>>
>>>>> (break
>>>>>
>>>>>> out of that infinite loop).  I applied Sendu's patch to  
>>>>>> RemoteBlast in
>>>>>>
>>>>> CVS;
>>>>>
>>>>>> it passed all tests in RemoteBlast.t.  Try updating from CVS  
>>>>>> to see if
>>>>>>
>>>>> it
>>>>>
>>>>>> works.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>
>>>>>>> Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>> hi,
>>>>>>>> I have the following program and it worked quite well, for  
>>>>>>>> retrieving
>>>>>>>> remoteblast results in a textfile,
>>>>>>>> now I have altered it to to xml, and it didn't work  
>>>>>>>> anymore.....
>>>>>>>> it takes all the parameter at the commandline, submits the  
>>>>>>>> query, but
>>>>>>>>
>>>>> I
>>>>>
>>>>>>>> don't retrieve any results file anymore.....
>>>>>>>>
>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>> the only output I get is:  $rc is not a ref! over and  
>>>>>>>> over..... it
>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>
>>>>>>>>
>>>>>>> There is no problem with your code. The problem is with the  
>>>>>>> NCBI server
>>>>>>> and should be reported to them. You can visit the site and do  
>>>>>>> a blast,
>>>>>>> requesting xml format, and you will typically get one normal  
>>>>>>> 'waiting'
>>>>>>> message and the promise that it will be updated in x seconds,  
>>>>>>> but
>>>>>>> subsequent attempts to get progress information result in an  
>>>>>>> xml error
>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>
>>>>>>> Unfortunately the way that the bioperl code is written, it  
>>>>>>> treats no
>>>>>>> data as 'waiting' instead of an error. I've offered a patch  
>>>>>>> to fix this
>>>>>>> at this bug page:
>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Fri Jun  2 21:36:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 19:36:42 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
Message-ID: <4480E7AA.3020603@gmx.at>

hi chris,
thanks, but I never intended to run remoteblast with so many sequences, only a 
few of them. Actually, my goal is to run phiblast with a regular 
expression, so I won't need that
file anymore.

Another question, about parsing the xml output: is there an xml parser 
available for blast xml output, and how do I get started?
I have looked at the wiki and at Bio::SearchIO::blastxml on cpan, but 
I'm not sure how to start....sorry, I guess I'm too stupid....
Is there maybe another introduction or an example?

thanks
Hubert


Chris Fields wrote:
> Yes, I see the same error you do.  But I have a similar script  
> (blastp, XML blast report, XML parsing, similar loop structure) that  
> works fine.  I'm trying to dissect the problem but I think it may be  
> something logically wrong here (something not so obvious) and not a  
> bug...
>
> What I'm trying to say is, when you send sequences using remoteblast  
> like this, you are essentially spamming the NCBI BLAST server with  
> ~1600 requests.  This script wasn't set up with that intent in mind;  
> you should really try to set up your own local blast database if  
> possible.  If you can't, try running this script in off-hours  
> (10pm-6am EST or something like that).
>
>
> Chris
>


From cjfields at uiuc.edu  Sat Jun  3 00:35:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 23:35:21 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>


On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:

> hi chris,
> thanks, but I never intended to run remoteblast with so many sequences,  
> only a few of them. Actually, my goal is to run phiblast with a  
> regular expression, so I won't need that
> file anymore

Not a problem.  Just to let you know, I did manage to get the script  
working, so I'm marking the bug INVALID.  I think the problem isn't  
an infinite loop so much as that setting composition-based  
statistics makes the search take much, much longer; try removing  
that line to see what I mean.

Just so you know, using $result->query_name doesn't get you what you  
would expect (it gives you a part of the RID, which you don't want;  
this is something in the XML output that is beyond our control).  You  
might want to change it to something else or you'll get filenames  
with numerical names.
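
One way to act on the filename advice above, as a rough sketch only: name each report after the ID of the sequence you submitted rather than after $result->query_name. The helper name report_filename, the zero-padded counter and the .xml suffix are all made up for illustration; none of it comes from your script or from BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: build a report filename from the ID of the sequence
# that was submitted, instead of from $result->query_name.
sub report_filename {
    my ($seq_id, $index) = @_;
    # Replace anything that is awkward in a filename with an underscore.
    (my $safe = $seq_id) =~ s/[^A-Za-z0-9._-]+/_/g;
    return sprintf('%04d_%s.xml', $index, $safe);
}

print report_filename('sp|P12345|AATM_RABIT', 1), "\n";
```

The counter keeps names unique even if two submitted sequences happen to share an ID.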

> Another question, about parsing the xml output: is there an xml  
> parser available for blast xml output, and how do I get started?
> I have looked at the wiki and at Bio::SearchIO::blastxml on cpan,  
> but I'm not sure how to start....sorry, I guess I'm too stupid....
> Is there maybe another introduction or an example?

Bio::SearchIO objects are used to parse BLAST XML output if you have  
it saved to a file.  For instance:

use Bio::SearchIO;

# $file is the path to the saved BLAST XML report
my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');

while (my $result = $factory->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         # do stuff here
      }
   }
}

The only thing that changes between parsing a text BLAST report and an  
XML BLAST report is the -format value (similar to the -readmethod  
parameter in RemoteBlast).  You shouldn't need to look up any  
documentation beyond these pages on the wiki:

http://www.bioperl.org/wiki/HOWTO:SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml

Note that you'll need to install XML::SAX (from CPAN), and that  
XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
up parsing.

Chris

> thanks
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 11:10:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:10:51 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149084373.447da2d5c5339@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
	<1149084373.447da2d5c5339@128.91.55.38>
Message-ID: <9206E0B2-15DC-4AB2-B71B-5EA9D1D11AEC@duke.edu>

The bootstrap is stored as the node ID because of a limitation  
of the newick format: there isn't a formal way to distinguish  
internal IDs from bootstraps.  There are several different ways that  
programs encode the internal ID and a bootstrap value in that one  
slot - we try to parse it out if the bootstrap is stored in  
brackets like INTERNALID[BOOTSTRAP].

Formats like nhx explicitly solve this problem, but most programs  
only use the simple newick.  If you know your data, it is a simple  
procedure to move the internal ID data into the bootstrap slot.
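
As a rough illustration of the labelling conventions described above, here is a plain-Perl sketch that splits an internal-node label into an ID part and a bootstrap part. The name split_internal_label is hypothetical and this is not BioPerl's own parser; in particular, treating a bare number as a bootstrap is an assumption that only holds if you know your data.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's parser): split an internal-node label
# into (id, bootstrap). Either element may be undef.
sub split_internal_label {
    my ($label) = @_;
    return (undef, undef) unless defined $label && length $label;

    # INTERNALID[BOOTSTRAP], where INTERNALID may be empty
    if ($label =~ /^(.*)\[([^\[\]]+)\]$/) {
        my ($id, $boot) = ($1, $2);
        return (length $id ? $id : undef, $boot);
    }

    # Assumption: a bare number in the ID slot is really a bootstrap value.
    return (undef, $label) if $label =~ /^\d+(?:\.\d+)?$/;

    # Anything else is treated as a plain internal ID.
    return ($label, undef);
}

my ($id, $boot) = split_internal_label('n7[95]');
print "id=$id bootstrap=$boot\n";
```

With real node objects you would then copy the second value across using the node's bootstrap accessor, e.g. $node->bootstrap($boot).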

In terms of ignoreoverwrite, you just need to pass a second  
parameter which is true:
$node->add_Descendent($childnode, 1);

-jason


On May 31, 2006, at 10:06 AM, Lucia Peixoto wrote:

> Hi
> Thanks
> a couple more questions
> why is the bootstrap value stored as the node id? Is that right?
>
> also, in the add_descendant method, how do you set the  
> $ignoreoverwrite
> parameter to true?
>
> Lucia
>
> Quoting Jason Stajich <jason.stajich at duke.edu>:
>
>> you need to special case the root - it won't have an ancestor.  just
>> protect the my $parent = $node->ancestor with an if statement as I
>> did below
>>
>> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>>
>>> Hi
>>> OK that was silly, but what I have in my code is what you just wrote
>>> But the problem is that if I write
>>>
>>> $parent->add_Descendent($child)
>>>
>>> it tells me that I am calling the method "add_Descendent" on an
>>> undefined value
>>> (but I did define $parent before??)
>>>
>>> So here it goes the code so far:
>>>
>>> use Bio::TreeIO;
>>>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>>>                           -format => 'newick');
>>>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>>>                            -format => 'newick');
>>>  while( my $tree = $in->next_tree ) {
>>>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>>>     my $bootstrap=$node->_creation_id;
>>>
>>>     if ($bootstrap < 70 ){
>>>>>> if(        my $parent = $node->ancestor ) {
>>>               my @children=$node->get_all_Descendents;
>>>               foreach my $child (@children){
>>>                  $parent->add_Descendent($child);
>>>               }
>>          }
>>>
>>> ........
>>>
>>> eventually I'll add (once I assigned the children to the parent
>>> succesfully):
>>> $tree->remove_Node($node);
>>>
>>>         }
>>>     }
>>>     $out->write_tree($tree);
>>> }
>>>
>>> Quoting aaron.j.mackey at gsk.com:
>>>
>>>>> foreach $child (@children){
>>>>>          $parent=add_Descendent->$child;
>>>>> }
>>>>
>>>> I think what you want is $parent->add_Descendent($child)
>>>>
>>>> -Aaron
>>>>
>>>
>>>
>>> Lucia Peixoto
>>> Department of Biology,SAS
>>> University of Pennsylvania
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sat Jun  3 11:29:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:29:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>

You can get all the hits or HSPs at once with the following methods:
my @hits = $result->hits;
my @hsps = $hit->hsps;


You can also reset the counter, since these implementations are  
in-memory and already parsed (not a stream processor per se).  
next_XX just iterates through the list stored in the parent object.

$result->rewind;

   and

$hit->rewind;


For example, rewind needs to be called if you want to use a  
ResultWriter object and filter some of the values for the final  
writing after first inspecting them.
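
To show why a second pass finds nothing until the counter is reset, here is a toy model in plain Perl. ToyResult, next_item and count_remaining are invented names; this is not BioPerl's implementation, only the same idea of items held in memory with a next_* call that advances a position.

```perl
use strict;
use warnings;

# Toy stand-in for a parsed result object: the items live in memory and
# next_item() only advances a position counter over them.
package ToyResult;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

sub next_item {
    my $self = shift;
    return undef if $self->{pos} >= scalar @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

sub items  { my $self = shift; return @{ $self->{items} } }   # whole list, counter untouched
sub rewind { my $self = shift; $self->{pos} = 0; return }     # back to the start

package main;

# Walk the iterator to the end and report how many items were seen.
sub count_remaining {
    my ($r) = @_;
    my $n = 0;
    $n++ while defined $r->next_item;
    return $n;
}

my $result = ToyResult->new(qw(hitA hitB hitC));
my $first  = count_remaining($result);   # consumes the iterator
my $second = count_remaining($result);   # nothing left to iterate
$result->rewind;
my $third  = count_remaining($result);   # full list again
print "$first $second $third\n";
```

Note that fetching the whole list (items here, hits and hsps on the real objects) does not depend on where the counter currently is.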

-jason


On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:

> Thanks for your comment Sendu, it was very helpful. I think this  
> must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I  
> try
> to print, there is nothing left to print. (and visa-versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??
>
> It seems I could get around this by making a copy of the blast  
> report by
> setting it to another new variable...(not the most elegant  
> solution) but
> I'm having trouble with this...
>
> If I do:
>
> 	my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is  
> escaping me.
>
> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??
>
> Sendu and Brian, I didn't post the sort_results subroutine because  
> it is
> sprawling, as is a lot of my code. The code I provided was more  
> like an
> aid for my explanation of the problem.. it doesn't actually run -  
> sorry
> for the confusion, I should have more clear on that.  The important
> thing to know perhaps is that both sort_results and  
> print_blast_results
> contain a foreach loop where I am using the 'next_results' method to
> view blast results. (And to clarify for Torsten, the blastall() is
> working just fine - the analysis/viewing of the results object is  
> where
> I am encountering the problem.)
>
>
> Any other ideas would be greatly appreciated...
>
> Thank you,
> Genevieve
>
>
>
>
> Sendu Bala wrote:
>
>> Genevieve DeClerck wrote:
>>
>>> Hi,
>>
>> [snip]
>>
>>> If I've sorted the results the sorted-results will print to screen,
>>> however when I try to print the Hit Table results nothing is  
>>> returned,
>>> as if the blast results have evaporated.... and visa versa, if i
>>> comment out the part where i point my sorting subroutine to the  
>>> blast
>>> results reference,  my hit table results suddenly prints to screen.
>>
>> [snip]
>>
>>> Here's an abbreviated version of my code:
>>
>> [snip]
>>
>>> #######
>>> ### the following 2 actions seem to be mutually exclusive.
>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>> # SeqFeature objs stored in arrays. arrays are then printed
>>> # to stdout
>>> &sort_results($blast_report);
>>>
>>> # 2) print blast results
>>> &print_blast_results($blast_report);
>>
>>
>>> sub print_blast_results{
>>>    my $report = shift;
>>>    while(my $result = $report->next_result()){
>>
>> [snip]
>>
>> You didn't give us your sort_results subroutine, but is it as  
>> simple as
>> they both use $report->next_result (and/or $result->next_hit), but  
>> you
>> don't reset the internal counter back to the start, so the second
>> subroutine tries to get the next_result and finds the first  
>> subroutine
>> has already looked at the last result and so next_result returns  
>> false?
>>
>>  From a quick look it wasn't obvious how to reset the counter.  
>> Hopefully
>> this can be done and someone else knows how.
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun  3 15:13:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 3 Jun 2006 14:13:22 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
Message-ID: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>

Nice!  Didn't know I could do that.  Maybe we should add some of this  
to the HOWTO (or is it already in there?).

Chris

On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:

> you can get all the Hits or hsps with the following method:
> my @hits = $result->hits;
> my @hsps = $hit->hsps;
>
>
> You can also reset the counter since these implementations are in-
> memory and already parsed (and not a stream processor per se).
> next_XX just iterates through the list stored in the parent object.
>
> $result->rewind;
>
>    and
>
> $hit->rewind;
>
>
> For example, the rewind needs to be called if you want to use a
> ResultWriter object and filter some of the values for the final
> writing after first inspecting them.
>
> -jason
>
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 15:31:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 15:31:59 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
Message-ID: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>

In the HOWTO hits() and hsps() were there, I just added rewind in the  
table of methods.
If someone wanted to write a little section in the HOWTO about  
resetting the iterator that would be great.

-jason
On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:

> Nice!  Didn't know I could do that.  Maybe we should add some of this
> to the HOWTO (or is it already in there?).
>
> Chris
>
> On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:
>
>> you can get all the Hits or hsps with the following method:
>> my @hits = $result->hits;
>> my @hsps = $hit->hsps;
>>
>>
>> You can also reset the counter since these implementations are in-
>> memory and already parsed (and not a stream processor per se).
>> next_XX just iterates through the list stored in the parent object.
>>
>> $result->rewind;
>>
>>    and
>>
>> $hit->rewind;
>>
>>
>> For example, the rewind needs to be called if you want to use a
>> ResultWriter object and filter some of the values for the final
>> writing after first inspecting them.
>>
>> -jason
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Sat Jun  3 19:54:20 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 04 Jun 2006 09:54:20 +1000
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <4482212C.3000908@infotech.monash.edu.au>

Hubert,

> another question for parsing the xml output....is there a xml parser 
> available for blast xml output or how to start.....
> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml, but 
> I'm not sure how to start....sorry, I guess I'm too stupid....
> is their maybe another introduction or an example.

I think we already answered this question for you on 20 May 2006:

http://bioperl.org/pipermail/bioperl-l/2006-May/021574.html
http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#How_to_parse_BLAST_XML_output

http://www.bioperl.org/wiki/HOWTO:SearchIO (search for "blastxml")

--Torsten Seemann


From cjfields at uiuc.edu  Sun Jun  4 01:17:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 00:17:46 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
Message-ID: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>

There's an interesting addition to this I found while checking this  
out; looks like if you use:

my @hits =  $result->hits;

to get all the hits, you don't need to use '$result->rewind'.  The
rewind method resets the iterator for the hit list back to the
beginning, but using the hits method to grab all the hits doesn't use
the iterator at all.  This works either pre- or post-iteration
through the Hit::BlastHit objects.
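
To make the difference concrete, here is a minimal sketch in plain Perl.
ToyResult is a made-up stand-in, not the real BioPerl result class; it
only mimics the behaviour described above (an in-memory list with an
iterator position), so treat it as an illustration under that assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an in-memory result object (NOT a BioPerl class).
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

# Iterator access: returns the next hit, or undef once the list is exhausted.
sub next_hit {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{pos}++ ];
}

# Reset the iterator position to the first hit.
sub rewind { $_[0]{pos} = 0; return }

# List access: returns every hit and never touches the iterator position.
sub hits { return @{ $_[0]{hits} } }

package main;

my $result = ToyResult->new(qw(hitA hitB hitC));

my @first_pass;
while ( defined( my $hit = $result->next_hit ) ) { push @first_pass, $hit }

# The iterator is now exhausted, so a second loop sees nothing.
my @second_pass;
while ( defined( my $hit = $result->next_hit ) ) { push @second_pass, $hit }

# The list accessor still returns everything, with no rewind needed.
my @all = $result->hits;

# After rewind the iterator starts over.
$result->rewind;
my @third_pass;
while ( defined( my $hit = $result->next_hit ) ) { push @third_pass, $hit }

print "first pass:   @first_pass\n";
print "second pass:  ", scalar(@second_pass), " hits\n";
print "list access:  @all\n";
print "after rewind: @third_pass\n";
```
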

Another thing; Genevieve was passing the SearchIO report object (i.e.  
the parser object which was returned from StandAloneBlast,  
$blast_report) to the methods, not the  
Bio::Search::Result::BlastResult object; looks like there was some  
confusion between the two object types since she refers to the report  
as the result object when it's actually the SearchIO parser object.   
So, once the parser was passed into the first method, a result object
was generated, then destroyed.  When entering the second method, the
parser had already read and parsed the whole report and generated the
objects, so next_result had nothing left to return and it ended with
no output.

Though passing the BlastResult object is better since one should only  
have to parse the report once and use the objects, for curiosity's  
sake, is there a method to rewind the parser itself (in other words,  
read through the report again)?

Chris



Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sun Jun  4 10:08:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 4 Jun 2006 10:08:29 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
Message-ID: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>

right - you don't need rewind if you aren't going to use the iterator  
(next_XXX) -- we provide two different ways to get access to the data.
you can do
for my $hit ( $result->hits ) {

}
or
while( my $hit = $result->next_hit ) {
}


If you want to rewind the parser then (assuming you are using a
filestream and not a data stream from the web or zcat or something)
just reset the filehandle
seek($searchio->_fh, 0, 0);

but then you'll have to re-parse everything and pay that cost twice -
it makes more sense to me to just save the results and put them in a
list if you are going to deliberately make two passes over all the
results.    You either pay the cost of memory (keeping all the
objects) or time (reparsing the results).
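
A rough sketch of the "pay in memory" option, in plain Perl.  The
in-memory filehandle and the record names are placeholders standing in
for a parser and its result objects (they are assumptions for the
example, not BioPerl calls); the point is only that the stream is read
once and both subroutines work from the saved list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder "report": three records on a one-shot stream (hypothetical data).
my $report_text = "result1\nresult2\nresult3\n";
open my $fh, '<', \$report_text or die "cannot open in-memory handle: $!";

# Single pass over the stream: keep every record in memory.
my @results;
while ( my $line = <$fh> ) {
    chomp $line;
    push @results, $line;
}
close $fh;

# Both subroutines take the saved list, so neither consumes the stream.
sub sort_results  { my @r = @_; return sort { $b cmp $a } @r }
sub print_results { my @r = @_; print "$_\n" for @r; return scalar @r }

my @sorted  = sort_results(@results);
my $printed = print_results(@results);

print "sorted first: $sorted[0], printed $printed records\n";
```

With real result objects the same pattern would hold the objects in the
array instead of strings, which is where the memory cost comes from.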


-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From cjfields at uiuc.edu  Sun Jun  4 11:51:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 10:51:53 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
Message-ID: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>


On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:

> right - you don't need rewind if you aren't going to use the  
> iterator (next_XXX) -- we provide two different ways to get access  
> to the data.
> you can do
> for my $hit ( $result->hits ) {
>
> }
> or
> while( my $hit = $result->next_hit ) {
> }
>
>
> If you want to rewind the parser then (assuming you are using a  
> filestream and not a data stream from the web or zcat or something)  
> just reset the filehandle
> seek($searchio->_fh, 0, 0);
>
> but then you'll have to re-parse everything and pay that cost twice  
> - it makes more sense to me to just save the results and put them  
> in list if you are going to deliberately make two passes over all  
> the results.    You either pay the cost of memory (keeping all the  
> objects) or time (reparse the results).

I agree there isn't any really good reason to rewind the parser; I
was mainly just curious how this was accomplished.  Your point about a
memory or time hit might be a point we want to make in the HOWTO.  I
already added some example code about rewinding the iterator and
hits, so I'll add a bit about this.

I think a good deal of confusion here comes from not knowing how
SearchIO works (i.e. that parsing a report can return several
results, which in turn can return hits, in turn returning HSPs).  Of
course that doesn't include iterations in the case of PSI-BLAST.
The HOWTO, I think, explains this all well so it may be a matter of
just RTM (I left the 'F' out to be a bit more polite).

Chris



Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Mon Jun  5 04:16:59 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 05 Jun 2006 16:16:59 +0800
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
Message-ID: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>


Dear Lincoln and experts

Currently I have a CGI application that does this:

1. read an uploaded file
2. check whether the content of the file is fasta or not
3. print out the content of the file.


Now the problem I'm facing is in step three: the content read from
the file handle is altered, namely the very first line does not get
printed.

So for example if "test1.fasta" looks like this:

>Seq0
ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC
 
When it was printed it gives only:

ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC

Why is this happening? 

Below is the complete cgi script that 
does the task  I mentioned earlier.

Did I miss anything in my code?


__BEGIN__
#!/usr/bin/perl -w

use CGI qw/:standard :html3/;
use CGI::Carp qw( fatalsToBrowser );
use Data::Dumper;

BEGIN {
    if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {

        # Blindly untaint.  Taintchecking is to protect
        # from Web data;
        # the environment is under our control.
        eval "use lib '$_';" foreach (
            reverse
            split( /:/, $1 )
        );
    }
}


use Bio::Tools::GuessSeqFormat;

print header,
    start_html('file upload'),
    h1('file upload!');
print_form()    unless param;
print_results() if param;
print end_html;

sub print_form {
    print start_multipart_form(),
       filefield(-name=>'upload',-size=>60),br,
       submit(-label=>'Upload File'),
       end_form;
}

sub print_results {
    my $length;
    my $file = param('upload');
    my $fh_upload = upload('upload');

    my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
    my $format_upload  = $guesser_upload->guess;

    if ( !$file ) {
        print "No file uploaded.";
        return;
    }
    print h2('File name'),      $file;
    print h2('Format'), $format_upload;
    print h2('The content is'),br;

    while (<$fh_upload>) {

        # The very first line of the file does not get printed here.
        # Why?

        print;
        print br;
        $length += length($_);
    }
    print h2('File length'), $length;
}


__END__

Hope to hear from you again.

Regards,
Edward WIJAYA
SINGAPORE



From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 05:02:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:02:48 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <4483F338.7090909@mrc-dunn.cam.ac.uk>

Wijaya Edward wrote:
> Dear Lincoln and experts
> 
> Currently I have a CGI application that does this:
> 
> 1. read an uploaded file
> 2. check whether the content of the file is FASTA or not
> 3. print out the content of the file.
> 
> 
> The problem I'm facing is in step three: the content read from the
> filehandle is altered; namely, the very first line does not get printed.

The problem is almost certainly that the guessing is done by reading the 
first line of the filehandle, so that your subsequent while loop on that 
same filehandle starts at the second line.
Just seek the filehandle back to the start before trying to print the 
contents out.

...
my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
my $format_upload  = $guesser_upload->guess;
seek($fh_upload, 0, 0);
...
while (<$fh_upload>) {
     ...
}

An alternative might be to pass GuessSeqFormat the filename in which 
case it would make its own filehandle and close it, leaving your own 
filehandle untouched.
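
The effect described above can be reproduced without BioPerl. The following is a minimal sketch in plain Perl. Note that peek_first_line is a hypothetical stand-in for any routine that reads one line from a handle (it is not the GuessSeqFormat implementation), and the sample data is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for a format guesser: it consumes one line
# from the handle and never puts it back.
sub peek_first_line {
    my ($fh) = @_;
    my $line = <$fh>;
    return ( defined($line) && $line =~ /^>/ ) ? 'fasta' : 'unknown';
}

# Return all lines from the handle's current position to end of file.
sub remaining_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}

# Write a small FASTA-like sample to a temporary file.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} ">Seq0\nATCGGGGG\n>Seq1\nGGGGGGG\n";
close $out or die "close failed: $!";

open my $fh, '<', $path or die "cannot open $path: $!";
my $format = peek_first_line($fh);

# Without a rewind, the header line consumed above is missing.
my @without_rewind = remaining_lines($fh);

# After seeking back to byte 0, the whole file is read again.
seek( $fh, 0, 0 ) or die "seek failed: $!";
my @with_rewind = remaining_lines($fh);
close $fh;

printf "format=%s, without rewind=%d lines, with rewind=%d lines\n",
    $format, scalar @without_rewind, scalar @with_rewind;
```

This only works when the handle is backed by something seekable, such as a regular file.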

From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 05:57:52 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:57:52 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
Message-ID: <44840020.4020604@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
> 
>> If you want to rewind the parser then (assuming you are using a 
>> filestream and not a data stream from the web or zcat or something) 
>> just reset the filehandle
>> seek($searchio->_fh, 0, 0);
>>
>> but then you'll have to re-parse everything and pay that cost twice - 
>> it makes more sense to me to just save the results and put them in 
>> list if you are going to deliberately make two passes over all the 
>> results.    You either pay the cost of memory (keeping all the 
>> objects) or time (reparse the results).
> 
> I agree there isn't any really good reason to rewind the parser; I was 
> mainly just curious how this was accomplished.

Didn't you already explain why seeking a SearchIO wouldn't work? And 
indeed, didn't Genevieve already try to do this after I suggested it and 
  found that it didn't work?

Confused...

From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 09:19:12 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 14:19:12 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
Message-ID: <44842F50.7090408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> 
> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
> 
>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>> indeed, didn't Genevieve already try to do this after I suggested it and
>> found that it didn't work?
>>
>> Confused...
>>
> There is an internal _rewind if you are using the next_XX methods that 
> resets the internal iterator (all the data has already been parsed).
> 
> You >>can<< reseek the internal filehandle (accessible by calling 
> $object->_fh ), but you can't call seek on the searchio object itself.

... poor choice of words on my part. Or maybe I'm not understanding 
you... I already suggested to Genevieve that she try:

# in the following, $blast_report is a SearchIO
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);

She reported that neither way of doing it worked. You seem to be saying 
that at least the second way should have. Is that right?
rewind() would of course be preferable, I just wanted to know if my 
assumption about seek working was correct or not.

From jason at bioperl.org  Mon Jun  5 09:45:40 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:45:40 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44842F50.7090408@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
Message-ID: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>

It depends on how you have run StandAloneBlast -- if the stream you  
are dealing with is not a file, but a datastream as in the STDOUT  
from BLAST, then the seek won't work (as it wouldn't work for a zcat  
on gzipped file).  I think the default StandAloneBlast behavior is to  
operate on a STDOUT stream so seeking won't work no matter what.
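
Whether a given handle can be rewound at all can be checked before relying on seek. The sketch below is a minimal illustration in plain Perl; is_seekable is a hypothetical helper name, and the pipe simply stands in for a program's STDOUT stream.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Returns 1 if a zero-byte relative seek succeeds on the handle, else 0.
# When the seek works, it leaves the file position where it was.
sub is_seekable {
    my ($fh) = @_;
    return seek( $fh, 0, 1 ) ? 1 : 0;
}

# A regular temporary file, and a pipe standing in for a data stream.
my ( $file_fh, $path ) = tempfile( UNLINK => 1 );
pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $file_ok = is_seekable($file_fh);
my $pipe_ok = is_seekable($reader);

print "regular file seekable: $file_ok\n";
print "pipe seekable: $pipe_ok\n";
```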


On Jun 5, 2006, at 9:19 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
>>
>>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>>> indeed, didn't Genevieve already try to do this after I suggested  
>>> it and
>>> found that it didn't work?
>>>
>>> Confused...
>>>
>> There is an internal _rewind if you are using the next_XX methods  
>> that
>> resets the internal iterator (all the data has already been parsed).
>>
>> You >>can<< reseek the internal filehandle (accessible by calling
>> $object->_fh ), but you can't call seek on the searchio object  
>> itself.
>
> ... poor choice of words on my part. Or maybe I'm not understanding
> you... I already suggested to Genevieve that she try:
>
> # in the following, $blast_report is a SearchIO
>> my $blast_report = $factory->blastall($ref_seq_objs);
>> my $blast_fh = $blast_report->fh();
>> while (<$blast_fh>) {
>>      # $_ is a ResultI object, use as normal
>> }
>> seek($blast_fh, 0, 0); # this would be great, but does it work?
>> while (<$blast_fh>) {
>>      # go through the results again in your second subroutine
>> }
>>
>> An alternative hacky way of doing it, which may also not work,  
>> would be
>> to go through your $blast_report as normal, but then before going
>> through it a second time, say
>> my $fh = $blast_report->_fh;
>> seek($fh, 0, 0);
>
> She reported that neither way of doing it worked. You seem to be  
> saying
> that at least the second way should have. Is that right?
> rewind() would of course be preferable, I just wanted to know if my
> assumption about seek working was correct or not.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 10:13:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 15:13:03 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
	<E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
Message-ID: <44843BEF.6080609@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> It depends on how you have run StandAloneBlast -- if the stream you are 
> dealing with is not a file, but a datastream as in the STDOUT from 
> BLAST, then the seek won't work (as it wouldn't work for a zcat on 
> gzipped file).  I think the default StandAloneBlast behavior is to 
> operate on a STDOUT stream so seeking won't work no matter what.

As far as I can see, when you say blastall() on a StandAloneBlast, it 
eventually does:

if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
     $blast_obj = Bio::SearchIO->new(-file=>$outfile,
			            -format => 'blast' );
}

So seeking should work? Tools like StandAloneBlast creating temp files 
for their results prior to parsing is actually one of the things I don't 
like about the bioperl tool system.

From lstein at cshl.edu  Mon Jun  5 10:51:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 5 Jun 2006 10:51:52 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <200606051051.52648.lstein@cshl.edu>

Hi,

From the Synopsis for GuessSeqFormat:

           # To guess the format from an already open filehandle:
           my $guesser = new Bio::Tools::GuessSeqFormat( -fh => $filehandle );
           my $format  = $guesser->guess;
           # If the filehandle is seekable (STDIN isn't), it will be
           # returned to its original position.

The filehandle returned by CGI.pm is not seekable.
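
When a handle cannot be rewound, one possible workaround is to copy its contents into memory once and work from an in-memory handle, which does support seek. This is a minimal sketch in plain Perl, not something taken from CGI.pm or BioPerl; buffer_handle is a hypothetical helper, the pipe only simulates a non-seekable upload stream, and the approach assumes the upload is small enough to hold in memory.

```perl
use strict;
use warnings;

# Read everything from a (possibly non-seekable) handle and return a new
# read handle over an in-memory copy, which can be rewound with seek.
sub buffer_handle {
    my ($in) = @_;
    my $data = do { local $/; <$in> };
    $data = '' unless defined $data;
    open my $mem, '<', \$data or die "cannot open in-memory handle: $!";
    return $mem;
}

# Simulate a non-seekable upload stream with a pipe.
pipe( my $reader, my $writer ) or die "pipe failed: $!";
print {$writer} ">Seq0\nATCGGGGG\n";
close $writer or die "close failed: $!";

my $fh = buffer_handle($reader);

# Something peeks at the first line (as a format guesser might)...
my $first = <$fh>;

# ...and the in-memory copy can still be rewound afterwards.
seek( $fh, 0, 0 ) or die "seek failed: $!";
my @all = <$fh>;

print "peeked: $first";
print "lines after rewind: ", scalar(@all), "\n";
```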

Lincoln

On Monday 05 June 2006 04:16, Wijaya Edward wrote:
> Dear Lincoln and experts
>
> Currently I have a CGI application that does this:
>
> 1. read an uploaded file
> 2. check whether the content of the file is FASTA or not
> 3. print out the content of the file.
>
>
> The problem I'm facing is in step three: the content read from the
> filehandle is altered; namely, the very first line does not get printed.
>
> So for example if "test1.fasta" looks like this:
> >Seq0
>
> ATCGGGGG
>
> >Seq1
>
> GGGGGGG
>
> >Seq2
>
> ATCCCCCC
>
> When it was printed it gives only:
>
> ATCGGGGG
>
> >Seq1
>
> GGGGGGG
>
> >Seq2
>
> ATCCCCCC
>
> Why is this happening?
>
> Below is the complete cgi script that
> does the task  I mentioned earlier.
>
> Did I miss anything in my code?
>
>
>
> __BEGIN__
> #!/usr/bin/perl -w
>
> use CGI qw/:standard :html3/;
> use CGI::Carp qw( fatalsToBrowser );
> use Data::Dumper;
>
> BEGIN {
>     if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {
>
>         # Blindly untaint.  Taintchecking is to protect
>         # from Web data;
>         # the environment is under our control.
>         eval "use lib '$_';" foreach (
>             reverse
>             split( /:/, $1 )
>         );
>     }
> }
>
>
> use Bio::Tools::GuessSeqFormat;
>
> print header,
>     start_html('file upload'),
>     h1('file upload!');
> print_form()    unless param;
> print_results() if param;
> print end_html;
>
> sub print_form {
>     print start_multipart_form(),
>        filefield(-name=>'upload',-size=>60),br,
>        submit(-label=>'Upload File'),
>        end_form;
> }
>
> sub print_results {
>     my $length;
>     my $file = param('upload');
>     my $fh_upload = upload('upload');
>
>     my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
>     my $format_upload  = $guesser_upload->guess;
>
>     if ( !$file ) {
>         print "No file uploaded.";
>         return;
>     }
>     print h2('File name'),      $file;
>     print h2('Format'), $format_upload;
>     print h2('The content is'),br;
>
>     while (<$fh_upload>) {
>
>      # The very first line of the file does not get printed here.
>      # Why?
>
>         print;
>         print br;
>         $length += length($_);
>     }
>     print h2('File length'), $length;
> }
>
>
> __END__
>
> Hope to hear from you again.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From cjfields at uiuc.edu  Mon Jun  5 12:30:41 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 11:30:41 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44843BEF.6080609@mrc-dunn.cam.ac.uk>
Message-ID: <006001c688bd$62d48850$15327e82@pyrimidine>

If you want flexibility or added functionality then you can always
contribute a patch, such as adding an option for filehandles, IO::String,
pipes/forks, or whatever you wish.  Or you could suggest such to the module
maintainer, Torsten, and then it's his choice whether he wants to make it a
priority to implement it.  Simply stating this is 'one of the things I don't
like about the bioperl tool system' isn't productive here.   It hasn't been
a top priority to implement something along those lines since the module
works for them as is, so if you want these options you'll have to add them,
and add the appropriate tests.

As for the seek issue, the file handle you get by using '$blast_report->fh()'
isn't the raw input file stream but is a tied filehandle of a stream of
ResultI objects:
==================================
Jason's version:
# seek called on the >>internal<< filehandle (from Bio::Root::IO)
# this is the raw data input stream from a file, so should work
seek($searchio->_fh, 0, 0);
==================================
Your version:
# seek called on SearchIO object filehandle
my $blast_report = $factory->blastall($ref_seq_objs);
# this is a tied filehandle for an output stream of objects from SearchIO,
# NOT the raw input stream
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
	# a stream of Bio::Search::Result::BlastResult objects 
} 
# can't use seek on a tied filehandle, won't work unless 
# SEEK class method is implemented (and it's not)
seek($blast_fh, 0, 0); 
==================================

There's a good deal in Programming Perl about tied filehandles.  You'll
notice that Bio::SearchIO implements TIEHANDLE, READLINE, DESTROY, and PRINT
methods, but not SEEK since we've never needed it.  You can always add one
if you want but I really don't see the point based on reasons Jason and I
outlined before.
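
To illustrate the point about tied filehandles in isolation, here is a minimal sketch in plain Perl. ObjectStream is a hypothetical class invented for this example (it is not Bio::SearchIO code); it defines TIEHANDLE and READLINE only, so a seek on the tied handle has no SEEK method to dispatch to and fails.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical minimal tied-handle class: it hands out queued items one
# at a time and deliberately defines no SEEK method.
package ObjectStream;

sub TIEHANDLE {
    my ( $class, @items ) = @_;
    return bless { queue => [@items] }, $class;
}

sub READLINE {
    my ($self) = @_;
    return splice @{ $self->{queue} } if wantarray;    # list context: all items
    return shift @{ $self->{queue} };                  # scalar context: next item
}

package main;

my $fh = gensym();
tie *$fh, 'ObjectStream', { name => 'result1' }, { name => 'result2' };

# Each "line" read from the tied handle is an item, not text.
my @seen;
while ( defined( my $item = <$fh> ) ) {
    push @seen, $item->{name};
}

# seek on a tied handle is dispatched to a SEEK method, which is missing,
# so the call dies; eval captures the error message.
my $ok    = eval { seek( $fh, 0, 0 ); 1 };
my $error = $ok ? '' : $@;

print "seen: @seen\n";
print "seek failed: $error" unless $ok;
```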

Seems there is not much overall documentation on newFh or $blast_report->fh,
but I believe it's analogous to the SeqIO version which is covered a bit in
the bptutorial file, now on the wiki:

http://www.bioperl.org/wiki/Bptutorial.pl#III.2.1_Transforming_sequence_files_.28SeqIO.29

$in  = Bio::SeqIO->newFh(-file => "inputfilename" ,
                          -format => 'fasta');
$out = Bio::SeqIO->newFh(-format => 'embl');
print $out $_ while <$in>;

Wouldn't hurt if someone wants to add a bit more about these to the SearchIO
HOWTO.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, June 05, 2006 9:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Jason Stajich wrote:
> > It depends on how you have run StandAloneBlast -- if the stream you are
> > dealing with is not a file, but a datastream as in the STDOUT from
> > BLAST, then the seek won't work (as it wouldn't work for a zcat on
> > gzipped file).  I think the default StandAloneBlast behavior is to
> > operate on a STDOUT stream so seeking won't work no matter what.
> 
> As far as I can see, when you say blastall() on a StandAloneBlast, it
> eventually does:
> 
> if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
>      $blast_obj = Bio::SearchIO->new(-file=>$outfile,
> 			            -format => 'blast' );
> }
> 
> So seeking should work? Tools like StandAloneBlast creating temp files
> for their results prior to parsing is actually one of the things I don't
> like about the bioperl tool system.


From jason at bioperl.org  Mon Jun  5 09:02:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:02:02 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44840020.4020604@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
Message-ID: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>


On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
>>
>>> If you want to rewind the parser then (assuming you are using a
>>> filestream and not a data stream from the web or zcat or something)
>>> just reset the filehandle
>>> seek($searchio->_fh, 0, 0);
>>>
>>> but then you'll have to re-parse everything and pay that cost  
>>> twice -
>>> it makes more sense to me to just save the results and put them in
>>> list if you are going to deliberately make two passes over all the
>>> results.    You either pay the cost of memory (keeping all the
>>> objects) or time (reparse the results).
>>
>> I agree there isn't any really good reason to rewind the parser; I  
>> was
>> mainly just curious how this was accomplished.
>
> Didn't you already explain why seeking a SearchIO wouldn't work? And
> indeed, didn't Genevieve already try to do this after I suggested  
> it and
>   found that it didn't work?
>
> Confused...
>
There is an internal _rewind if you are using the next_XX methods  
that resets the internal iterator (all the data has already been  
parsed).

You >>can<< reseek the internal filehandle (accessible by calling  
$object->_fh ), but you can't call seek on the searchio object itself.
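
The memory-versus-time trade-off discussed in this thread (keep the parsed results, or parse twice) can be sketched as follows. This is an illustration only: OneShotParser is a hypothetical stand-in that shares nothing with Bio::SearchIO except the next_result method name, and collect_results is a made-up helper.

```perl
use strict;
use warnings;

# Hypothetical one-pass parser used only for this demonstration. It exposes
# a next_result method but is otherwise unrelated to any BioPerl class.
package OneShotParser;

sub new {
    my ( $class, @results ) = @_;
    return bless { pending => [@results] }, $class;
}

sub next_result {
    my ($self) = @_;
    return shift @{ $self->{pending} };
}

package main;

# Drain any object that has a next_result method into an array.
sub collect_results {
    my ($parser) = @_;
    my @results;
    while ( defined( my $result = $parser->next_result ) ) {
        push @results, $result;
    }
    return @results;
}

my $parser  = OneShotParser->new(qw(query1 query2 query3));
my @results = collect_results($parser);

# Two independent passes over the saved results; the parser itself is spent.
my @first_pass  = map { uc } @results;
my @second_pass = map { length } @results;
my $leftover    = $parser->next_result;

print "first pass: @first_pass\n";
print "second pass: @second_pass\n";
```

Holding every result costs memory in proportion to the report size, which is the price paid for not parsing again.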


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 13:23:36 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 18:23:36 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006001c688bd$62d48850$15327e82@pyrimidine>
References: <006001c688bd$62d48850$15327e82@pyrimidine>
Message-ID: <44846898.8020001@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> If you want flexibility or added functionality then you can always
> contribute a patch, such as adding an option for filehandles, IO::String,
> pipes/forks, or whatever you wish.

Well, it wouldn't be a new feature per se, but just changing the way the 
modules work under the hood.


> Or you could suggest such to the module
> maintainer, Torsten, and then it's his choice whether he wants to make it a
> priority to implement it.  Simply stating this is 'one of the things I don't
> like about the bioperl tool system' isn't productive here.

Yes, I apologise for that. I had thought too much would need to be 
changed and backward compatibility wouldn't be possible, but just 
changing StandAloneBlast should be possible.

I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... is there any point in working 
up a patch (would something of questionable reliability ever be 
committed into bioperl)?
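
For readers unfamiliar with the deadlock concern: it typically arises when a parent reads one of the child's output pipes while the child is blocked writing to another. One common way to reduce that risk, shown here as a rough sketch and not as a proposed StandAloneBlast patch, is to close the child's input and merge its STDERR into STDOUT so there is only one stream to drain. run_merged is a hypothetical helper, and the Perl one-liner merely stands in for an external program.

```perl
use strict;
use warnings;
use IPC::Open3;

# Run a command, close its stdin, and read stdout and stderr merged on one
# handle. Returns the exit status followed by the captured lines.
sub run_merged {
    my (@cmd) = @_;

    # A false third argument makes open3 send the child's STDERR to the
    # same handle as its STDOUT.
    my $pid = open3( my $to_child, my $from_child, undef, @cmd );
    close $to_child;

    my @lines = <$from_child>;
    close $from_child;

    waitpid( $pid, 0 );
    return ( $? >> 8, @lines );
}

# $^X is the running perl; the one-liner stands in for an external program.
my ( $status, @lines ) = run_merged(
    $^X, '-e', 'print STDOUT "hits\n"; print STDERR "warning\n";'
);

print "exit status: $status\n";
print "captured: $_" for sort @lines;
```

The order of the merged lines depends on the child's buffering, so the example sorts them before printing.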


> As for the seek issue, the file handle you get by using '$blast_report->fh()'
> isn't the raw input file stream but is a tied filehandle of a stream of
> ResultI objects:
> ==================================
> Jason's version:
> # seek called on the >>internal<< filehandle (from Bio::Root::IO)
> # this is the raw data input stream from a file, so should work
> seek($searchio->_fh, 0, 0);
> ==================================
> Your version:
> # seek called on SearchIO object filehandle
> my $blast_report = $factory->blastall($ref_seq_objs);
> # this is a tied filehandle for an output stream of objects from SearchIO,
> # NOT the raw input stream
> my $blast_fh = $blast_report->fh();

For academic interest, how do I get the 'raw input stream'? Wasn't that 
what my second version did?

 > my $fh = $blast_report->_fh;
 > seek($fh, 0, 0);

From hubert.prielinger at gmx.at  Mon Jun  5 14:17:53 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 12:17:53 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>	<4480E7AA.3020603@gmx.at>
	<720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
Message-ID: <44847551.7040705@gmx.at>

hi,
You were right: removing the composition-based statistics solved the 
problem. Now the result is printed to the screen (STDOUT), but the output 
is not saved to the file.
I have tried reopening the file and writing it to another file, 
but that doesn't work either.
The strange thing is that if I retrieve text instead of XML output, it 
works without any problem. I don't know why.

Hubert


Chris Fields wrote:
> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>
>   
>> hi chris,
>> thanks but I never intended to run the remoteblast with so much,  
>> only a few of them; actually my goal is to run the phiblast with  
>> regular expression, so that i just don't need that
>> file anymore
>>     
>
> Not a problem.  Just to let you know, I did manage to get the script  
> working, so I'm marking the bug INVALID.  I think the problem isn't  
> that there is an infinite loop so much as setting composition-based  
> statistics causes the search to take much much longer; try removing  
> that line to see what I mean.
>
> Just so you know, using $result->query_name doesn't get you what you  
> would expect (it gives you a part of the RID, which you don't want;  
> this is something in the XML output that is beyond our control).  You  
> might want to change it to something else or you'll get filenames  
> with numerical names.
>
>   
>> another question for parsing the xml output....is there a xml  
>> parser available for blast xml output or how to start.....
>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,  
>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>> is there maybe another introduction or an example?
>>     
>
> Bio::SearchIO objects are used to parse BLAST XML output if you have  
> it saved to a file.  For instance:
>
> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>
> while (my $result = $factory->next_result) {
>    while (my $hit = $result->next_hit) {
>       while (my $hsp = $hit->next_hsp) {
>          #do stuff here
>        }
>     }
> }
>
> The only thing that changes in parsing a text BLAST report from an  
> XML BLAST report is the -format line (similar to the -readmethod  
> parameter in RemoteBlast).  You shouldn't need to look up any more  
> documentation other than these on the wiki:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>
> Pay attention to the fact you'll need to install XML::SAX (CPAN) and  
> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
> up parsing.
>
> Chris
>
>   
>> thanks
>> Hubert
>>
>>
>> Chris Fields wrote:
>>     
>>> Yes, I see the same error you do.  But I have a similar script   
>>> (blastp, XML blast report, XML parsing, similar loop structure)  
>>> that  works fine.  I'm trying to dissect the problem but I think  
>>> it may be  something logically wrong here (something not so  
>>> obvious) and not a  bug...
>>>
>>> What I'm trying to say is, when you send sequences using  
>>> remoteblast  like, this you are essentially spamming the NCBI  
>>> BLAST server with  ~1600 requests.  This script wasn't set up with  
>>> that intent in mind;  you should really try to set up your own  
>>> local blast database if  possible.  If you can't, try running this  
>>> script in off-hours  (10pm-6am EST or something like that).
>>>
>>>
>>> Chris
>>>
>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi,
>>>> input database: swissprot
>>>>         matrix: pam30
>>>>         count: 1
>>>>         gapcosts: 9 1
>>>>
>>>> I know that there are  a lot of sequences, but that doesn't  
>>>> matter,  you can delete all of them except one, the amount of the  
>>>> sequences  is not the problem, the script reads one line and  
>>>> submits  it.....then the second line and so on.....I have tried  
>>>> it with only  one sequence either and I got the same result....  
>>>> the script run at  that time for more than 20  
>>>> minutes!!!!!! .....and that should be  enough time to retrieve  
>>>> the results for ONE sequence, I guess
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> You need to add the input conditions as well (you have several   
>>>>> <STDIN> lines which may play a role; I would like to know what  
>>>>> you  normally enter for those).
>>>>>
>>>>> How long did you let the script run?  I ran a quick check on  
>>>>> your  sequences; you have almost 1600, so you have to expect  
>>>>> that you'll  run into some problems here!  Most here (including  
>>>>> me) would  suggest you try installing a local blast setup for  
>>>>> something like  this.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> I have submitted the bug -> Bug 2017
>>>>>> with the script and input file, just start it from command line
>>>>>>
>>>>>> thank you very much
>>>>>> greetings
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>             
>>>>>>> Hubert,
>>>>>>>
>>>>>>> I have a script that's using blastxml and XML output which  
>>>>>>> seems  to work.
>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;  
>>>>>>>> 'Sendu  Bala'
>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> sorry, but I have updated the remoteblast module and I have  
>>>>>>>> run  several
>>>>>>>> attempts with the same results as before. It didn't work.
>>>>>>>> I didn't get any results.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Sendu, Hubert,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix  
>>>>>>>>> the  problem
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> (break
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> out of that infinite loop).  I applied Sendu's patch to   
>>>>>>>>> RemoteBlast in
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> CVS;
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> it passed all tests in RemoteBlast.t.  Try updating from  
>>>>>>>>> CVS  to see if
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> it
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> works.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> hi,
>>>>>>>>>>> I have the following program and it worked quite well,  
>>>>>>>>>>> for  retrieving
>>>>>>>>>>> remoteblast results in a textfile,
>>>>>>>>>>> now I have altered it to to xml, and it didn't work   
>>>>>>>>>>> anymore.....
>>>>>>>>>>> it takes all the parameter at the commandline, submits  
>>>>>>>>>>> the  query, but
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>> I
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>>>> don't retrieve any results file anymore.....
>>>>>>>>>>>
>>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and   
>>>>>>>>>>> over..... it
>>>>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> There is no problem with your code. The problem is with  
>>>>>>>>>> the  NCBI server
>>>>>>>>>> and should be reported to them. You can visit the site and  
>>>>>>>>>> do  a blast,
>>>>>>>>>> requesting xml format, and you will typically get one  
>>>>>>>>>> normal  'waiting'
>>>>>>>>>> message and the promise that it will be updated in x  
>>>>>>>>>> seconds,  but
>>>>>>>>>> subsequent attempts to get progress information result in  
>>>>>>>>>> an  xml error
>>>>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>>>>
>>>>>>>>>> Unfortunately the way that the bioperl code is written, it   
>>>>>>>>>> treats no
>>>>>>>>>> data as 'waiting' instead of an error. I've offered a  
>>>>>>>>>> patch  to fix this
>>>>>>>>>> at this bug page:
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Jun  5 14:32:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:32:47 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44847551.7040705@gmx.at>
Message-ID: <006101c688ce$7185c330$15327e82@pyrimidine>

Hubert, 

Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
option to save XML was committed relatively recently (last month or so).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Monday, June 05, 2006 1:18 PM
> To: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> you were right, removing the composition-based statistics solved the
> problem. Now I get the result printed to STDOUT, but it doesn't save the
> output in the file.
> I have tried reopening the file and writing it to another file
> again, but it doesn't work.....
> The strange thing is that if I retrieve text instead of xml output it
> works without any problem. Don't know why
> 
> Hubert
> 
> 
> 
> Chris Fields wrote:
> > On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
> >
> >
> >> hi chris,
> >> thanks, but I never intended to run remoteblast with so many sequences,
> >> only a few of them. Actually my goal is to run phiblast with a
> >> regular expression, so that I just don't need that
> >> file anymore
> >>
> >
> > Not a problem.  Just to let you know, I did manage to get the script
> > working, so I'm marking the bug INVALID.  I think the problem isn't
> > that there is an infinite loop so much as setting composition-based
> > statistics causes the search to take much much longer; try removing
> > that line to see what I mean.
> >
> > Just so you know, using $result->query_name doesn't get you what you
> > would expect (it gives you a part of the RID, which you don't want;
> > this is something in the XML output that is beyond our control).  You
> > might want to change it to something else or you'll get filenames
> > with numerical names.
> >
> >
> >> another question about parsing the xml output....is there an xml
> >> parser available for blast xml output, or how do I start.....
> >> I have looked up Bio::SearchIO::blastxml on the wikiperl and cpan,
> >> but I'm not sure how to start....sorry, I guess I'm too stupid....
> >> is there maybe another introduction or an example?
> >>
> >
> > Bio::SearchIO objects are used to parse BLAST XML output if you have
> > it saved to a file.  For instance:
> >
> > my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
> >
> > while (my $result = $factory->next_result) {
> >    while (my $hit = $result->next_hit) {
> >       while (my $hsp = $hit->next_hsp) {
> >          # do stuff here
> >       }
> >    }
> > }
> >
> > The only thing that changes in parsing a text BLAST report from an
> > XML BLAST report is the -format line (similar to the -readmethod
> > parameter in RemoteBlast).  You shouldn't need to look up any more
> > documentation other than these on the wiki:
> >
> > http://www.bioperl.org/wiki/HOWTO:SearchIO
> >
> > http://www.bioperl.org/wiki/Module:Bio::SearchIO
> >
> > http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
> >
> > Pay attention to the fact you'll need to install XML::SAX (CPAN) and
> > that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
> > up parsing.
> >
> > Chris
> >
> >
> >> thanks
> >> Hubert
> >>
> >>
> >> Chris Fields wrote:
> >>
> >>> Yes, I see the same error you do.  But I have a similar script
> >>> (blastp, XML blast report, XML parsing, similar loop structure)
> >>> that works fine.  I'm trying to dissect the problem but I think
> >>> it may be something logically wrong here (something not so
> >>> obvious) and not a bug...
> >>>
> >>> What I'm trying to say is, when you send sequences using
> >>> remoteblast like this, you are essentially spamming the NCBI
> >>> BLAST server with ~1600 requests.  This script wasn't set up with
> >>> that intent in mind; you should really try to set up your own
> >>> local blast database if possible.  If you can't, try running this
> >>> script in off-hours (10pm-6am EST or something like that).
> >>>
> >>>
> >>> Chris
> >>>
> >>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
> >>>
> >>>
> >>>
> >>>> hi,
> >>>> input database: swissprot
> >>>>         matrix: pam30
> >>>>         count: 1
> >>>>         gapcosts: 9 1
> >>>>
> >>>> I know that there are a lot of sequences, but that doesn't
> >>>> matter; you can delete all of them except one. The number of
> >>>> sequences is not the problem: the script reads one line and
> >>>> submits it, then the second line, and so on. I have also tried
> >>>> it with only one sequence and I got the same result.
> >>>> The script ran that time for more than 20
> >>>> minutes, and that should be enough time to retrieve
> >>>> the results for ONE sequence, I guess
> >>>>
> >>>> regards
> >>>> Hubert
> >>>>
> >>>>
> >>>>
> >>>> Chris Fields wrote:
> >>>>
> >>>>
> >>>>> You need to add the input conditions as well (you have several
> >>>>> <STDIN> lines which may play a role; I would like to know what
> >>>>> you normally enter for those).
> >>>>>
> >>>>> How long did you let the script run?  I ran a quick check on
> >>>>> your sequences; you have almost 1600, so you have to expect
> >>>>> that you'll run into some problems here!  Most here (including
> >>>>> me) would suggest you try installing a local blast setup for
> >>>>> something like this.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> hi,
> >>>>>> I have submitted the bug -> Bug 2017
> >>>>>> with the script and input file, just start it from command line
> >>>>>>
> >>>>>> thank you very much
> >>>>>> greetings
> >>>>>>
> >>>>>> Hubert
> >>>>>>
> >>>>>> Chris Fields wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Hubert,
> >>>>>>>
> >>>>>>> I have a script that's using blastxml and XML output which
> >>>>>>> seems  to work.
> >>>>>>> I'll try looking at it to get a better idea this weekend.
> >>>>>>>
> >>>>>>> Chris
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> >>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
> >>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
> >>>>>>>> 'Sendu  Bala'
> >>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
> >>>>>>>>
> >>>>>>>> hi,
> >>>>>>>> sorry, but I have updated the remoteblast module and I have
> >>>>>>>> run  several
> >>>>>>>> attempts with the same results as before. It didn't work.
> >>>>>>>> I didn't get any results.
> >>>>>>>>
> >>>>>>>> regards
> >>>>>>>> Hubert
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Chris Fields wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Sendu, Hubert,
> >>>>>>>>>
> >>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the
> >>>>>>>>> problem (break out of that infinite loop).  I applied Sendu's
> >>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
> >>>>>>>>> RemoteBlast.t.  Try updating from CVS to see if it works.
> >>>>>>>>>
> >>>>>>>>> Chris
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon Jun  5 14:56:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:56:18 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44846898.8020001@mrc-dunn.cam.ac.uk>
Message-ID: <006201c688d1$bad2aff0$15327e82@pyrimidine>


> Chris Fields wrote:
> > If you want flexibility or added functionality then you can always
> > contribute a patch, such as adding an option for filehandles,
> > IO::String, pipes/forks, or whatever you wish.
> 
> Well, it wouldn't be a new feature per se, but just changing the way the
> modules work under the hood.

...

> I use IPC::Open3 for blasts and have never run into problems, but it
> pretty much falls into the 'apt to cause deadlock' camp. It may pass
> tests on one machine but fail on others... is there any point in working
> up a patch (would something of questionable reliability ever be
> committed into bioperl)?

The main thing you should avoid is major API changes or issues which break
this module on other OS's.  I'm not sure that StandAloneBlast is 'broken' by
using a tempfile as the location of the BLAST report.  

Any way you go about it, you'll have to capture the BLAST output as a stream
and get it to persist in a SearchIO object somehow.  It can be a pretty
decent memory hit to keep that report hanging around, esp. if it is large.
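
To make the memory trade-off concrete, here is a minimal core-Perl sketch
of keeping a streamed report around in memory so it can be read twice.  It
is only an illustration, not how StandAloneBlast works: @report_lines is a
hypothetical stand-in for the BLAST output stream, and no BioPerl classes
are involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST report arriving as a stream.
my @report_lines = ("Query= seq1\n", "Hit: P12345\n", "Hit: Q67890\n");

# Keep the whole report in one scalar; memory use grows with report size.
my $buffer = join '', @report_lines;

# Open a read-only filehandle on the in-memory copy.
open(my $fh, '<', \$buffer) or die "Cannot open in-memory handle: $!";

my @first_pass = <$fh>;                     # read to end of data
seek($fh, 0, 0) or die "seek failed: $!";   # rewind to the start
my @second_pass = <$fh>;                    # read the same data again
close $fh;

printf "first pass: %d lines, second pass: %d lines\n",
    scalar @first_pass, scalar @second_pass;
```

The in-memory handle can be rewound like a file, but the full report stays
resident for as long as $buffer is alive, which is the cost described above.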

...

> For academic interest, how do I get the 'raw input stream'? Wasn't that
> what my second version did?
> 
>  > my $fh = $blast_report->_fh;
>  > seek($fh, 0, 0);

That should work, yes.  Didn't see that one in your previous response.  I can
get it to work w/o problems with SearchIO directly but I haven't tried it with
StandAloneBlast.  Below is my script.  If you comment out the seek line, the
file pointer stays at the end of the file, so the second round of parsing
won't happen.

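use strict;
use warnings;
use Bio::SearchIO;

my $parser = Bio::SearchIO->new(  -file => shift,
                                  -format => 'blast');

# Raw filehandle underlying the parser (private accessor)
my $fh = $parser->_fh;

# First round: print the raw report line by line
while (<$fh>) {
     print;
}

# Rewind so the report can be parsed from the beginning
seek($fh, 0,0);

# Tied filehandle: each read returns the next result object
$fh = $parser->fh;

print "Second round:\n";
while (<$fh>) {
    while (my $hit = $_->next_hit) {
        print $hit->accession,"\n";
    }
}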


Chris


From hubert.prielinger at gmx.at  Mon Jun  5 15:12:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 13:12:37 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <006101c688ce$7185c330$15327e82@pyrimidine>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
Message-ID: <44848225.8080003@gmx.at>

hi chris,
sorry, I have tried it with the latest CVS version:

# $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $

but it still doesn't work.

Hubert

Chris Fields wrote:
> Hubert, 
>
> Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
> option to save XML was committed relatively recently (last month or so).
>
> Chris


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 15:14:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 20:14:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006201c688d1$bad2aff0$15327e82@pyrimidine>
References: <006201c688d1$bad2aff0$15327e82@pyrimidine>
Message-ID: <44848280.1080703@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>>> If you want flexibility or added functionality then you can 
>>> always contribute a patch, such as adding an option for 
>>> filehandles, IO::String, pipes/forks, or whatever you wish.
>> 
>> Well, it wouldn't be a new feature per se, but just changing the 
>> way the modules work under the hood.
> 
> ...
> 
>> I use IPC::Open3 for blasts and have never run into problems, but 
>> it pretty much falls into the 'apt to cause deadlock' camp. It may
>> pass tests on one machine but fail on others... is there any point
>> in working up a patch (would something of questionable reliability
>> ever be committed into bioperl)?
> 
> The main thing you should avoid is major API changes or issues which
> break this module on other OS's.  I'm not sure that StandAloneBlast
> is 'broken' by using a tempfile as the location of the BLAST report.
> 
> 
> 
> Any way you go about it, you'll have to capture the BLAST output as a
> stream and get it to persist in a SearchIO object somehow.  It can
> be a pretty decent memory hit to keep that report hanging around,
> esp. if it is large.

Well at the moment StandAloneBlast runs the blast program and stores its
output to a temp file, then gives the temp file name as an arg to
SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
of the blast program directly to my parser. The question is, why wasn't
this done in StandAloneBlast? I would get the blast program output
handle and pass it directly to SearchIO with the -fh option of new().
The only difference here is it's faster and more efficient with the
direct pipe, but you can't subsequently seek the SearchIO's internal
filehandle (as we are discussing in this thread). There are no (additional)
issues with memory.

If it isn't done using IPC::Open3 (or similar) because the original
author already knew it wouldn't be reliable enough, or for some other
reason(s), fine. Does anyone know the reasons?
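
A minimal sketch of the direct-pipe idea, using only core Perl.  The child
command is a hypothetical stand-in for the blast executable (a second Perl
process that prints two lines), and the parser is replaced by a plain
read; nothing here is taken from StandAloneBlast itself.  STDERR is merged
into STDOUT by passing undef as the third argument, since reading two
separate pipes one after the other is the usual way to run into deadlock.

```perl
use strict;
use warnings;
use IPC::Open3;

# Hypothetical stand-in for the blast executable.
my @cmd = ($^X, '-e', 'print "hit1\n"; print "hit2\n";');

# undef for the error handle sends the child's STDERR to the same pipe
# as its STDOUT, so there is only one pipe to drain.
my ($child_in, $child_out);
my $pid = open3($child_in, $child_out, undef, @cmd);

close $child_in;              # the child needs no input
my @lines = <$child_out>;     # read the output straight from the pipe

# A pipe has no file position, so rewinding fails.
my $can_seek = seek($child_out, 0, 0) ? 1 : 0;

waitpid($pid, 0);             # reap the child
my $exit_code = $? >> 8;

print "read ", scalar(@lines), " lines; seekable: ",
    ($can_seek ? "yes" : "no"), "; exit code: $exit_code\n";
```

The handle in $child_out is the kind of thing that would be passed to a
parser with an -fh style option; the failed seek shows why a second pass
over the same report is not possible with this approach.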

From cjfields at uiuc.edu  Mon Jun  5 15:43:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 14:43:50 -0500
Subject: [Bioperl-l] StandAloneBlast
In-Reply-To: <44848280.1080703@mrc-dunn.cam.ac.uk>
Message-ID: <006301c688d8$5e4ce910$15327e82@pyrimidine>

> Well at the moment StandAloneBlast runs the blast program and stores its
> output to a temp file, then gives the temp file name as an arg to
> SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
> of the blast program directly to my parser. The question is, why wasn't
> this done in StandAloneBlast? 

Probably for the reasons you outlined before:

'I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... '

Why would we take a chance on using something that works on one OS/machine
and fails to work on another?  

> I would get the blast program output handle and pass it directly to 
> SearchIO with the -fh option of new().
> The only difference here is it's faster and more efficient with the
> direct pipe, but you can't subsequently seek the SearchIO's internal
> filehandle (as we are discussing in this thread). There are no (additional)
> issues with memory.

Like I said before, you can make changes and submit a patch.  The code here
is over five years old, and many many things have changed since then, so you
might find something works now which wasn't available or didn't work then.
It hasn't really been a priority (it certainly hasn't been mine).  Most
people don't care b/c it just works and a vast majority don't worry/care
about the internals.  

The issue at hand is whether any code changes will work on all OS's, not
just yours.  BioPerl is used the world over on just about every OS, so ANY
code changes need to take that into consideration.  I can guarantee that if
you made changes that break or reduce performance on 50% of the OS's, it'll
likely get rolled back.  You need the best cross-platform compatibility
possible.

We've now veered WAY off topic here.  If we intend on continuing this, we
need to switch the thread topic.

Chris

> If it isn't done using IPC::Open3 (or similar) because the original
> author already knew it wouldn't be reliable enough, or for some other
> reason(s), fine. Does anyone know the reasons?


From cjfields at uiuc.edu  Mon Jun  5 16:30:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 15:30:01 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
Message-ID: <006401c688de$d38035b0$15327e82@pyrimidine>

I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
finished yet (BioSQL and Bioperl-guts aren't done yet) and there are probably
some mangld worsd in there so have mercy on me!  It's been a busy month.

http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006

Fling your mud and abuses by responding to this thread per usual

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Jun  5 23:42:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 22:42:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44848225.8080003@gmx.at>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
	<44848225.8080003@gmx.at>
Message-ID: <D7A85F26-1ADD-446E-A5F3-8C3420746364@uiuc.edu>

Hubert,

I had no trouble getting this to work; the script scans through each
sequence and saves the XML output to a file on both Windows and Mac OS
X, both using bioperl-live.  The older RemoteBlast would only save  
text; otherwise it saved an empty file.  Using your script I get  
several XML BLAST output files (1.xml, 2.xml, etc) based on a  
counter, each about 1 MB.  All were parseable by SearchIO.

I did notice that if certain parameters weren't entered in correctly  
then you will get no data (such as setting the database to 'swiss'  
instead of 'swissprot').  A warning pops up stating that no data was  
returned when this occurs (it doesn't tell you what was wrong, just  
that no data came back from NCBI).  If you see this then that is  
likely the problem.  Besides that, I don't know what else it can be.

Chris

On Jun 5, 2006, at 2:12 PM, Hubert Prielinger wrote:

> hi chris,
> sorry, I have tried it with the latest CVS version:
>
> # $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $
>
> but it still doesn't work.
>
> Hubert
>
> Chris Fields wrote:
>> Hubert,
>>
>> Make sure you have the latest Bio::Tools::Run::RemoteBlast from  
>> CVS.  The
>> option to save XML was committed relatively recently (last month  
>> or so).
>>
>> Chris
>>

...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From heikki at sanbi.ac.za  Tue Jun  6 03:40:06 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 09:40:06 +0200
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <200606060940.07285.heikki@sanbi.ac.za>

Chris,

I am mystified. I'll try to get the massive 'return undef' change done first
and then have another look.

	-Heikki

On Friday 02 June 2006 21:13, Chris Fields wrote:
> Heikki,
>
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
>
> Not sure what's going on there but using read_aln and write_aln seem to
> work normally.  It may have something to do with Bio::SimpleAlign but I'm
> not absolutely sure.
>
> Any ideas what may be going on here?
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za  Tue Jun  6 04:04:00 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 10:04:00 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606020952.08034.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
Message-ID: <200606061004.01193.heikki@sanbi.ac.za>


OK. I've gone through all cases where return and undef are on the same line.
I've made changes in 185 files.

My aims have been the following:

1. Remove undef from return undef when not necessary.
	This will make it easier to spot cases where undef matters in the future
	Most of the changes fall into this category. The context is clearly scalar.

2. Returning undef when the user expects an empty list is bad

./Bio/Tools/Est2Genome.pm fixed
./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
                               not fixed
./Bio/Matrix/PSM/SiteMatrix.pm  fixed
./Bio/Matrix/PSM/Psm.pm  fixed
./Bio/DB/Taxonomy/entrez.pm fixed

3. If docs say method returns nothing, explicit undef is not the right thing 
to return

4. do not return an explicit undef if the method is supposed to return false 
on failure
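
For anyone unsure why point 2 matters, a small self-contained illustration follows. The two subs are made up for the example and do not come from any BioPerl module.

```perl
use strict;
use warnings;

sub hits_explicit { return undef }   # always yields the single value undef
sub hits_bare     { return }         # undef in scalar context, () in list context

my @explicit = hits_explicit();      # one-element list: (undef)
my @bare     = hits_bare();          # empty list

print scalar(@explicit), "\n";       # prints 1
print scalar(@bare), "\n";           # prints 0

# The trap: a one-element list is true even though its only element is undef.
print "explicit looks like a hit\n" if @explicit;
print "bare is correctly empty\n"   unless @bare;

my $scalar = hits_bare();            # still undef when called in scalar context
print defined $scalar ? "defined\n" : "undef\n";   # prints undef
```

So a caller writing "if (my @hits = $obj->method) { ... }" takes the success branch after an explicit "return undef", while a bare return behaves as expected in both contexts.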


Before I do the commit, I'd like a number of people to run 'make test' on
bioperl-live now and report back if they see changes after the commit. There
are quite a few tests that fail currently.

I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.

	-Heikki


On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> I've started going through the files that have 'return undef' lines.
> I'll report back later.
>
> Initial impression is that there are a few cases where the context
> indicates list to be returned but failure returns an explicit undef. I'll
> fix those.
>
> Most of the cases are much more ambiguous. Even when documentation says the
> failure returns undef, it is clearly meant to mean false. In most cases
> documentation does not comment on return value at all. Luckily the context
> is almost always scalar and therefore it does not matter too much.
>
> I seem to be changing 'return undef' to plain 'return' a bit overzealously,
> so do not take it personally.
>
> 	-Heikki
>
> On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > ....
> >
> > > > Again, didn't do that.
> > >
> > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > certainly not directed at your recent changes to Bio::Restriction::IO.
> > > In fact, I put in the above * comment to exclude your changes from my
> > > discussion; you changed the docs because the code never did what they
> > > said they did (the docs were bad). That's fine (good!). My comments
> > > were a general point, slightly directed at the idea of changing all the
> > > return undef;s - changing the code so that it no longer matches the
> > > docs of a previously working method. That's what I think is bad. Though
> > > in this particular case it shouldn't make any difference at all.
> >
> > Agreed.  In any case, if tests have been properly set up then they should
> > catch problems.  This is, of course, if they are properly set up.
> >
> > Chris

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 05:17:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 10:17:48 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <4485483C.4080505@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Heikki,
> 
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> 
> Not sure what's going on there but using read_aln and write_aln seem to work
> normally.  It may have something to do with Bio::SimpleAlign but I'm not
> absolutely sure.
> 
> Any ideas what may be going on here?

Yes, see my replies on the bug page. But so more people see the 
question, I'll ask here: can anyone offer examples of metafasta files, 
especially multiple alignments?

From cjfields at uiuc.edu  Tue Jun  6 10:30:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:30:17 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <4485483C.4080505@mrc-dunn.cam.ac.uk>
Message-ID: <000901c68975$bb9968d0$15327e82@pyrimidine>

Sendu,

This is Heikki's original submission for the specs for meta format:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta

So it's really a specialized FASTA format used to store meta information
about sequences.  Seems mainly useful for amino acid sequences, but is
extended to include properties of nucleotides like DNA content, RNA sec.
structure, and so on.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Tuesday, June 06, 2006 4:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests
> 
> Chris Fields wrote:
> > Heikki,
> >
> > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped
> > up when running AlignIO.t (I was fixing bug 2000):
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> >
> > Not sure what's going on there but using read_aln and write_aln seem to
> > work normally.  It may have something to do with Bio::SimpleAlign but I'm
> > not absolutely sure.
> >
> > Any ideas what may be going on here?
> 
> Yes, see my replies on the bug page. But so more people see the
> question, I'll ask here: can anyone offer examples of metafasta files,
> especially multiple alignments?


From cjfields at uiuc.edu  Tue Jun  6 10:36:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:36:16 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <200606060940.07285.heikki@sanbi.ac.za>
Message-ID: <000a01c68976$9479e300$15327e82@pyrimidine>

Heikki,

I agree it's all a bit weird.  It's not too concerning for now since it
works, but it might take some tinkering with SimpleAlign to get it to
behave.

This alignment format has some of the same characteristics as Stockholm
alignment format but looks easier to work with.  I work with RNA,
specifically one with a conserved secondary structure so this format appeals
to me quite a bit.  If I get time (probably not for a while) I may tinker
with Bio::AlignIO::stockholm to get a write_aln() method up-and-running and
see if I can convert back-and-forth between the two.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Tuesday, June 06, 2006 2:40 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests
> 
> Chris,
> 
> I am mystified. I'll try to get the massive 'return undef' change done
> first and then have another look.
> 
> 	-Heikki
> 
> On Friday 02 June 2006 21:13, Chris Fields wrote:
> > Heikki,
> >
> > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped
> > up when running AlignIO.t (I was fixing bug 2000):
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> >
> > Not sure what's going on there but using read_aln and write_aln seem to
> > work normally.  It may have something to do with Bio::SimpleAlign but
> > I'm not absolutely sure.
> >
> > Any ideas what may be going on here?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 11:40:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 16:40:05 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000901c68975$bb9968d0$15327e82@pyrimidine>
References: <000901c68975$bb9968d0$15327e82@pyrimidine>
Message-ID: <4485A1D5.5090805@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu,
> 
> This is Heikki's original submission for the specs for meta format:
> 
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta
> 
> So it's really a specialized FASTA format used to store meta information
> about sequences.  Seems mainly useful for amino acid sequences, but is
> extended to include properties of nucleotides like DNA content, RNA sec.
> structure, and so on.  

Thanks. It's not really clear to me if the meta data needs to be 
considered in the context of an alignment. That is, if you have two meta 
sequences with the same primary sequence, will all their meta data 
necessarily be the same? Or could they be different?

If the same, then the test data and test need to be fixed so my patched 
version of Bio::AlignIO::metafasta passes the tests.

If different, how should the meta data be handled? Like the test implies 
with its expected value for the consensus (just treat the primary 
sequence and all meta data as one long string)?
Is it really the intent to include characters from the meta data names 
when considering what symbols we've seen with symbol_chars() method?
Do we include the meta data name symbols when numbering?

Thoughts anyone?

From cjfields at uiuc.edu  Tue Jun  6 17:07:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 16:07:39 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <006401c688de$d38035b0$15327e82@pyrimidine>
Message-ID: <000601c689ad$3e6aec20$15327e82@pyrimidine>

I hate talking to myself...

I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
(appropriately enough, on 6-6-06).  I am trying out a new script which helps
with all the developer list noise; hope everybody likes it.

Cheers,

Chris   

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, June 05, 2006 3:30 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] ListSummaries for May 10-31.
> 
> I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
> finished yet (BioSQL and Bioperl-guts isn't done yet) and there are
> probably
> some mangld worsd in there so have mercy on me!  It's been a busy month.
> 
> http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006
> 
> Fling your mud and abuses by responding to this thread per usual
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Jun  6 20:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 19:41:08 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <44861D47.7090205@infotech.monash.edu.au>
Message-ID: <000601c689cb$11f568a0$15327e82@pyrimidine>

I could do something like that.  Right now I have a script that just grabs
the text from the web page:

http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html

and uses regexes and hashes to sort everything and make some sense of the
noise.  The resolution for a bug isn't on that page but in the linked
message so I would need to grab the link from HTML, go to that page, then
get the resolution if there is one, so at the moment I just check each one
(thanks for the bug hunt Jason!).  I usually have to do a little touching up
afterwards, such as fix links and such, but the script really saves on time.
As you can tell, it's been a busy month!

I'm (very slowly) updating the script to go through the mail list threads
recursively but haven't really gotten anywhere with that yet.  Benchwork has
intervened yet again!
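
The "regexes and hashes" step is roughly the following. This is only a sketch, not the actual summary script. It assumes the subject lines have already been pulled out of the archive page, and the sub name count_threads is made up for the example.

```perl
use strict;
use warnings;

# Group subject lines by thread: drop the list tag and any "Re:" prefixes,
# then count how many messages fall under each remaining title.
sub count_threads {
    my @subjects = @_;
    my %count;
    for my $subject (@subjects) {
        (my $thread = $subject) =~ s/^\s*(?:Re:\s*)*\[[^\]]+\]\s*//i;
        $thread =~ s/^(?:Re:\s*)+//i;   # handles "[list] Re: title" as well
        $thread =~ s/\s+$//;
        $count{$thread}++;
    }
    return %count;
}

my %threads = count_threads(
    '[Bioperl-l] StandAloneBlast',
    'Re: [Bioperl-l] Bio::AlignIO::metafasta tests',
    '[Bioperl-l] Bio::AlignIO::metafasta tests',
    '[Bioperl-l] ListSummaries for May 10-31.',
);

# Busiest threads first, ties broken alphabetically.
for my $t (sort { $threads{$b} <=> $threads{$a} || $a cmp $b } keys %threads) {
    printf "%2d  %s\n", $threads{$t}, $t;
}
```

Sorting by bug status, as Torsten suggested, would need a second hash keyed on the resolution, which is the part that requires following each link.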

Chris

> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Tuesday, June 06, 2006 7:27 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
> 
> > I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> > (appropriately enough, on 6-6-06).  I am trying out a new script which
> > helps with all the developer list noise; hope everybody likes it.
> 
> I like the CVS summaries.
> 
> For the bug summaries, would it make sense to categorise/sort by
> category/status eg. RESOLVED, WORKSFORME etc?
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Tue Jun  6 20:26:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 07 Jun 2006 10:26:47 +1000
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689ad$3e6aec20$15327e82@pyrimidine>
References: <000601c689ad$3e6aec20$15327e82@pyrimidine>
Message-ID: <44861D47.7090205@infotech.monash.edu.au>

> I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> (appropriately enough, on 6-6-06).  I am trying out a new script which helps
> with all the developer list noise; hope everybody likes it.

I like the CVS summaries.

For the bug summaries, would it make sense to categorise/sort by 
category/status eg. RESOLVED, WORKSFORME etc?

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jason at bioperl.org  Wed Jun  7 00:04:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Jun 2006 00:04:02 -0400
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689cb$11f568a0$15327e82@pyrimidine>
References: <000601c689cb$11f568a0$15327e82@pyrimidine>
Message-ID: <8D9B514C-ADB4-409F-A55F-DC0C3DA9354A@bioperl.org>

It is possible some of this can be extracted from the bugzilla as a  
query (all the changes from X to Y) and generate RSS or text that can  
be processed.

-jason
On Jun 6, 2006, at 8:41 PM, Chris Fields wrote:

> I could do something like that.  Right now I have a script that just grabs
> the text from the web page:
>
> http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html
>
> and uses regexes and hashes to sort everything and make some sense of the
> noise.  The resolution for a bug isn't on that page but in the linked
> message so I would need to grab the link from HTML, go to that page, then
> get the resolution if there is one, so at the moment I just check each one
> (thanks for the bug hunt Jason!).  I usually have to do a little touching up
> afterwards, such as fix links and such, but the script really saves on time.
> As you can tell, it's been a busy month!
>
> I'm (very slowly) updating the script to go through the mail list threads
> recursively but haven't really gotten anywhere with that yet.  Benchwork has
> intervened yet again!
>
> Chris
>
>> -----Original Message-----
>> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
>> Sent: Tuesday, June 06, 2006 7:27 PM
>> To: Chris Fields
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
>>
>>> I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
>>> (appropriately enough, on 6-6-06).  I am trying out a new script which
>>> helps with all the developer list noise; hope everybody likes it.
>>
>> I like the CVS summaries.
>>
>> For the bug summaries, would it make sense to categorise/sort by
>> category/status eg. RESOLVED, WORKSFORME etc?
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From heikki at sanbi.ac.za  Wed Jun  7 05:57:47 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Jun 2006 11:57:47 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606061004.01193.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
	<200606061004.01193.heikki@sanbi.ac.za>
Message-ID: <200606071157.47736.heikki@sanbi.ac.za>

Committed.

Please report any surprising changes in functionality to the list.

	-Heikki

On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote:
> OK. I've gone through all cases where return and undef are on the same
> line. I've made changes in 185 files.
>
> My aims have been the following:
>
> 1. Remove undef from return undef when not necessary.
> 	This will make it easier to spot cases where undef matters in the future
> 	Most of the changes fall into this category. The context is clearly
> scalar.
>
> 2. Returning undef when the user expects an empty list is bad
>
> ./Bio/Tools/Est2Genome.pm fixed
> ./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
>                                not fixed
> ./Bio/Matrix/PSM/SiteMatrix.pm  fixed
> ./Bio/Matrix/PSM/Psm.pm  fixed
> ./Bio/DB/Taxonomy/entrez.pm fixed
>
> 3. If docs say method returns nothing, explicit undef is not the right
> thing to return
>
> 4. do not return an explicit undef if the method is supposed to return
> false on failure
>
>
> Before I do the commit, I'd like a number of people to run 'make test' on
> bioperl-live now and report back if they see changes after the commit.
> There are quite a few tests that fail currently.
>
> I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.
>
> 	-Heikki
>
> On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> > I've started going through the files that have 'return undef' lines.
> > I'll report back later.
> >
> > Initial impression is that there are a few cases where the context
> > indicates list to be returned but failure returns an explicit undef. I'll
> > fix those.
> >
> > Most of the cases are much more ambiguous. Even when documentation says
> > the failure returns undef, it is clearly meant to mean false. In most
> > cases documentation does not comment on return value at all. Luckily the
> > context is almost always scalar and therefore it does not matter too
> > much.
> >
> > I seem to be changing 'return undef' to plain 'return' a bit
> > overzealously, so do not take it personally.
> >
> > 	-Heikki
> >
> > On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > > ....
> > >
> > > > > Again, didn't do that.
> > > >
> > > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > > certainly not directed at your recent changes to
> > > > Bio::Restriction::IO. In fact, I put in the above * comment to
> > > > exclude your changes from my discussion; you changed the docs because
> > > > the code never did what they said they did (the docs were bad).
> > > > That's fine (good!). My comments were a general point, slightly
> > > > directed at the idea of changing all the return undef;s - changing
> > > > the code so that it no longer matches the docs of a previously
> > > > working method. That's what I think is bad. Though in this particular
> > > > case it shouldn't make any difference at all.
> > >
> > > Agreed.  In any case, if tests have been properly set up then they
> > > should catch problems.  This is, of course, if they are properly set
> > > up.
> > >
> > > Chris

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From Michael.Muratet at operon.com  Tue Jun  6 14:34:38 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Tue, 6 Jun 2006 13:34:38 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>

Greetings

I am trying to install bioperl-db in preparation for installing a biosql
database. I'm running on a Dell PowerEdge with quad dual-core Xeons, RedHat
Enterprise v4, perl 5.8.5 and bioperl 1.5.1. I have installed mysql v5.0.21
from source with --with-innodb set for the configuration. I installed
bioperl-db from cvs. I have the latest DBI and DBD::mysql, installed a few
weeks ago from CPAN. The installation has been working well with perl
otherwise; for example, the Ensembl core API works OK. SHOW ENGINES indicates
that innodb is enabled.

I have attached a snippet from the top of the output below. I searched the web
and the bioperl-db list and haven't found anything that appears to be
relevant. I've done several of these installs and they've pretty much
completed without a single glitch. Does anyone have any ideas how to isolate
the problem?

Thanks

Mike

[mmuratet at HSV-PROBE bioperl-db]$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01dbadaptor.....ok 14/19
------------- EXCEPTION  -------------
MSG: failed to open connection: Transactions not supported by database
STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm:518
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK toplevel t/01dbadaptor.t:62


From hlapp at gmx.net  Wed Jun  7 08:52:22 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 08:52:22 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
Message-ID: <4F23D2EA-2218-4023-A3F6-3284912952BE@gmx.net>

Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI  
parameter hash. The test you're stumbling over is actually there to  
test that the database does support transactions, but apparently in
5.x versions MySQL no longer silently ignores the AutoCommit  
parameter if it doesn't support transactions (effectively preempting  
the test ...).

Now you say that innodb shows as enabled - i.e., you can confirm that
you changed the MySQL configuration parameter that designates the
directory for innodb to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one row and the last query should
return zero rows if transactions are supported, and there shouldn't
be any error.

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD:mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ 
> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ 
> BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ 
> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ 
> DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From nlhepler at umd.edu  Wed Jun  7 09:46:32 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 09:46:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
Message-ID: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>

Hello,

I am having some difficulty here.  I have a list of accessions, which  
are the parameters for a get_Stream_by_acc() function on a  
Bio::DB::GenBank object.  None of the returned GenBank information  
for any of my accessions seems to contain variation data, no matter  
how I try to coax it out with unflattener and typemapper.  This data  
is, however, available via the web interface of NCBI Nucleotide, as  
an optional feature (SNP).  I was wondering if there was some option  
I'm missing in the initialization of the Bio::DB::GenBank object (no  
options currently) that will coax the database into giving me this  
data?  Or something else that I'm missing altogether.  The organism  
of interest is human, taxon:9606.

Nicolaus Lance Hepler
nlhepler at mail dot umd dot edu

From cjfields at uiuc.edu  Wed Jun  7 09:56:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 08:56:16 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606071157.47736.heikki@sanbi.ac.za>
Message-ID: <000601c68a3a$265552a0$15327e82@pyrimidine>

Yikes!  I'll download a tarball from anon CVS, run a comparison (vs my
pre-updated bioperl-live) on WinXP and Mac OS X 10.4 (Intel), and report back
success/fail; it may take a bit.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> 	-Heikki
> 
> On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote:
> > OK. I've gone through all cases where return and undef are on the same
> > lines. I've done changes in 185 files.
> >
> > My aims have been the following:
> >
> > 1. Remove undef from 'return undef' when not necessary.
> > 	This will make it easier to spot cases where undef matters in the
> > 	future. Most of the changes fall into this category. The context is
> > 	clearly scalar.
> >
> > 2. Returning undef when the user expects an empty list is bad
> >
> > ./Bio/Tools/Est2Genome.pm fixed
> > ./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
> >                                not fixed
> > ./Bio/Matrix/PSM/SiteMatrix.pm  fixed
> > ./Bio/Matrix/PSM/Psm  fixed
> > ./Bio/DB/Taxonomy::entrez.pm fixed
> >
> > 3. If docs say a method returns nothing, explicit undef is not the right
> > thing to return
> >
> > 4. Do not return an explicit undef if the method is supposed to return
> > false on failure
> >
> >
> > Before I do the commit, I'd like a number of people to run 'make test'
> > on bioperl-live, and then report back after the commit if they see
> > changes. There are quite a few tests that fail currently.
> >
> > I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.
> >
> > 	-Heikki
> >
> > On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> > > I've started going through the files that have 'return undef' lines.
> > > I'll report back later.
> > >
> > > Initial impression is that there are a few cases where the context
> > > indicates list to be returned but failure returns an explicit undef.
> > > I'll fix those.
> > >
> > > Most of the cases are much more ambiguous. Even when documentation
> > > says the failure returns undef, it is clearly meant to mean false. In
> > > most cases documentation does not comment on return value at all.
> > > Luckily the context is almost always scalar and therefore it does not
> > > matter too much.
> > >
> > > I seem to be changing 'return undef' to plain 'return' a bit
> > > overzealously, so do not take it personally.
> > >
> > > 	-Heikki
> > >
> > > On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > > > ....
> > > >
> > > > > > Again, didn't do that.
> > > > >
> > > > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > > > certainly not directed at your recent changes to
> > > > > Bio::Restriction::IO. In fact, I put in the above * comment to
> > > > > exclude your changes from my discussion; you changed the docs
> > > > > because the code never did what they said they did (the docs were
> > > > > bad). That's fine (good!). My comments were a general point,
> > > > > slightly directed at the idea of changing all the return undef;s -
> > > > > changing the code so that it no longer matches the docs of a
> > > > > previously working method. That's what I think is bad. Though in
> > > > > this particular case it shouldn't make any difference at all.
> > > >
> > > > Agreed.  In any case, if tests have been properly set up then they
> > > > should catch problems.  This is, of course, if they are properly set
> > > > up.
> > > >
> > > > Chris
> > > >
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
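
For anyone following the thread, here is a minimal sketch of the pitfall
behind the 'return undef' cleanup quoted above. The two subs are
hypothetical stand-ins, not BioPerl methods; the point is only how each
form behaves in list versus scalar context.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a lookup that fails; not BioPerl methods.
sub find_explicit_undef { return undef }   # list context: the one-element list (undef)
sub find_bare_return    { return }         # list context: empty list; scalar context: undef

# List context: the explicit undef produces one element, so the array
# counts as "true" even though nothing was found.
my @explicit = find_explicit_undef();
my @bare     = find_bare_return();
print "explicit undef: ", scalar(@explicit), " element(s)\n";
print "bare return:    ", scalar(@bare), " element(s)\n";

# Scalar context: both forms give undef, which is why most of the
# changed call sites should behave the same as before.
my $s1 = find_explicit_undef();
my $s2 = find_bare_return();
print "scalar context: both undef\n" if !defined $s1 && !defined $s2;
```

A caller that writes "if (my @hits = $obj->method) { ... }" would take the
branch with the explicit undef and skip it with the bare return.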


From osborne1 at optonline.net  Wed Jun  7 11:42:32 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 07 Jun 2006 11:42:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>
Message-ID: <C0AC6C28.8C12%osborne1@optonline.net>

Nicolaus,

The short answer is no, there's no option that will omit or add a particular
feature or annotation to the Sequence object returned by Bio::DB::GenBank.
Can you give some example accessions?

Brian O.


On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:

> Hello,
> 
> I am having some difficulty here.  I have a list of accessions, which
> are the parameters for a get_Stream_by_acc() function on a
> Bio::DB::GenBank object.  None of the returned GenBank information
> for any of my accessions seems to contain variation data, no matter
> how I try to coax it out with unflattener and typemapper.  This data
> is, however, available via the web interface of NCBI Nucleotide, as
> an optional feature (SNP).  I was wondering if there was some option
> I'm missing in the initialization of the Bio::DB::GenBank object (no
> options currently) that will coax the database into giving me this
> data?  Or something else that I'm missing altogether.  The organism
> of interest is human, taxon:9606.
> 
> Nicolaus Lance Hepler
> nlhepler at mail dot umd dot edu


From nlhepler at umd.edu  Wed Jun  7 12:26:06 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 12:26:06 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <C0AC6C28.8C12%osborne1@optonline.net>
References: <C0AC6C28.8C12%osborne1@optonline.net>
Message-ID: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>

Brian,

A sample accession is BC000007.  I figured a way around it though.   
Rather than automate the whole process, I just downloaded from Batch  
Entrez a flat .gb file of all my accessions.  It's not flexible, and  
will be inconvenient when we expand the dataset, but it will provide  
me with data to work with for now.

Nicolaus

> Nicolaus,
>
> The short answer is no, there's no option that will omit or add a  
> particular
> feature or annotation to the Sequence object returned by  
> Bio::DB::GenBank.
> Can you give some example accessions?
>
> Brian O.
>
>
> On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
>
>> Hello,
>>
>> I am having some difficulty here.  I have a list of accessions, which
>> are the parameters for a get_Stream_by_acc() function on a
>> Bio::DB::GenBank object.  None of the returned GenBank information
>> for any of my accessions seems to contain variation data, no matter
>> how I try to coax it out with unflattener and typemapper.  This data
>> is, however, available via the web interface of NCBI Nucleotide, as
>> an optional feature (SNP).  I was wondering if there was some option
>> I'm missing in the initialization of the Bio::DB::GenBank object (no
>> options currently) that will coax the database into giving me this
>> data?  Or something else that I'm missing altogether.  The organism
>> of interest is human, taxon:9606.
>>
>> Nicolaus Lance Hepler
>> nlhepler at mail dot umd dot edu
>
>


From lstein at cshl.edu  Wed Jun  7 12:50:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Jun 2006 12:50:24 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <4483F338.7090909@mrc-dunn.cam.ac.uk>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
Message-ID: <200606071250.25026.lstein@cshl.edu>

I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
because the CGI upload filehandle is not seekable (for good reasons that I 
won't inflict on you)! You'll have to write to a temporary file, or else read 
the whole sequence into memory. Sorry about this.
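
A minimal sketch of the "read the whole sequence into memory" option,
using only core Perl. The helper name buffer_handle is hypothetical, and a
pipe stands in for the CGI upload handle here simply because a pipe is not
seekable either; this is an illustration under those assumptions, not what
CGI.pm does internally.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a (possibly non-seekable) handle into memory
# and return a seekable in-memory handle over the same data.
sub buffer_handle {
    my ($in) = @_;
    my $data = do { local $/; <$in> };    # read everything in one go
    $data = '' unless defined $data;
    open(my $mem, '<', \$data) or die "cannot open in-memory handle: $!";
    return $mem;
}

# Stand-in for the upload handle (assumption): a pipe cannot be rewound.
pipe(my $reader, my $writer) or die "pipe failed: $!";
print {$writer} ">seq1\nACGT\n";
close $writer;

my $fh = buffer_handle($reader);

my $first = <$fh>;                          # a format guesser consumes lines like this
seek($fh, 0, 0) or die "seek failed: $!";   # rewinding works on the in-memory handle
my @lines = <$fh>;
print "first line after rewind: $lines[0]";
```

The buffered handle could then be passed on wherever the original upload
handle was used. For large uploads a temporary file is the safer choice,
since everything is held in memory here.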

Lincoln

On Monday 05 June 2006 05:02, Sendu Bala wrote:
> Wijaya Edward wrote:
> > Dear Lincoln and experts
> >
> > Currently I have a CGI application that does this:
> >
> > 1. read an uploaded file
> > 2. check the content of the file whether fasta or not
> > 3. print out the content of the file.
> >
> >
> > Now the problem I'm facing is in step three: the content read from
> > the filehandle is altered, namely the very first line does not get
> > printed.
>
> The problem is almost certainly that the guessing is done by reading the
> first line of the filehandle, so that your subsequent while loop on that
> same filehandle starts at the second line.
> Just seek the filehandle back to the start before trying to print the
> contents out.
>
> ..
> my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
> my $format_upload  = $guesser_upload->guess;
> seek($fh_upload, 0, 0);
> ..
> while (<$fh_upload>) {
>      ...
> }
>
> An alternative might be to pass GuessSeqFormat the filename in which
> case it would make its own filehandle and close it, leaving your own
> filehandle untouched.

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From paul.boutros at utoronto.ca  Wed Jun  7 13:03:01 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 13:03:01 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
Message-ID: <1149699781.448706c5e803d@webmail.utoronto.ca>

Hi,

Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7, and I had
a few failures:

Failed Test         Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                    89    2  79 88
t/Biblio.t                        24    1  2
t/LocusLink.t                     23    1  23
t/PhysicalMap.t                   14    2  11-12
t/RepeatMasker.t                   6    3  1-2 6
t/StandAloneBlast.t               18    4  19-22
t/TaxonTree.t                     17   30  11 18-42
t/alignUtilities.t                 9    1  9
t/psm.t              255 65280    48   35  29 32-48
t/tutorial.t                      21   15  7-21

Not sure if any of these are related to the "return undef" changes, or are
already known.  I also had some warnings running BioGraphics.t:

t/BioGraphics................Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureFile.pm line 547, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, <GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 61.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, <GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 62.
t/BioGraphics................ok

I also ran the tests manually and have attached the output below. It doesn't
always agree with the results of make test, and in a few cases (e.g.
tutorial.t or StandAloneBlast.t) there were no errors running the tests
manually.
Paul

Annotation.t
============
not ok 8
# Test 8 got: '' (t/Annotation.t at line 59)
#   Expected: '0'

not ok 71
# Test 71 got: 'dumpster|test case|Ann:00001' (t/Annotation.t at line 187)
#    Expected: 'dumpster|test case|'

not ok 79
# Failed test 79 in t/Annotation.t at line 217

ok 85
Use of uninitialized value in concatenation (.) or string at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm line 236.

------------- EXCEPTION  -------------
MSG: Bio::AnnotationI implementation Bio::Annotation:: failed to load:
------------- EXCEPTION  -------------
MSG: Failed to load module Bio::Annotation::. Can't locate Bio/Annotation/.pm in @INC (@INC contains: t /db2blast/Paul/perl5.8.7/lib/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl .) at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm line 396.

STACK Bio::Root::Root::_load_module /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm:398
STACK (eval) /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:149
STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:148
STACK toplevel t/Annotation.t:237
--------------------------------------

STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:152
STACK toplevel t/Annotation.t:237
--------------------------------------


PhysicalMap.t
=============
not ok 11
# Test 11 got: <UNDEF> (t/PhysicalMap.t at line 55)
#    Expected: '0' (code holds and returns a string, definition requires a boolean)
not ok 12
# Test 12 got: '3' (t/PhysicalMap.t at line 56)
#    Expected: '1' (code holds and returns a string, definition requires a boolean)

TaxonTree.t
===========
ok 10
Use of uninitialized value in string eq at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 559.
not ok 11
# Test 11 got: <UNDEF> (t/TaxonTree.t at line 35)
#    Expected: 'species'
ok 12 # foo is not a rank, class variable @RANK not initialised
ok 13
ok 14
ok 15
ok 16
ok 17
ok 18
Can't use string ("this could be anything") as a HASH ref while "strict refs" in use at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 452.

alignUtilities.t
================
ok 6

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
ok 7
ok 8
not ok 9
# Test 9 got: '1' (t/alignUtilities.t at line 53)
#   Expected: '3'

RepeatMasker.t
==============
t/RepeatMasker...............FAILED tests 1-2, 6
        Failed 3/6 tests, 50.00% okay

StandAloneBlast.t
=================
t/StandAloneBlast............FAILED tests 19-22
        Failed 4/18 tests, 77.78% okay

psm.t
=====
t/Pseudowise.................ok
t/psm........................NOK 29Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
        Failed 18/48 tests, 62.50% okay
t/QRNA.......................ok

tutorial.t
==========
t/tutorial...................ok 5/21
The following numeric arguments can be passed to run the corresponding demo-script.
1  => sequence_manipulations
2  => seqstats_and_seqwords
3  => restriction_and_sigcleave
4  => other_seq_utilities
5  => run_perl
6  => searchio_parsing
8  => hmmer_parsing
9  => simplealign
10 => gene_prediction_parsing
11 => access_remote_db
12 => index_local_db
13 => fetch_local_db    (NOTE: needs to be run with demo 12)
14 => sequence_annotation
15 => largeseqs
16 => liveseqs
17 => run_struct
18 => demo_variations
19 => demo_xml
20 => run_tree
21 => run_map
22 => run_remoteblast
23 => run_standaloneblast
24 => run_clustalw_tcoffee
25 => run_psw_bl2seq

In addition the argument "100" followed by the name of a single
bioperl object will display a list of all the public methods
available from that object and from what object they are inherited.

Using the parameter "0" will run all the tests that do not require
external programs (i.e. tests 1 to 22).
Using any other argument (or no argument) will run this display.

So typical command lines might be:
To run all core demo scripts:
 > perl -w  bptutorial.pl 0
or to just run the local indexing demos:
 > perl -w  bptutorial.pl 12 13
or to list all the methods available for object Bio::Tools::SeqStats -
 > perl -w  bptutorial.pl 100 Bio::Tools::SeqStats

t/tutorial...................FAILED tests 7-21
        Failed 15/21 tests, 28.57% okay

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> -Heikki
> 

From sb at mrc-dunn.cam.ac.uk  Wed Jun  7 12:54:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 07 Jun 2006 17:54:31 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <200606071250.25026.lstein@cshl.edu>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
	<200606071250.25026.lstein@cshl.edu>
Message-ID: <448704C7.6080201@mrc-dunn.cam.ac.uk>

Lincoln Stein wrote:
> I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
> because the CGI upload filehandle is not seekable (for good reasons that I 
> won't inflict on you)! You'll have to write to a temporary file, or else read 
> the whole sequence into memory. Sorry about this.

The OP already had success with my alternative solution.


>> An alternative might be to pass GuessSeqFormat the filename in which
>> case it would make its own filehandle and close it, leaving your own
>> filehandle untouched.

From hlapp at gmx.net  Wed Jun  7 13:25:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 13:25:25 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
Message-ID: <76434774-51A4-46E7-97AA-1E9227CB7771@gmx.net>

Hi Michael,

yes, it looks like a problem in DBD if DBD::mysql fails to recognize
that the mysql instance to which it is connected does support
transactions. You can verify this by writing a simple script that
tries to open a connection with
{ AutoCommit => 0 } as the parameter hash:

	use DBI;
	# replace <yourdb>, <yourhost>, username and password with your own
	my $dbh = DBI->connect("dbi:mysql:database=<yourdb>;host=<yourhost>",
	                       "username","password",
	                       { AutoCommit => 0, RaiseError => 0 });
	die $DBI::errstr unless $dbh;
	$dbh->disconnect;

If this succeeds, then something in Biosql may be contributing to the
problem; if it fails, Biosql is not involved.

	-hilmar


On Jun 7, 2006, at 12:01 PM, Michael Muratet US-Huntsville wrote:

> Hilmar
>
> Pardon the top post.
>
> I tried the test below and it failed. So, I went back and redid the
> InnoDB configuration: I deleted all the index files (they were empty
> anyway), reinstalled biosql (which was empty, too) and restarted the
> server. Now, the test below works. I went into the DBD-3.0003 and
> did a distclean and reinstalled the package, but it fails the one
> transaction test, too. So, it looks like the problem is in DBD, yes?
>
> We had a RAID 5 drive glitch the day before yesterday and rebuilt  
> it. That's the only thing that's changed that I know of that could  
> have caused the problem with ibxxx files.
>
> I have received a reply on the DBD list. Can you think of anything  
> else I should try from the biosql end?
>
> Thanks a million.
>
> Mike
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, June 07, 2006 7:52 AM
> To: Michael Muratet US-Huntsville
> Cc: Bioperl; BioSQL
> Subject: Re: [Bioperl-l] bioperl-db failing tests
>
>
> Hi Michael,
>
> Bioperl-db will open all connections with AutoCommit => 0 in the DBI
> parameter hash. The test you're stumbling over is actually there to
> test that the database  does support transactions, but apparently in
> 5.x versions MySQL no longer silently ignores the AutoCommit
> parameter if it doesn't support transactions (effectively preempting
> the test ...).
>
> Now you say that innodb shows as enabled - i.e., you can confirm that
> you changed the Mysql configuration parameter that designates the
> directory for innodb to store its files?
>
> You can confirm that transactions are supported by simple tests on
> the sql level. Open a mysql shell and do the following:
>
> 	-- BTW 'start transaction;' will (should) work too
> 	mysql> set autocommit = 0;
> 	mysql> insert into biodatabase (name) values ('__dummy__');
> 	mysql> select name from biodatabase where name = '__dummy__';
> 	mysql> rollback;
> 	mysql> select name from biodatabase where name = '__dummy__';
>
> The first SELECT query should return one and the last query should
> return zero rows if transactions are supported, and there shouldn't
> be any error.
>
> If the above succeeds (which I don't expect it to) then it looks like
> the DBD::mysql driver thinks the database doesn't support
> transactions when in reality it does. Let me know the result.
>
> 	-hilmar
>
> On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:
>
>> Greetings
>>
>> I am trying to install bioperl-db in preparation for installing a
>> biosql database. I'm running on a Dell PowerEdge with quad dual-
>> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl
>> 1.5.1.  I have installed mysql v5.0.21 from source with --with-
>> innodb set for the configuration. I installed bioperl-db from cvs.
>> I have the latest DBI and DBD:mysql installed a few weeks ago from
>> CPAN. The installation has been working well with perl otherwise,
>> for example, the Ensembl core API works OK. SHOW ENGINES indicates
>> that innodb is enabled.  I have attached a snippet from the top of
>> the output below. I searched the web and the bioperl-db list and
>> haven't found anything that appears to be relevant. I've done
>> several of these installs and they've pretty much completed without
>> a single glitch. Does anyone have any ideas how to isolate the
>> problem?
>>
>> Thanks
>>
>> Mike
>>
>> [mmuratet at HSV-PROBE bioperl-db]$ make test
>> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>> t/01dbadaptor.....ok 14/19
>> ------------- EXCEPTION  -------------
>> MSG: failed to open connection: Transactions not supported by  
>> database
>> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 255
>> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 215
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/
>> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/
>> BioSQL/BasePersistenceAdaptor.pm:1477
>> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/
>> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/
>> DB/BioSQL/BaseDriver.pm:518
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
>> STACK toplevel t/01dbadaptor.t:62
>>
>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 14:08:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 13:08:19 -0500
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>
Message-ID: <001501c68a5d$5db655a0$15327e82@pyrimidine>

Nicolaus,

Bio::DB::GenBank uses NCBI's efetch mainly; I implemented epost, but it's a
hack at best and only works in certain circumstances.  So you could get the
sequence data directly, but the links aren't included and are only given
through NCBI's elink.  There is no way I know of to get this information via
bioperl, as there isn't an interface to NCBI's elink AFAIK (Brian?).  I'm
working on a rewrite for a general NCBI eutils interface for each tool
(efetch, epost, elink, etc.), but it isn't working yet and probably won't be
ready to go until the end of summer or beginning of fall.

Just so you know how complex the situation is when using accessions: you
can't use a sequence accession directly when querying elink (and most
eutils); it has to be the GI number.  I believe efetch is the only one that
accepts accessions.  So you would have to run esearch first using the
accessions as a query, grab the GI from the XML, run elink with the GI, grab
the SNP cluster ID, efetch the SNP data, and parse the data to get it into
Bio::ClusterIO.  Fun, huh?  You would think NCBI would try making this a
little easier...
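
To make the chain of calls concrete, here is a rough sketch (in Python
rather than Perl, purely for illustration) that only builds the three
request URLs and does not contact NCBI.  The helper name eutils_url is
made up for this example; the base URL and the db/dbfrom/term/id parameter
names are the usual E-utilities ones, and the sample IDs are the ones that
appear in this thread.  Treat it as an outline of the steps, not a working
client.

```python
from urllib.parse import urlencode

# Public E-utilities base URL (assumed current; check NCBI's docs).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build the URL for one E-utilities tool (hypothetical helper)."""
    return f"{EUTILS_BASE}/{tool}.fcgi?{urlencode(params)}"


# Step 1: esearch turns an accession into a GI (returned as XML).
step1 = eutils_url("esearch", db="nucleotide", term="BC000007")

# Step 2: elink maps the GI to linked dbSNP IDs.
step2 = eutils_url("elink", dbfrom="nucleotide", db="snp", id="33875090")

# Step 3: efetch retrieves the SNP record for one linked ID.
step3 = eutils_url("efetch", db="snp", id="4631", retmode="xml")

for url in (step1, step2, step3):
    print(url)
```

Each step needs the parsed output of the previous one, which is why there
is no single-call shortcut.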

There used to be a way to parse dbSNP data using Bio::ClusterIO but the XML
schema changed so the parser is likely broken (the tests work but the file
is from the old schema).  I think Allen Day was in charge of it.

I used the eutils test interface () to grab the SNP cluster accessions for
your sequence using elink (note that the format is XML, which one would
have to parse to grab the cluster IDs):

<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp_genegenotype</LinkName>
		<Link>
			<Id>28362589</Id>
		</Link>
		<Link>
			<Id>4635949</Id>
		</Link>

		<Link>
			<Id>28362591</Id>
		</Link>
		<Link>
			<Id>11545838</Id>
		</Link>
		<Link>
			<Id>4246814</Id>

		</Link>
		<Link>
			<Id>28670911</Id>
		</Link>
		<Link>
			<Id>4073746</Id>
		</Link>
		<Link>

			<Id>9313754</Id>
		</Link>
		<Link>
			<Id>11545840</Id>
		</Link>
		<Link>
			<Id>17077806</Id>

		</Link>
		<Link>
			<Id>28362590</Id>
		</Link>
		<Link>
			<Id>4076327</Id>
		</Link>
		<Link>

			<Id>9834</Id>
		</Link>
		<Link>
			<Id>4073745</Id>
		</Link>
		<Link>
			<Id>6879874</Id>

		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
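
Pulling the cluster IDs out of output like this takes only a few lines with
a standard XML parser.  The sketch below uses Python's xml.etree for
illustration; the function name linked_ids and the trimmed SAMPLE string
are made up for the example, and only the element names come from the
output above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the elink output above (two links kept per the original order).
SAMPLE = """
<eLinkResult>
<LinkSet>
  <DbFrom>nucleotide</DbFrom>
  <IdList><Id>33875090</Id></IdList>
  <LinkSetDb>
    <DbTo>snp</DbTo>
    <LinkName>nucleotide_snp</LinkName>
    <Link><Id>4631</Id></Link>
  </LinkSetDb>
  <LinkSetDb>
    <DbTo>snp</DbTo>
    <LinkName>nucleotide_snp_genegenotype</LinkName>
    <Link><Id>28362589</Id></Link>
    <Link><Id>4635949</Id></Link>
  </LinkSetDb>
</LinkSet>
</eLinkResult>
"""


def linked_ids(elink_xml):
    """Map each LinkName to its list of linked Ids, in document order."""
    root = ET.fromstring(elink_xml)
    result = {}
    for linkset_db in root.iter("LinkSetDb"):
        name = linkset_db.findtext("LinkName")
        # Only <Link><Id> children count; the query Id lives under <IdList>.
        result[name] = [link.findtext("Id") for link in linkset_db.findall("Link")]
    return result


print(linked_ids(SAMPLE))
```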


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nicolaus Hepler
> Sent: Wednesday, June 07, 2006 11:26 AM
> To: Brian Osborne; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] GenBank Feature: variation
> 
> Brian,
> 
> A sample accession is BC000007.  I figured a way around it though.
> Rather than automate the whole process, I just downloaded from Batch
> Entrez a flat .gb file of all my accessions.  It's not flexible, and
> will be inconvenient when we expand the dataset, but it will provide
> me with data to work with for now.
> 
> Nicolaus
> 
> > Nicolaus,
> >
> > The short answer is no, there's no option that will omit or add a
> > particular
> > feature or annotation to the Sequence object returned by
> > Bio::DB::GenBank.
> > Can you give some example accessions?
> >
> > Brian O.
> >
> >
> > On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
> >
> >> Hello,
> >>
> >> I am having some difficulty here.  I have a list of accessions, which
> >> are the parameters for a get_Stream_by_acc() function on a
> >> Bio::DB::GenBank object.  None of the returned GenBank information
> >> for any of my accessions seems to contain variation data, no matter
> >> how I try to coax it out with unflattener and typemapper.  This data
> >> is, however, available via the web interface of NCBI Nucleotide, as
> >> an optional feature (SNP).  I was wondering if there was some option
> >> I'm missing in the initialization of the Bio::DB::GenBank object (no
> >> options currently) that will coax the database into giving me this
> >> data?  Or something else that I'm missing altogether.  The organism
> >> of interest is human, taxon:9606.
> >>
> >> Nicolaus Lance Hepler
> >> nlhepler at mail dot umd dot edu
> >
> >
> 


From Michael.Muratet at operon.com  Wed Jun  7 12:01:29 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Wed, 7 Jun 2006 11:01:29 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>

Hilmar

Pardon the top post.

I tried the test below and it failed. So I went back and redid the InnoDB configuration (deleted all the index files, which were empty anyway; reinstalled biosql, which was empty too) and restarted the server. Now the test below works. I went into DBD-3.0003, did a distclean, and reinstalled the package, but it fails the one transaction test, too. So it looks like the problem is in DBD, yes?

We had a RAID 5 drive glitch the day before yesterday and rebuilt it. That's the only thing that's changed that I know of that could have caused the problem with ibxxx files. 

I have received a reply on the DBD list. Can you think of anything else I should try from the biosql end?

Thanks a million.

Mike

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: Wednesday, June 07, 2006 7:52 AM
To: Michael Muratet US-Huntsville
Cc: Bioperl; BioSQL
Subject: Re: [Bioperl-l] bioperl-db failing tests


Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI  
parameter hash. The test you're stumbling over is actually there to  
test that the database  does support transactions, but apparently in  
5.x versions MySQL no longer silently ignores the AutoCommit  
parameter if it doesn't support transactions (effectively preempting  
the test ...).

Now you say that InnoDB shows as enabled - i.e., you can confirm that  
you changed the MySQL configuration parameter that designates the  
directory for InnoDB to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one and the last query should  
return zero rows if transactions are supported, and there shouldn't  
be any error.
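
The same insert / select / rollback / select check can be scripted.  The
sketch below is only an illustration: it is written in Python and runs
against an in-memory SQLite database as a stand-in, because that needs no
server; the one-column biodatabase table and the function name
supports_rollback are made up for the example.  Against MySQL the same
statements would go through a MySQL driver with autocommit turned off.

```python
import sqlite3


def supports_rollback(conn):
    """Return True if an uncommitted insert disappears after ROLLBACK."""
    cur = conn.cursor()
    # The sqlite3 module opens a transaction implicitly before the INSERT.
    cur.execute("INSERT INTO biodatabase (name) VALUES ('__dummy__')")
    cur.execute("SELECT name FROM biodatabase WHERE name = '__dummy__'")
    rows_before = len(cur.fetchall())  # expect 1 inside the transaction
    conn.rollback()
    cur.execute("SELECT name FROM biodatabase WHERE name = '__dummy__'")
    rows_after = len(cur.fetchall())   # expect 0 if the rollback took effect
    return rows_before == 1 and rows_after == 0


# Stand-in schema: the real biodatabase table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biodatabase (name TEXT)")
conn.commit()

result = supports_rollback(conn)
print("transactions supported:", result)
```

On a storage engine without transaction support the second SELECT would
still find the row, so the function would return False.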

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD:mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ 
> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ 
> BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ 
> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ 
> DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 15:38:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 14:38:08 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
Message-ID: <001901c68a69$e7ece8e0$15327e82@pyrimidine>

All,

Don't know how many people use this module, but it looks like
Bio::ClusterIO::dbsnp is broken unless you are using older XML versions of
the dbSNP database; the schema for the ASN.1 and XML formats for SNP has
changed:

http://www.ncbi.nlm.nih.gov/projects/SNP/

under 'Announcements'.

I actually tried parsing the dbsnp test file and a newer schema XML file to
confirm this; the new version doesn't work (returned object from
next_cluster is undef).  I'm filing a bug as a reminder.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From paul.boutros at utoronto.ca  Wed Jun  7 18:35:46 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 18:35:46 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149719746.448754c2ef4e0@webmail.utoronto.ca>

> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
Yup yup, you're right. I should have mentioned in my original message that I didn't run 
any remote-server tests, and unfortunately can't do so on this box.
Paul

Quoting David Messina <dmessina at wustl.edu>:

> To look for problems related to Heikki's "return undef" sweep, I ran  
> 'make test' on both today's version of bioperl-live and on an older  
> version I had checked out on May 12. This was done on OS X 10.4.6 and  
> perl 5.8.6.
> 
> 
> Here are the results:
> 
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
> need to be examined.
> 
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been  
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
> 
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
> between 5/12 and today, so this appears to be not 'return undef'- 
> related.
> 
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load  
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
> 5/12 and today, so this is not 'return undef'-related.
> 
> 
> 
> All the other test failures appear in both versions of bioperl-live,  
> so presumably they are not affected by the 'return undef' changes.
> 
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
> 
> Also, he saw errors in Biblio.t, Repeatmasker.t, and  
> StandAloneBlast.t that I did not.
> 
> Dave
> 
> 
> Today's bioperl-live test results:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
> 99.84% okay.
> 
> Note that this is including tests requiring a remote server.
> 
> And here's the output from a May 12 checkout of bioperl-live:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
> 99.89% okay.
> 
> 
> 
> 
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
> 
> > Hi,
> >
> > Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> > and I had a few
> > failures:
> >
> > Failed Test         Stat Wstat Total Fail  List of Failed
> > ---------------------------------------------------------------------- 
> > ---------
> > t/Annotation.t                    89    2  79 88
> > t/Biblio.t                        24    1  2
> > t/LocusLink.t                     23    1  23
> > t/PhysicalMap.t                   14    2  11-12
> > t/RepeatMasker.t                   6    3  1-2 6
> > t/StandAloneBlast.t               18    4  19-22
> > t/TaxonTree.t                     17   30  11 18-42
> > t/alignUtilities.t                 9    1  9
> > t/psm.t              255 65280    48   35  29 32-48
> > t/tutorial.t                      21   15  7-21
> 
> 


From dmessina at wustl.edu  Wed Jun  7 18:26:25 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 17:26:25 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <1149699781.448706c5e803d@webmail.utoronto.ca>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
Message-ID: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>

To look for problems related to Heikki's "return undef" sweep, I ran  
'make test' on both today's version of bioperl-live and on an older  
version I had checked out on May 12. This was done on OS X 10.4.6 and  
perl 5.8.6.


Here are the results:

Failures in today's version of bioperl-live but NOT in 5/12 version
===================================================================
- psm.t -
The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
need to be examined.

Here's the error message:
Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
         Failed 18/48 tests, 62.50% okay


Failures in 5/12 version of bioperl-live but NOT in today's version
===================================================================
- OntologyStore.t -
Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been  
touched between 5/12 and today.

The error looks like a transient network problem to me, but I'm not  
sure:
-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
*checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
500.  retrying...
---------------------------------------------------
[REPEATED 5 times -Dave]

t/OntologyStore..............FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay


- RepeatMasker.t -
Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
between 5/12 and today, so this appears to be not 'return undef'- 
related.

- SeqVersion.t -
The SeqVersion error was due to a failure to find and load  
Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
5/12 and today, so this is not 'return undef'-related.


All the other test failures appear in both versions of bioperl-live,  
so presumably they are not affected by the 'return undef' changes.

Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
that he didn't. Those only pop up when I run the optional remote- 
server tests, however. Perhaps Paul didn't run those and that  
accounts for the discrepancy?

Also, he saw errors in Biblio.t, Repeatmasker.t, and  
StandAloneBlast.t that I did not.

Dave


Today's bioperl-live test results:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
------------------------------------------------------------------------ 
-------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/PhysicalMap.t                  14    2  14.29%  11-12
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/psm.t             255 65280    48   35  72.92%  29 32-48
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
99.84% okay.

Note that this is including tests requiring a remote server.

And here's the output from a May 12 checkout of bioperl-live:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
------------------------------------------------------------------------ 
-------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/OntologyStore.t                 6    4  66.67%  3-6
t/PhysicalMap.t                  14    2  14.29%  11-12
t/RepeatMasker.t                  6    3  50.00%  1-2 6
t/SeqVersion.t      255 65280     6   10 166.67%  2-6
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
99.89% okay.
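
For anyone who wants to compare tables like the two above without
eyeballing them, a small parser along these lines could pull the script
name, total, and failure count out of each row.  This is a sketch in
Python for illustration; the regex and the name parse_failures are made up
for the example and only cover the layout that includes the "Failed"
percentage column.

```python
import re

# One summary row: script, optional Stat/Wstat, Total, Fail, Failed %, list.
ROW = re.compile(
    r"^(?P<script>t/\S+\.t)\s+"
    r"(?:(?P<stat>\d+)\s+(?P<wstat>\d+)\s+)?"
    r"(?P<total>\d+)\s+(?P<fail>\d+)\s+"
    r"(?P<pct>[\d.]+)%\s+(?P<failed_list>.+)$"
)


def parse_failures(report):
    """Return {script: (total, fail)} for every row that matches ROW."""
    rows = {}
    for line in report.splitlines():
        # Drop mail quote markers and surrounding whitespace before matching.
        line = line.lstrip("> ").rstrip()
        match = ROW.match(line)
        if match:
            rows[match.group("script")] = (
                int(match.group("total")),
                int(match.group("fail")),
            )
    return rows


SAMPLE = """\
t/Annotation.t                   89    2   2.25%  79 88
t/psm.t             255 65280    48   35  72.92%  29 32-48
114 subtests skipped.
"""

print(parse_failures(SAMPLE))
```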


On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:

> Hi,
>
> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> and I had a few
> failures:
>
> Failed Test         Stat Wstat Total Fail  List of Failed
> ---------------------------------------------------------------------- 
> ---------
> t/Annotation.t                    89    2  79 88
> t/Biblio.t                        24    1  2
> t/LocusLink.t                     23    1  23
> t/PhysicalMap.t                   14    2  11-12
> t/RepeatMasker.t                   6    3  1-2 6
> t/StandAloneBlast.t               18    4  19-22
> t/TaxonTree.t                     17   30  11 18-42
> t/alignUtilities.t                 9    1  9
> t/psm.t              255 65280    48   35  29 32-48
> t/tutorial.t                      21   15  7-21


From cjfields at uiuc.edu  Wed Jun  7 19:38:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 18:38:10 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>

I saw a ton of activity from Jason on bioperl-guts for test files and  
modules; you may want to check your tests vs. his changes in case  
they were fixed.  I'll be running similar tests on WinXP and Mac OS X;  
it would be nice to see how my results compare to Dave's.

Chris

On Jun 7, 2006, at 5:26 PM, David Messina wrote:

> To look for problems related to Heikki's "return undef" sweep, I ran
> 'make test' on both today's version of bioperl-live and on an older
> version I had checked out on May 12. This was done on OS X 10.4.6 and
> perl 5.8.6.
>
>
> Here are the results:
>
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may
> need to be examined.
>
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
>
>
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been
> touched between 5/12 and today.
>
> The error looks like a transient network problem to me, but I'm not
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
>
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
>
>
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm
> between 5/12 and today, so this appears to be not 'return undef'-
> related.
>
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between
> 5/12 and today, so this is not 'return undef'-related.
>
>
>
> All the other test failures appear in both versions of bioperl-live,
> so presumably they are not affected by the 'return undef' changes.
>
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
> that he didn't. Those only pop up when I run the optional remote-
> server tests, however. Perhaps Paul didn't run those and that
> accounts for the discrepancy?
>
> Also, he saw errors in Biblio.t, Repeatmasker.t, and
> StandAloneBlast.t that I did not.
>
> Dave
>
>
> Today's bioperl-live test results:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.
>
> Note that this is including tests requiring a remote server.
>
> And here's the output from a May 12 checkout of bioperl-live:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,
> 99.89% okay.
>
>
>
>
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
>
>> Hi,
>>
>> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7
>> and I had a few
>> failures:
>>
>> Failed Test         Stat Wstat Total Fail  List of Failed
>> --------------------------------------------------------------------- 
>> -
>> ---------
>> t/Annotation.t                    89    2  79 88
>> t/Biblio.t                        24    1  2
>> t/LocusLink.t                     23    1  23
>> t/PhysicalMap.t                   14    2  11-12
>> t/RepeatMasker.t                   6    3  1-2 6
>> t/StandAloneBlast.t               18    4  19-22
>> t/TaxonTree.t                     17   30  11 18-42
>> t/alignUtilities.t                 9    1  9
>> t/psm.t              255 65280    48   35  29 32-48
>> t/tutorial.t                      21   15  7-21
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Wed Jun  7 20:50:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 19:50:48 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
Message-ID: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>

Thanks for letting me know, Chris.

Here's a new round of results on bioperl-live checked out moments ago:
[OS X 10.4.6, perl 5.8.6]

Failed Test   Stat Wstat Total Fail  Failed  List of Failed
------------------------------------------------------------------------ 
-------
t/DBCUTG.t                  29    5  17.24%  26 30-32
t/LocusLink.t               23    1   4.35%  23
t/PopGen.t                  89    1   1.12%  85
t/psm.t        255 65280    48   35  72.92%  29 32-48
t/tutorial.t                21   15  71.43%  7-21
121 subtests skipped.
Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,  
99.69% okay.

Fixed since earlier today
=========================
Annotation.t
PhysicalMap.t
TaxonTree.t
alignUtilities.t

New since earlier today
=======================
PopGen.t

t/PopGen.....................FAILED test 85
         Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86  
okay, 96.63%)

Unchanged
=========
DBCUTG.t
LocusLink.t
psm.t
tutorial.t
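
The fixed / new / unchanged grouping above is just a set comparison of the
failing scripts from the two runs.  A minimal sketch in Python, for
illustration only (the name compare_runs is made up; the script names are
the ones from the two tables in this message):

```python
# Failing scripts from the earlier run and from the latest checkout.
earlier = {
    "Annotation.t", "DBCUTG.t", "LocusLink.t", "PhysicalMap.t",
    "TaxonTree.t", "alignUtilities.t", "psm.t", "tutorial.t",
}
latest = {"DBCUTG.t", "LocusLink.t", "PopGen.t", "psm.t", "tutorial.t"}


def compare_runs(before, after):
    """Group failing scripts by how their status changed between runs."""
    return {
        "fixed": sorted(before - after),      # failed before, passes now
        "new": sorted(after - before),        # passes before, fails now
        "unchanged": sorted(before & after),  # fails in both runs
    }


summary = compare_runs(earlier, latest)
for group, scripts in summary.items():
    print(group, scripts)
```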

Remote-server tests were run like before. I forgot to mention last  
time that I skipped the local DB tests and I don't have bioperl-ext  
installed, so several staden-related tests were also skipped.

Dave


My results from earlier today for reference:
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.


From heikki at sanbi.ac.za  Thu Jun  8 04:49:38 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:49:38 +0200
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
	<4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
Message-ID: <200606081049.40232.heikki@sanbi.ac.za>

Looks like we survived the sweeping change - and fixed a number of existing 
bugs in the process. Thanks to everyone who helped!

	-Heikki 

On Thursday 08 June 2006 02:50, David Messina wrote:
> Thanks for letting me know, Chris.
>
> Here's a new round of results on bioperl-live checked out moments ago:
> [OS X 10.4.6, perl 5.8.6]
>
> Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------
> -------
> t/DBCUTG.t                  29    5  17.24%  26 30-32
> t/LocusLink.t               23    1   4.35%  23
> t/PopGen.t                  89    1   1.12%  85
> t/psm.t        255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                21   15  71.43%  7-21
> 121 subtests skipped.
> Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> 99.69% okay.
>
> Fixed since earlier today
> =========================
> Annotation.t
> PhysicalMap.t
> TaxonTree.t
> alignUtilities.t
>
> New since earlier today
> =======================
> PopGen.t
>
> t/PopGen.....................FAILED test 85
>          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> okay, 96.63%)
>
> Unchanged
> =========
> DBCUTG.t
> LocusLink.t
> psm.t
> tutorial.t
>
> Remote-server tests were run like before. I forgot to mention last
> time that I skipped the local DB tests and I don't have bioperl-ext
> installed, so several staden-related tests were also skipped.
>
> Dave
>
> My results from earlier today for reference:
> > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > ----------------------------------------------------------------------
> > --
> > -------
> > t/Annotation.t                   89    2   2.25%  79 88
> > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > t/LocusLink.t                    23    1   4.35%  23
> > t/PhysicalMap.t                  14    2  14.29%  11-12
> > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > t/alignUtilities.t                9    1  11.11%  9
> > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                     21   15  71.43%  7-21
> > 114 subtests skipped.
> > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > 99.84% okay.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za  Thu Jun  8 04:52:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:52:27 +0200
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
Message-ID: <200606081052.27446.heikki@sanbi.ac.za>

I sort of fixed this.

At least the tests pass (I commented out two) when using the new sample XML. 
To be really useful, the code needs much more work, so I left the bug open.

http://bugzilla.open-bio.org/show_bug.cgi?id=2018


	-Heikki


On Wednesday 07 June 2006 21:38, Chris Fields wrote:
> All,
>
> Don't know how many people use Bio::ClusterIO this module, but it looks
> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
> versions of the dbSNP database; the schema for ASN.1 and XML format for SNP
> has changed:
>
> http://www.ncbi.nlm.nih.gov/projects/SNP/
>
> under 'Announcements'.
>
> I actually tried parsing the dbsnp test file and a newer schema XML file to
> confirm this; the new version doesn't work (returned object from
> next_cluster is undef).  I'm filing a bug as a reminder.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 01:55:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 08 Jun 2006 15:55:09 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
Message-ID: <4487BBBD.6060702@infotech.monash.edu.au>

Hi all,

I've just been further auditing the Bioperl code and noticed that
Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
can't locate an example/sample sequence file in "Lasergene" format.

 From the code it looks similar to 'raw' format but has "^^" as
a separator character.

Can anyone provide a real-life example so I can augment the 
t/lasergene.t tests?

Thanks,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jrm62 at cam.ac.uk  Thu Jun  8 07:38:40 2006
From: jrm62 at cam.ac.uk (John Mifsud)
Date: 08 Jun 2006 12:38:40 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
Message-ID: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>

Dear all,

Firstly I hope this is the right email list to write to! 

Secondly, I have a little program that parses the BLAST results I get from
running remotely against the NCBI server, takes out all the hit sequences and
converts them to FASTA format.

Now when using BROAD BLAST and getting results this works fine (tblastn ver 
2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and 
the output is different and the parsing no longer works. I was wondering if 
anyone knew of a new SearchIO module / script that is designed to parse the 
updated NCBI BLAST output?

Thanks for your time,


John


From cjfields at uiuc.edu  Thu Jun  8 08:56:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 07:56:27 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <200606081052.27446.heikki@sanbi.ac.za>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
	<200606081052.27446.heikki@sanbi.ac.za>
Message-ID: <AB8EE4BC-4774-48A6-8F26-2A8356F8E700@uiuc.edu>

Sounds good to me.  If someone wants to use this down the line, they  
might be desperate enough to provide patches; there are a lot of  
commented out tags.

Chris

On Jun 8, 2006, at 3:52 AM, Heikki Lehvaslaiho wrote:

> I sort of fixed this.
>
> At least the tests pass (I commented out two) when using the new  
> sample XML.
> To be really usefull, the code need much more work, so I left the  
> bug open.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2018
>
>
> 	-Heikki
>
>
> On Wednesday 07 June 2006 21:38, Chris Fields wrote:
>> All,
>>
>> Don't know how many people use Bio::ClusterIO this module, but it  
>> looks
>> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
>> versions of the dbSNP database; the schema for ASN.1 and XML  
>> format for SNP
>> has changed:
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/
>>
>> under 'Announcements'.
>>
>> I actually tried parsing the dbsnp test file and a newer schema  
>> XML file to
>> confirm this; the new version doesn't work (returned object from
>> next_cluster is undef).  I'm filing a bug as a reminder.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 09:03:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 14:03:05 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <44882009.1040906@mrc-dunn.cam.ac.uk>

John Mifsud wrote:
> Dear all,
> 
> Firstly I hope this is the right email list to write to! 
> 
> Secondly, I have a little program that parses the BLAST results i have got 
> running remotely to the NCBI server and takes out all the hit sequences and 
> converts them to FASTA format.
> 
> Now when using BROAD BLAST and getting results this works fine (tblastn ver 
> 2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and 
> the output is different and the parsing no longer works. I was wondering if 
> anyone knew of a new SearchIO module / script that is designed to blast the 
> updated NCBI BLAST output?

You'll probably need to get the latest SearchIO blast module from 
bioperl-live.
http://bioperl.org/wiki/Getting_BioPerl

If you're having difficulties with your setup, John, I can just send you 
the relevant file(s). Mail me (or Alan) privately for that.

From cjfields at uiuc.edu  Thu Jun  8 09:12:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 08:12:23 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>

I would say, based on previous responses, update to the latest CVS  
(bioperl-live).  You could also try updating  
Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you  
don't want to update the entire toolkit.  Running these with BLAST  
2.2.14 output seems to work fine.

Though this is the likely fix, if you have additional problems next  
time please make sure to include more information.  We have no idea  
what OS, bioperl version, perl version you are running.  And a code  
snippet and bug description would be nice (i.e. "it doesn't work" -  
not a good description; "the script freezes" is a little more  
informative).
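
For what it's worth, a minimal sketch of a script that prints those details for pasting into a report. The sub name environment_report is made up for illustration, and it assumes your BioPerl is recent enough to ship Bio::Root::Version; if it isn't (or BioPerl isn't installed at all), the eval just falls through to "not found".

```perl
use strict;
use warnings;

# Collect the details worth pasting into a bug report.
# Bio::Root::Version is loaded inside eval so the script still runs
# on a machine where BioPerl is missing or too old to ship it
# (assumption: newer BioPerl releases provide this module).
sub environment_report {
    my $bioperl = eval {
        require Bio::Root::Version;
        Bio::Root::Version->VERSION;
    };
    $bioperl = 'not found' unless defined $bioperl;

    # $^O is the OS name perl was built for, $] the perl version number.
    return sprintf "OS: %s\nperl: %s\nbioperl: %s\n", $^O, $], $bioperl;
}

print environment_report();
```

Adding the exact command line and the first error message to that output usually covers most of what we need.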

Chris

On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:

> Dear all,
>
> Firstly I hope this is the right email list to write to!
>
> Secondly, I have a little program that parses the BLAST results i  
> have got
> running remotely to the NCBI server and takes out all the hit  
> sequences and
> converts them to FASTA format.
>
> Now when using BROAD BLAST and getting results this works fine  
> (tblastn ver
> 2.2.9). However, NCBI have just updated their BLAST server (to  
> 2.2.14) and
> the output is different and the parsing no longer works. I was  
> wondering if
> anyone knew of a new SearchIO module / script that is designed to  
> blast the
> updated NCBI BLAST output?
>
> Thanks for your time,
>
>
> John
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 12:03:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 17:03:21 +0100
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>	<200605311255.19166.heikki@sanbi.ac.za>
	<447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <44884A49.6060805@mrc-dunn.cam.ac.uk>

Sendu Bala wrote:
> Heikki Lehvaslaiho wrote:
>> In my opinion the sooner the bugs get exposed the better. It is much more 
>> likely that there is a well hidden bug caused by assigning accidentally undef 
>> into an one element array that someone intentionally writing code that 
>> expects that behaviour!
>>
>> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
>> and could not see any differences in the test output. 
>>
>> Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible 
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Looks like Heikki's work went well. If there is any further interest in 
getting rid of all the remaining undef returns, this also needs to be fixed:

sub x {
   # return (...) on success
   # do nothing on failure
}

Needs to be changed to:

sub x {
   # return (...) on success
   return;
}
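
A minimal sketch of why the bare return matters, using two made-up subs (find_undef and find_bare are for illustration only, not from any BioPerl module). In list context "return undef" hands back a one-element list, so an array assigned from it is non-empty and tests true; a bare "return" hands back an empty list.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; not taken from BioPerl.
sub find_undef { return undef; }   # explicit undef on failure
sub find_bare  { return; }         # bare return on failure

# List context: 'return undef' yields the one-element list (undef),
# while a bare 'return' yields the empty list.
my @with_undef = find_undef();
my @with_bare  = find_bare();

printf "return undef: %d element(s)\n", scalar @with_undef;
printf "bare return:  %d element(s)\n", scalar @with_bare;

# Scalar context: the caller sees undef either way.
my $scalar_undef = find_undef();
my $scalar_bare  = find_bare();
print "scalar context: both undef\n"
    if !defined $scalar_undef && !defined $scalar_bare;
```

So code like "if (my @hits = $obj->method) { ... }" only behaves as expected on failure with the bare return.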

From roy at colibase.bham.ac.uk  Thu Jun  8 12:31:10 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 08 Jun 2006 17:31:10 +0100
Subject: [Bioperl-l] Truncate sequence with features
Message-ID: <448850CE.1040105@colibase.bham.ac.uk>

Hi all.

I've been playing around with a subroutine to truncate a sequence and 
adjust the coordinates of any features that overlap the specified 
region- something that according to the comments in 
Bio::Location::Simple has been abortively worked on in the past.

I've submitted the subroutine as an enhancement in Bugzilla. It's a bit 
hacky but works for what I needed it for. However I'm a bit unsure on 
the best way to deal with split locations where one of the sublocations 
is entirely outside the truncated region. My current method results in 
locations like:
join(1..500, >1000..>1000)

which is quite ugly and possibly invalid, but kind of makes sense. Does 
anyone know what would be the correct behaviour for this situation?
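
For comparison, here is a rough sketch of one alternative, which simply drops any sublocation lying wholly outside the kept region. It uses plain [start, end] pairs rather than Bio::Location objects, and truncate_sublocations is a made-up name, so treat it as an illustration of the idea rather than what the submitted subroutine does.

```perl
use strict;
use warnings;

# Plain [start, end] pairs stand in for sublocations here; this is not
# the Bio::Location API, and truncate_sublocations is a made-up name.
# Sublocations wholly outside the kept region are dropped; ones that
# straddle an edge are clipped, then shifted to the new coordinates.
sub truncate_sublocations {
    my ($from, $to, @subs) = @_;
    my @kept;
    for my $sub (@subs) {
        my ($start, $end) = @$sub;
        next if $end < $from || $start > $to;   # entirely outside
        $start = $from if $start < $from;       # clip left edge
        $end   = $to   if $end   > $to;         # clip right edge
        push @kept, [ $start - $from + 1, $end - $from + 1 ];
    }
    return @kept;
}

# A join(1..500,2000..2500) feature, keeping only bases 1..1000:
my @result = truncate_sublocations(1, 1000, [1, 500], [2000, 2500]);
print join(',', map { "$_->[0]..$_->[1]" } @result), "\n";
```

This loses the information that the feature originally continued past the cut, which the >1000..>1000 form keeps, so it is only one of the options.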

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk

From cjfields at uiuc.edu  Thu Jun  8 14:47:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 13:47:19 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <000701c68b2b$f8cc21e0$15327e82@pyrimidine>

Thomas;

That error isn't related to BioPerl.  This is the standard HTML response
NCBI gives as a web page; the error embedded in the HTML you received as a
warning reads:

ERROR: Cannot accept request, error code: 1Number of unfinished requests
(151) from your IP address reached the HARD limit 150.

So you may have too many requests in the BLAST queue.  
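
If you want to fish that message out of the HTML yourself rather than reading the whole page, something along these lines should do it. This is only a sketch: extract_blast_error is a made-up helper, not part of BioPerl, and it assumes the error keeps appearing inside a <font> element as it does in the response quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pull the text of the first
# "ERROR: ..." message out of an HTML page, dropping tags and folding
# whitespace so the message reads as one line.
sub extract_blast_error {
    my ($html) = @_;
    return unless defined $html;

    # Assumes the message is closed by </font>, as in the quoted response.
    if ($html =~ m{(ERROR:.*?)</font>}is) {
        my $msg = $1;
        $msg =~ s/<[^>]+>//g;      # strip any nested tags
        $msg =~ s/\s+/ /g;         # fold line breaks and runs of spaces
        $msg =~ s/^\s+|\s+$//g;    # trim both ends
        return $msg;
    }
    return;    # nothing that looks like an error message
}

my $page = <<'HTML';
<br><hr><font
color="red">ERROR: Cannot accept request, error code: 1Number of
unfinished requests (151)  from your IP address reached the HARD
limit 150.</font><hr></form>   </body></html>
HTML

my $error = extract_blast_error($page);
print defined $error ? "$error\n" : "no error message found\n";
```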

Chris

> -----Original Message-----
> From: Thomas J Keller [mailto:kellert at ohsu.edu]
> Sent: Thursday, June 08, 2006 1:39 PM
> To: Chris Fields
> Cc: John Mifsud; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] NCBI BLAST results parsing
> 
> I'm having the same problem bp_remote_blast.pl worked yesterday,
> today it's busted. Incidently, I got the following email from NCBI
> this morning:
> The new version of the NCBI SOAP E-Utilities, which includes recent
> changes to the NCBI sequence databases schema, was released today.
> 
> Thank you.
> NCBI E-Utilities Team
> 
> I wouldn't have thought that that would affect
> Bio::Tools::RemoteBlast but something has changed.
> 
> Here's a snippet of the output after $ bp_remote_blast.pl -p blastn -
> d nr -e 1e-3 -i nm_008540.fasta
> 
> -------------------- WARNING ---------------------
> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
> Content-Length: 267
> Content-Type: application/x-www-form-urlencoded
> 
> DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+%
> 25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C
> +mRNA.%
> 0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm
> ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn
> 
> 
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content-
> Type" content="text/html; charset=utf-8"/><link rel="stylesheet"
> href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body
> bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"
> onload="StartBlastCgi();"><!--  the header   --> <table border="0"
> width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"
> colspan=4>    <map name="head_img_map">    <area shape="rect"
> coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home
> page">       <area shape="rect" coords="301,0,600,40" href="http://
> www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>
> <IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"
> NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"
> HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr
> align="center">    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI=
> yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN
> KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes
> &HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT
> O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td
> width="150" bgcolor="#003366">        <a href="http://
> www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI
> =yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI
> NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye
> s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</
> FONT></a></td></tr></table><br><!--  the contents   --> <form
> action="Blast.cgi" enctype="application/x-www-form-urlencoded"
> method="POST"><script src="blastcgi.js"></script><SCRIPT
> LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =
> 'html/head_formating.gif';// --></SCRIPT><br><hr><font
> color="red">ERROR: Cannot accept request, error code: 1Number of
> unfinished requests (151)  from your IP address reached the HARD
> limit 150.</font><hr></form>   </body></html>
> ---------------------------------------------------
> 
> On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:
> 
> > I would say, based on previous responses, update to the latest CVS
> > (bioperl-live).  You could also try updating
> > Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> > don't want to update the entire toolkit.  Running these with BLAST
> > 2.2.14 output seems to work fine.
> >
> > Though this is the likely fix, if you have additional problems next
> > time please make sure to include more information.  We have no idea
> > what OS, bioperl version, perl version you are running.  And a code
> > snippet and bug description would be nice (i.e. "it doesn't work" -
> > not a good description; "the script freezes" is a little more
> > informative).
> >
> > Chris
> >
> > On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
> >
> >> Dear all,
> >>
> >> Firstly I hope this is the right email list to write to!
> >>
> >> Secondly, I have a little program that parses the BLAST results i
> >> have got
> >> running remotely to the NCBI server and takes out all the hit
> >> sequences and
> >> converts them to FASTA format.
> >>
> >> Now when using BROAD BLAST and getting results this works fine
> >> (tblastn ver
> >> 2.2.9). However, NCBI have just updated their BLAST server (to
> >> 2.2.14) and
> >> the output is different and the parsing no longer works. I was
> >> wondering if
> >> anyone knew of a new SearchIO module / script that is designed to
> >> blast the
> >> updated NCBI BLAST output?
> >>
> >> Thanks for your time,
> >>
> >>
> >> John
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >


From kellert at ohsu.edu  Thu Jun  8 14:39:04 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 11:39:04 -0700
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
Message-ID: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>

I'm having the same problem: bp_remote_blast.pl worked yesterday,
today it's busted. Incidentally, I got the following email from NCBI
this morning:
The new version of the NCBI SOAP E-Utilities, which includes recent
changes to the NCBI sequence databases schema, was released today.

Thank you.
NCBI E-Utilities Team

I wouldn't have thought that that would affect
Bio::Tools::Run::RemoteBlast but something has changed.

Here's a snippet of the output after
$ bp_remote_blast.pl -p blastn -d nr -e 1e-3 -i nm_008540.fasta

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
Content-Length: 267
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+% 
25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C 
+mRNA.% 
0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm 
ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn


---------------------------------------------------

-------------------- WARNING ---------------------
MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content- 
Type" content="text/html; charset=utf-8"/><link rel="stylesheet"  
href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body  
bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"  
onload="StartBlastCgi();"><!--  the header   --> <table border="0"  
width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"  
colspan=4>    <map name="head_img_map">    <area shape="rect"  
coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home  
page">       <area shape="rect" coords="301,0,600,40" href="http:// 
www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>     
<IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"  
NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"  
HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr  
align="center">    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI= 
yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN 
KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes 
&HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT 
O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td  
width="150" bgcolor="#003366">        <a href="http:// 
www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI 
=yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI 
NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye 
s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</ 
FONT></a></td></tr></table><br><!--  the contents   --> <form  
action="Blast.cgi" enctype="application/x-www-form-urlencoded"  
method="POST"><script src="blastcgi.js"></script><SCRIPT  
LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =  
'html/head_formating.gif';// --></SCRIPT><br><hr><font  
color="red">ERROR: Cannot accept request, error code: 1Number of  
unfinished requests (151)  from your IP address reached the HARD  
limit 150.</font><hr></form>   </body></html>
---------------------------------------------------

On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:

> I would say, based on previous responses, update to the latest CVS
> (bioperl-live).  You could also try updating
> Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> don't want to update the entire toolkit.  Running these with BLAST
> 2.2.14 output seems to work fine.
>
> Though this is the likely fix, if you have additional problems next
> time please make sure to include more information.  We have no idea
> what OS, bioperl version, perl version you are running.  And a code
> snippet and bug description would be nice (i.e. "it doesn't work" -
> not a good description; "the script freezes" is a little more
> informative).
>
> Chris
>
> On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
>
>> Dear all,
>>
>> Firstly I hope this is the right email list to write to!
>>
>> Secondly, I have a little program that parses the BLAST results i
>> have got
>> running remotely to the NCBI server and takes out all the hit
>> sequences and
>> converts them to FASTA format.
>>
>> Now when using BROAD BLAST and getting results this works fine
>> (tblastn ver
>> 2.2.9). However, NCBI have just updated their BLAST server (to
>> 2.2.14) and
>> the output is different and the parsing no longer works. I was
>> wondering if
>> anyone knew of a new SearchIO module / script that is designed to
>> blast the
>> updated NCBI BLAST output?
>>
>> Thanks for your time,
>>
>>
>> John
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Thu Jun  8 15:28:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 14:28:18 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"returnundef"
In-Reply-To: <200606081049.40232.heikki@sanbi.ac.za>
Message-ID: <000001c68b31$b5320390$15327e82@pyrimidine>

Here are tests run from WinXP, ActivePerl 5.8.817; almost everything passes.
Not sure what's going on with StandAloneBlast or the protgraph tests, so
I'll check into it.  The psm.t tests that failed are the same as the ones
mentioned previously on other systems.
As an aside, I hate that using '-w' flag with ActivePerl gives a thousand
useless 'subroutines redefined' warnings; only way I found to turn it off is
to not use the flag.  Anyway, I pulled out the relevant chunks of code here;
I'll submit the Mac results separately to not confuse the two.  

...
t/StandAloneBlast............FAILED tests 19-22
	Failed 4/18 tests, 77.78% okay
...
t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33, 36-37, 45, 48-56, 59-60, 65-66
	Failed 22/66 tests, 66.67% okay
...
t/psm........................Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
dubious
	Test returned status 9 (wstat 2304, 0x900)
DIED. FAILED tests 29, 32-48
Failed 18/48 tests, 62.50% okay
...
Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/StandAloneBlast.t               18    4  22.22%  19-22
t/protgraph.t                     66   22  33.33%  11 13 20-21 26 33 36-37 45
                                                   48-56 59-60 65-66
t/psm.t                9  2304    48   35  72.92%  29 32-48
39 subtests skipped.
Failed 3/233 test scripts, 98.71% okay. 36/11100 subtests failed, 99.68% okay.
NMAKE :  U1077: 
Stop.


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Thursday, June 08, 2006 3:50 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; BioPerl Mailing List; Chris Fields
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall with
> "returnundef"
> 
> Looks like we survived the sweeping change - and fixed a number of
> existing bugs in the process. Thanks to everyone who helped!
> 
> 	-Heikki
> 
> On Thursday 08 June 2006 02:50, David Messina wrote:
> > Thanks for letting me know, Chris.
> >
> > Here's a new round of results on bioperl-live checked out moments ago:
> > [OS X 10.4.6, perl 5.8.6]
> >
> > Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> > ------------------------------------------------------------------------
> > -------
> > t/DBCUTG.t                  29    5  17.24%  26 30-32
> > t/LocusLink.t               23    1   4.35%  23
> > t/PopGen.t                  89    1   1.12%  85
> > t/psm.t        255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                21   15  71.43%  7-21
> > 121 subtests skipped.
> > Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> > 99.69% okay.
> >
> > Fixed since earlier today
> > =========================
> > Annotation.t
> > PhysicalMap.t
> > TaxonTree.t
> > alignUtilities.t
> >
> > New since earlier today
> > =======================
> > PopGen.t
> >
> > t/PopGen.....................FAILED test 85
> >          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> > okay, 96.63%)
> >
> > Unchanged
> > =========
> > DBCUTG.t
> > LocusLink.t
> > psm.t
> > tutorial.t
> >
> > Remote-server tests were run like before. I forgot to mention last
> > time that I skipped the local DB tests and I don't have bioperl-ext
> > installed, so several staden-related tests were also skipped.
> >
> > Dave
> >
> > My results from earlier today for reference:
> > > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > > ----------------------------------------------------------------------
> > > --
> > > -------
> > > t/Annotation.t                   89    2   2.25%  79 88
> > > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > > t/LocusLink.t                    23    1   4.35%  23
> > > t/PhysicalMap.t                  14    2  14.29%  11-12
> > > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > > t/alignUtilities.t                9    1  11.11%  9
> > > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > > t/tutorial.t                     21   15  71.43%  7-21
> > > 114 subtests skipped.
> > > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > > 99.84% okay.
> >
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From fernan at iib.unsam.edu.ar  Thu Jun  8 13:02:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 8 Jun 2006 14:02:27 -0300
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <4487BBBD.6060702@infotech.monash.edu.au>
References: <4487BBBD.6060702@infotech.monash.edu.au>
Message-ID: <20060608170227.GF3334@iib.unsam.edu.ar>

+----[ Torsten Seemann <torsten.seemann at infotech.monash.edu.au> (08.Jun.2006 13:47):
|
| Hi all,
| 
| I've just been further auditing the Bioperl code and noticed that
| Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
| can't locate an example/sample sequence file in "Lasergene" format.
| 
|  From the code it looks similar to 'raw' format but has "^^" as
| a separator character.
| 
| Can anyone provide a real-life example so I can augment the 
| t/lasergene.t tests?
|
+----]

See the attached file. 

The format seems to be plain text, beginning with a free
text description that goes from the beginning of the file
until the "^^" delimiter, and after that the sequence.
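
A rough sketch of a reader for that layout, in plain Perl and
independent of Bio::SeqIO::lasergene (whose actual parsing may differ).
The sub name parse_lasergene is made up for illustration, and the
assumption that the "^^" delimiter sits on a line of its own comes only
from this one sample file.

```perl
use strict;
use warnings;

# Hypothetical helper: split Lasergene-style text into (description, sequence).
# Assumes the "^^" delimiter is on its own line, as in the sample file.
sub parse_lasergene {
    my ($text) = @_;
    my ($desc, $seq) = split /^\^\^[ \t]*\r?\n/m, $text, 2;
    return unless defined $seq;      # no delimiter found: return empty list
    $desc =~ s/\s+\z//;              # trim trailing blank lines
    $seq  =~ s/\s+//g;               # drop newlines and spaces in the sequence
    return ($desc, $seq);
}

my $sample = "This is a test sequence\n\n^^\nATCGATCGATCG\n";
my ($desc, $seq) = parse_lasergene($sample);
print "desc=[$desc] seq=[$seq]\n";
```

Something along these lines could also serve as a cross-check when
writing the t/lasergene.t tests.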

Fernan
-------------- next part --------------
Created: Jueves, 08 de Junio de 2006 01:56 p.m.

This is a test sequence created with EditSeq (Lasergene's DNAStar)

^^
ATCGATCGATCG

From freimuth at pathology.wustl.edu  Thu Jun  8 13:12:36 2006
From: freimuth at pathology.wustl.edu (Freimuth, Robert)
Date: Thu, 8 Jun 2006 12:12:36 -0500
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
Message-ID: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>

Hi,

I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
of hits from blast, then get some information about the tiled result.  I
thought I'd use the num_unaligned_query and num_unaligned_hit methods to
get the number of unaligned bases in the tiled result, then subtract
that from the length of the query/subject sequence to get the number of
aligned bases in the region spanned by the hit(s).  My code is below,
followed by the error message.


while( my $result_obj = $blast_obj->next_result() )
{
    while( my $hit_obj = $result_obj->next_hit() )
    {
        my $generic_hit_obj = Bio::Search::Hit::GenericHit->new(
            -name => $hit_obj->name() );
        # tile any hsps that overlap > this number of bp
        $generic_hit_obj->overlap( 0 );

        while( my $hsp_obj = $hit_obj->next_hsp() )
        {
            # add all HSPs to a GenericHit object so they can be tiled together
            $generic_hit_obj->add_hsp( $hsp_obj );
        }

        my $num_unaligned_query = $generic_hit_obj->num_unaligned_query();
        my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();


------------- EXCEPTION  -------------
MSG: Must have defined query_len
STACK Bio::Search::Hit::GenericHit::logical_length
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
STACK Bio::Search::Hit::GenericHit::num_unaligned_query
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
STACK toplevel blast_needle_timetrials_1.pl:94
 
--------------------------------------


I looked through the docs to try to find an explanation or some mention
of how to set query_len, but I didn't find anything.  Could someone
please point out what I'm doing wrong?  Additionally, if I'm making this
harder than it needs to be, please give me a gentle whack with the clue
stick.

Thanks,
Bob


From osborne1 at optonline.net  Thu Jun  8 15:42:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 08 Jun 2006 15:42:07 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"returnundef"
In-Reply-To: <000001c68b31$b5320390$15327e82@pyrimidine>
Message-ID: <C0ADF5CF.8C8F%osborne1@optonline.net>

Chris,

Odd. protgraph.t passes all of its tests on my computer. Do you have the
Clone module installed?

Brian O.


On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> 36-37, 45, 48-56, 59-60, 65-66
> Failed 22/66 tests, 66.67% okay


From jason at bioperl.org  Thu Jun  8 16:15:47 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 8 Jun 2006 16:15:47 -0400
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <84AC010A-25E6-48C7-A723-CE4688ECA926@bioperl.org>

why are you trying to create new Hit objects?
  $hit_obj is-A GenericHit object...


-jason
On Jun 8, 2006, at 1:12 PM, Freimuth, Robert wrote:

> Hi,
>
> I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
> of hits from blast, then get some information about the tiled  
> result.  I
> thought I'd use the num_unaligned_query and num_unaligned_hit  
> methods to
> get the number of unaligned bases in the tiled result, then subtract
> that from the length of the query/subject sequence to get the  
> number of
> aligned bases in the region spanned by the hit(s).  My code is below,
> followed by the error message.
>
>
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
> => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
> this number of bp
>
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled
> together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
>
>         my $num_unaligned_query =
> $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
>
>
>
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
> STACK Bio::Search::Hit::GenericHit::num_unaligned_query
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
> STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
> STACK toplevel blast_needle_timetrials_1.pl:94
>
> --------------------------------------
>
>
> I looked through the docs to try to find an explanation or some  
> mention
> of how to set query_len, but I didn't find anything.  Could someone
> please point out what I'm doing wrong?  Additionally, if I'm making  
> this
> harder than it needs to be, please give me a gentle whack with the  
> clue
> stick.
>
> Thanks,
> Bob
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 18:36:00 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 09 Jun 2006 08:36:00 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <20060608170227.GF3334@iib.unsam.edu.ar>
References: <4487BBBD.6060702@infotech.monash.edu.au>
	<20060608170227.GF3334@iib.unsam.edu.ar>
Message-ID: <4488A650.2050803@infotech.monash.edu.au>

> I've just been further auditing the Bioperl code and noticed that
> Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
> can't locate an example/sample sequence file in "Lasergene" format.

Thanks to Fernan, Todd and Senthil who sent me example Lasergene files.
Those will be enough examples to write some tests.

--Torsten


From kellert at ohsu.edu  Thu Jun  8 20:29:10 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 17:29:10 -0700
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>

Greetings,
Is fink still a reasonable way to install and maintain bioperl?
(There have been some emails about instability.) How about upgrades? The
way I have fink installed, its path comes first when perl reads @INC. So
if I put a newer Bio::something in /usr/local/wherever, it won't be
seen if an older module is in the fink path.  Can I upgrade in the
fink "space" without messing up fink's database? Other options?

Thanks,
Tom K


Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From hlapp at gmx.net  Thu Jun  8 21:19:28 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Jun 2006 21:19:28 -0400
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
	<1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
Message-ID: <060FC8CE-FD89-436E-B79C-135BB4F324CD@gmx.net>

Why don't you remove the fink bioperl package if you want to install  
a newer version locally?

BTW unless you use a custom-compiled perl your packages will end up  
in /Library/Perl/5.8.6/ (or /System/Library/Perl/5.8.6/), not /usr/ 
local, when you issue 'make install'.
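
If removing the fink package is not an option, the search order can
also be changed per script. A minimal sketch, assuming the newer
modules sit in a hypothetical ~/src/bioperl-live checkout; loaded_from
is an illustrative helper, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical checkout of a newer BioPerl; adjust to your own path.
# 'use lib' prepends to @INC at compile time, so this directory is
# searched before the entries that are already there (fink's included).
use lib ( $ENV{HOME} || '.' ) . '/src/bioperl-live';

# Illustrative helper: report which file a loaded module came from.
sub loaded_from {
    my ($module) = @_;                       # e.g. 'Bio::Seq'
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return $INC{$file};                      # undef if not loaded
}

print "search order:\n", map { "  $_\n" } @INC;
```

After a "use Bio::Seq;", printing loaded_from('Bio::Seq') should show
whether the fink copy or the local one was picked up.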

	-hilmar

On Jun 8, 2006, at 8:29 PM, Thomas J Keller wrote:

> Greetings,
> Is fink still a reasonable way to install and maintain bioperl?
> (There have been some emails about instability.) How about upgrades? The
> way I have fink installed, its path comes first when perl reads @INC. So
> if I put a newer Bio::something in /usr/local/wherever, it won't be
> seen if an older module is in the fink path.  Can I upgrade in the
> fink "space" without messing up fink's database? Other options?
>
> Thanks,
> Tom K
>
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Jun  8 22:30:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 21:30:20 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall
	with"returnundef"
In-Reply-To: <C0ADF5CF.8C8F%osborne1@optonline.net>
Message-ID: <000c01c68b6c$a8184710$15327e82@pyrimidine>

Yes; using ActiveState's PPM:

ppm> query CLone
Querying target 1 (ActivePerl 5.8.7.815)
  1. Clone [0.20] recursively copy Perl datatypes
ppm>

v. 0.20 is the latest in CPAN.

I can try some additional tests with the relevant modules to see what the
problem is.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Thursday, June 08, 2006 2:42 PM
> To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; bioperl-l
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> with"returnundef"
> 
> Chris,
> 
> Odd. protgraph.t passes all of its tests on my computer. Do you have the
> Clone module installed?
> 
> Brian O.
> 
> 
> On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> > 36-37, 45, 48-56, 59-60, 65-66
> > Failed 22/66 tests, 66.67% okay
> 
> 


From heikki at sanbi.ac.za  Fri Jun  9 03:35:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 09:35:12 +0200
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <448850CE.1040105@colibase.bham.ac.uk>
References: <448850CE.1040105@colibase.bham.ac.uk>
Message-ID: <200606090935.12758.heikki@sanbi.ac.za>

Roy,

The definitive document describing the locations is the feature table 
definition:

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#3.5

but you probably know that already.


Two questions come to mind:

1. Can you parse your joint location using bioperl without errors?

2. Is there a practical advantage in including a location which has no 
relevance to the sequence in hand?

I notice that the /partial qualifier is deprecated and the docs suggest using 
</> signs to indicate that the sequence is partial, so I guess what you are 
doing is  correct.

	-Heikki

On Thursday 08 June 2006 18:31, Roy Chaudhuri wrote:
> Hi all.
>
> I've been playing around with a subroutine to truncate a sequence and
> adjust the coordinates of any features that overlap the specified
> region- something that according to the comments in
> Bio::Location::Simple has been abortively worked on in the past.
>
> I've submitted the subroutine as an enhancement in Bugzilla. It's a bit
> hacky but works for what I needed it for. However I'm a bit unsure on
> the best way to deal with split locations where one of the sublocations
> is entirely outside the truncated region. My current method results in
> locations like:
> join(1..500, >1000..>1000)
>
> which is quite ugly and possibly invalid, but kind of makes sense. Does
> anyone know what would be the correct behaviour for this situation?
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za  Fri Jun  9 04:06:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 10:06:30 +0200
Subject: [Bioperl-l] For CVS developers-potentialpitfall
	with"returnundef"
In-Reply-To: <000c01c68b6c$a8184710$15327e82@pyrimidine>
References: <000c01c68b6c$a8184710$15327e82@pyrimidine>
Message-ID: <200606091006.30893.heikki@sanbi.ac.za>

I am using:
   This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
and I have Clone installed, but more than half the tests fail.

Something is badly wrong.


	-Heikki
bala ~/src/bioperl/core> perl -w t/protgraph.t
1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Failed test 10 in t/protgraph.t at line 85
not ok 11
# Test 11 got: '5' (t/protgraph.t at line 86)
#    Expected: '13'
not ok 12
# Failed test 12 in t/protgraph.t at line 94
not ok 13
# Test 13 got: '5' (t/protgraph.t at line 95)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t/protgraph.t at line 113)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t/protgraph.t at line 114)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t/protgraph.t at line 122)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t/protgraph.t at line 150)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t/protgraph.t at line 158)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t/protgraph.t at line 163)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t/protgraph.t at line 187
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t/protgraph.t at line 212)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t/protgraph.t at line 228)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t/protgraph.t at line 229)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t/protgraph.t at line 233)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t/protgraph.t at line 234)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t/protgraph.t at line 248)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t/protgraph.t at line 249)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t/protgraph.t at line 253)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t/protgraph.t at line 254)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t/protgraph.t at line 263)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t/protgraph.t at line 264)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t/protgraph.t at line 275)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t/protgraph.t at line 276)
#    Expected: '5'


On Friday 09 June 2006 04:30, Chris Fields wrote:
> Yes; using ActiveState's PPM:
>
> ppm> query CLone
> Querying target 1 (ActivePerl 5.8.7.815)
>   1. Clone [0.20] recursively copy Perl datatypes
> ppm>
>
> v. 0.20 is the latest in CPAN.
>
> I can try some additional tests with the relevant modules to see what the
> problem is.
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > Sent: Thursday, June 08, 2006 2:42 PM
> > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> > with"returnundef"
> >
> > Chris,
> >
> > Odd. protgraph.t passes all of its tests on my computer. Do you have the
> > Clone module installed?
> >
> > Brian O.
> >
> > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > Failed 22/66 tests, 66.67% okay
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 04:08:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 09:08:18 +0100
Subject: [Bioperl-l] undef query_len error
	with	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <44892C72.2040605@mrc-dunn.cam.ac.uk>

Freimuth, Robert wrote:
> Hi,
> 
> I'm trying to use the Bio::Search::Hit::GenericHit
[snip]
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
> => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
> this number of bp
> 
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled
> together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
> 
>         my $num_unaligned_query =
> $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
> 
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
[snip]
> I looked through the docs to try to find an explanation or some mention
> of how to set query_len, but I didn't find anything.

As Jason asked, why are you essentially recreating the hit object?
The error arises because the query length is normally set by the
SearchIO stream (via ResultI) when it internally creates a new hit object.
When you created your own hit object you didn't supply -query_len as an
option to new(), nor did you later use the query_length() method to set it.

If you really do need your $generic_hit_obj (instead of just using 
$hit_obj), do $generic_hit_obj->query_length($hit_obj->query_length); 
(Or if you know the length of your query sequence, supply that directly.)
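
A sketch of the simpler route, using the hit objects that SearchIO
already builds. The report file name is a placeholder, and the
subtraction just mirrors what the original post was after; length_aln()
may give the aligned length more directly, so check the GenericHit docs
for your BioPerl version before relying on either.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# 'report.blast' is a placeholder for your own BLAST output file.
my $blast_obj = Bio::SearchIO->new( -format => 'blast',
                                    -file   => 'report.blast' );

while ( my $result_obj = $blast_obj->next_result ) {
    while ( my $hit_obj = $result_obj->next_hit ) {
        # The hit built by SearchIO already carries the query length,
        # so the tiling methods can be called on it directly.
        my $unaligned_query = $hit_obj->num_unaligned_query;
        my $unaligned_hit   = $hit_obj->num_unaligned_hit;

        my $aligned_query = $hit_obj->query_length - $unaligned_query;
        my $aligned_hit   = $hit_obj->length       - $unaligned_hit;

        printf "%s\tquery aligned: %d\thit aligned: %d\n",
            $hit_obj->name, $aligned_query, $aligned_hit;
    }
}
```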

From zhangchnxp at gmail.com  Fri Jun  9 05:05:36 2006
From: zhangchnxp at gmail.com (Zhang chnxp)
Date: Fri, 9 Jun 2006 17:05:36 +0800
Subject: [Bioperl-l] Are there any modules handling the HLA Typing (Sequence
	Based Typing) ?
Message-ID: <4d1768a60606090205m6e360413paf172fa4e731ef2e@mail.gmail.com>

Hi there,
  I have some .abi trace files from an ABI3100 Genetic Analyzer. Are
there any packages that handle typing of HLA-A, -B, -C, -DRB1,
etc.? Or is there any free software that resolves the ambiguities in
SBT?

From cain at cshl.edu  Wed Jun  7 19:02:43 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 07 Jun 2006 19:02:43 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"return	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149721363.12513.96.camel@localhost.localdomain>

On Wed, 2006-06-07 at 17:26 -0500, David Messina wrote:
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
That is a problem with the cvs server at SourceForge (where the Sequence
Ontology is hosted).  I changed the module that tries to get that file
(I don't remember off hand what it was).  

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From oldham at ucla.edu  Thu Jun  8 22:07:34 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Thu, 8 Jun 2006 19:07:34 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
Message-ID: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>

Dear all,

I am a total Bioperl newbie struggling to accomplish a conceptually simple
task.  I have a single large fasta file containing about 200,000 probe
sequences (from an Affymetrix microarray), each of which looks like this:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC

What I would like to do is extract from this file a subset of ~130,800
probes (both the header and the sequence) and output this subset into a new
fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
("1138_at" is the probe set ID in the header listed above); I have these
8,175 IDs listed in a separate file.  I *think* that I managed to create an
index of all 200,000 probes in the original fasta file using the following
script:

#!/usr/bin/perl -w

 # script 1: create the index

 use Bio::Index::Fasta;
 use strict;
 my $Index_File_Name = shift;
 my $inx = Bio::Index::Fasta->new(
     -filename => $Index_File_Name,
     -write_flag => 1);
 $inx->make_index(@ARGV);

I'm not sure if this is the most sensible approach, and even if it is, I'm
not sure what to do next.  Any help would be greatly appreciated!

Many thanks,
Mike O.



From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 10:52:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 15:52:59 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <44898B4B.8080901@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes
[snip]
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
[snip]
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!

I'd say you're on the right lines. Next, you should continue reading the
rest of the synopsis and description in the docs for Bio::Index::Fasta.

Perhaps it's not clear, but you don't need to say 
$inx->make_index(@ARGV); if you've already provided -file to new() and 
are only dealing with one file. You also can't supply -file to new() if 
you want to change the id_parser (which you do, since you need to tell 
it how to detect your probe set ID).

Having indexed your file you can then output the desired sequences, just 
like the foreach loop suggested in the synopsis. (You could have that in 
the same script.)
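
To make the id_parser point concrete, here is a minimal sketch of a routine
that pulls the probe set ID out of a header line. The name probe_set_id is
made up for illustration, and the pattern assumes every header looks like the
one example posted. Per the Bio::Index::Fasta synopsis, a routine like this
can be handed to the index with $inx->id_parser(\&probe_set_id) before
make_index() is called.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the probe set ID, i.e. the
# third colon-separated field, from a header line such as
#   >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
sub probe_set_id {
    my ($header) = @_;
    return $1 if $header =~ /^>?\s*probe:[^:]+:([^:]+):/;
    return;    # no ID found: undef in scalar context, empty list in list context
}

my $header = '>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;';
my $id = probe_set_id($header);
print defined $id ? "probe set ID: $id\n" : "no probe set ID found\n";
```

Many probes share one probe set ID, so it is worth checking how the index
treats repeated keys before relying on this for retrieval.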


One thing I'm not clear on is why it needs -write_flag => 1. Why can't 
it index a read-only database? Even when you set -write_flag allowing it 
to work, it doesn't write anything...

From simon.andrews at bbsrc.ac.uk  Fri Jun  9 11:01:05 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 9 Jun 2006 16:01:05 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>

 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Michael Oldham
> Sent: 09 June 2006 03:08
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a 
> single large file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a 
> conceptually simple task.  I have a single large fasta file 
> containing about 200,000 probe sequences (from an Affymetrix 
> microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC

Unfortunately that's not Fasta format (which only has a single header
line starting with a '>').  I'd imagine that most programs which deal
with fasta which read that entry would see it as two sequences, the
first of which is empty.


> What I would like to do is extract from this file a subset of 
> ~130,800 probes (both the header and the sequence) and output 
> this subset into a new fasta file.  These 130,800 probes 
> correspond to 8,175 probe set IDs ("1138_at" is the probe set 
> ID in the header listed above)

If you're only having to do this once then it should be fairly quick to
knock up a one-off script to do this.  Since you've only got 8000ish
probeset ids you can probably just read those into a hash to start
with, then parse through your big sequence file with something like:


#!perl
use warnings;
use strict;

my %probe_ids;

# Add real code here to populate your hash
$probe_ids{'1138_at'} = 1;  # quoted: a key starting with a digit is not auto-quoted
##########################################


open (IN,'<','your_affy_file.txt') or die "Can't read affy file: $!";

open (OUT,'>','probe_list.txt') or die "Can't write output: $!";

while (<IN>) {

  if (/^>probe/) {
    # This assumes there are always 3 lines per probe entry
    if (exists $probe_ids{(split(/:/))[2]}) {
      print OUT;
      print OUT scalar <IN>;
      print OUT scalar <IN>;
    }
  }
}
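
The "populate your hash" step above could look something like the sketch
below. It assumes one probe set ID per line; the name load_ids is made up for
illustration, and the demo reads from an in-memory string rather than a real
ID file. Stripping trailing whitespace also removes a stray carriage return
if the ID file was saved with Windows line endings, which would otherwise
stop the keys from matching.

```perl
use strict;
use warnings;

# Read one probe set ID per line into a lookup hash.
# Trailing whitespace (including "\r" from Windows line endings) is stripped.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        next unless length $line;    # skip blank lines
        $ids{$line} = 1;
    }
    return \%ids;
}

# Demo with an in-memory "file"; a real script would open the ID file instead.
my $demo = "1138_at\r\n1134_at\n\n";
open my $fh, '<', \$demo or die "Can't open in-memory file: $!";
my $ids = load_ids($fh);
close $fh;
print join(',', sort keys %$ids), "\n";
```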


From MEC at stowers-institute.org  Fri Jun  9 10:58:22 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 9 Jun 2006 09:58:22 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work: 

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
<idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' \
  id.dat mybigfile.fa

good luck

--Malcolm Cook 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Michael Oldham
>Sent: Thursday, June 08, 2006 9:08 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Output a subset of FASTA data from a 
>single large file
>
>Dear all,
>
>I am a total Bioperl newbie struggling to accomplish a 
>conceptually simple
>task.  I have a single large fasta file containing about 200,000 probe
>sequences (from an Affymetrix microarray), each of which looks 
>like this:
>
>>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
>Antisense;
>TGGCTCCTGCTGAGGTCCCCTTTCC
>
>What I would like to do is extract from this file a subset of ~130,800
>probes (both the header and the sequence) and output this 
>subset into a new
>fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>("1138_at" is the probe set ID in the header listed above); I 
>have these
>8,175 IDs listed in a separate file.  I *think* that I managed 
>to create an
>index of all 200,000 probes in the original fasta file using 
>the following
>script:
>
>#!/usr/bin/perl -w
>
> # script 1: create the index
>
> use Bio::Index::Fasta;
> use strict;
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fasta->new(
>     -filename => $Index_File_Name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
>I'm not sure if this is the most sensible approach, and even 
>if it is, I'm
>not sure what to do next.  Any help would be greatly appreciated!
>
>Many thanks,
>Mike O.
>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From senthil at cdfd.org.in  Fri Jun  9 18:21:11 2006
From: senthil at cdfd.org.in (M Senthil Kumar)
Date: Fri, 9 Jun 2006 15:21:11 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>


On Fri, 9 Jun 2006, simon andrews (BI) wrote:
|
|
|> -----Original Message-----
|> From: bioperl-l-bounces at lists.open-bio.org
|> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
|> Michael Oldham
|> Sent: 09 June 2006 03:08
|> To: bioperl-l at lists.open-bio.org
|> Subject: [Bioperl-l] Output a subset of FASTA data from a
|> single large file
|>
|> Dear all,
|>
|> I am a total Bioperl newbie struggling to accomplish a
|> conceptually simple task.  I have a single large fasta file
|> containing about 200,000 probe sequences (from an Affymetrix
|> microarray), each of which looks like this:
|>
|> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
|> >Antisense;
|> TGGCTCCTGCTGAGGTCCCCTTTCC
|
|Unfortunately that's not Fasta format (which only has a single header
|line starting with a '>'.  I'd imagine that most programs which deal
|with fasta which read that entry would see it as two sequences, the
|first of which is empty.
|

[snipped]

hi,

I think the file is in fasta format, and you have probably seen it
differently because of your mail transport agent.

Senthil


From cjfields at uiuc.edu  Fri Jun  9 13:59:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:18 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>
Message-ID: <002b01c68bee$6e3237e0$15327e82@pyrimidine>

No; I saw the same thing here.  It's not FASTA in the traditional sense:

http://www.bioperl.org/wiki/FASTA_sequence_format

though he did get it to build a database successfully.  Well, 'success' in
the sense that no errors were thrown.  I've learned the absence of error
messages does not necessarily mean that everything went as planned; it
depends on how much error handling has been added to the module by the
submitting author.  

It's possible that the second annotation line was ignored completely.  I
suppose it's also possible that two sequences are entered into the database,
an empty sequence for the first '>' line and the full sequence for the
second.  It's all dependent on how the parser handles this.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of M Senthil Kumar
> Sent: Friday, June 09, 2006 5:21 PM
> To: simon andrews (BI)
> Cc: bioperl-l at lists.open-bio.org; Michael Oldham
> Subject: Re: [Bioperl-l] Output a subset of FASTA data from a single large
> file
> 
> 
> 
> On Fri, 9 Jun 2006, simon andrews (BI) wrote:
> |
> |
> |> -----Original Message-----
> |> From: bioperl-l-bounces at lists.open-bio.org
> |> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> |> Michael Oldham
> |> Sent: 09 June 2006 03:08
> |> To: bioperl-l at lists.open-bio.org
> |> Subject: [Bioperl-l] Output a subset of FASTA data from a
> |> single large file
> |>
> |> Dear all,
> |>
> |> I am a total Bioperl newbie struggling to accomplish a
> |> conceptually simple task.  I have a single large fasta file
> |> containing about 200,000 probe sequences (from an Affymetrix
> |> microarray), each of which looks like this:
> |>
> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> |> >Antisense;
> |> TGGCTCCTGCTGAGGTCCCCTTTCC
> |
> |Unfortunately that's not Fasta format (which only has a single header
> |line starting with a '>'.  I'd imagine that most programs which deal
> |with fasta which read that entry would see it as two sequences, the
> |first of which is empty.
> |
> 
> [snipped]
> 
> hi,
> 
> I think the file is in fasta format and probably you might have seen it
> differently because of your mail transport agent.
> 
> Senthil
> 


From cjfields at uiuc.edu  Fri Jun  9 13:59:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:31 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606091006.30893.heikki@sanbi.ac.za>
Message-ID: <002c01c68bee$76219ef0$15327e82@pyrimidine>

I ran tests this morning on protgraph.t using bioperl-live on Mac OS X (Intel)
running perl 5.8.6 and all tests passed, but I haven't updated from CVS
since June 7th.  The WinXP and linux test results are almost exactly alike;
most failed tests are from unexpected results (with exactly the same results
for both OS's).  A few look more serious: test 45 failed on both, and tests
10 and 12 failed only on linux (the only noticeable difference between the
two)
...

ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
...

The line numbers seem to also be off by one (linux tests seem to have one
extra line); not sure if that means anything.
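
For anyone who wants to compare two such runs without eyeballing them, a
rough sketch follows. It only looks at "not ok N" lines, the helper name
failed_tests is made up, and the two sample strings are cut-down stand-ins
for the real WinXP and linux output rather than the full logs.

```perl
use strict;
use warnings;

# Collect the numbers of failing tests from "not ok N" lines.
# A leading "> " quote prefix and progress dots are tolerated.
sub failed_tests {
    my ($text) = @_;
    my %failed;
    $failed{$1} = 1 while $text =~ /^[>\s.]*not ok (\d+)/mg;
    return \%failed;
}

# Cut-down sample output (tests 10-13 only) for each platform.
my $winxp = "ok 10\nnot ok 11\nok 12\n.not ok 13\n";
my $linux = "not ok 10\nnot ok 11\nnot ok 12\nnot ok 13\n";

my ($w, $l) = (failed_tests($winxp), failed_tests($linux));
my @only_linux = sort { $a <=> $b } grep { !$w->{$_} } keys %$l;
print "fail only on linux: @only_linux\n";
```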

Here are the full WinXP protgraph.t results:

1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t\protgraph.t at line 112)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t\protgraph.t at line 113)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t\protgraph.t at line 121)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t\protgraph.t at line 149)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t\protgraph.t at line 157)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t\protgraph.t at line 162)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t\protgraph.t at line 186
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t\protgraph.t at line 211)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t\protgraph.t at line 227)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t\protgraph.t at line 228)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t\protgraph.t at line 232)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t\protgraph.t at line 233)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t\protgraph.t at line 247)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t\protgraph.t at line 248)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t\protgraph.t at line 252)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t\protgraph.t at line 253)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t\protgraph.t at line 262)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t\protgraph.t at line 263)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t\protgraph.t at line 274)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t\protgraph.t at line 275)
#    Expected: '5'  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Friday, June 09, 2006 3:07 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields; 'Brian Osborne'
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> I am using:
>    This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
> and I have Clone installed, but more than half the tests fail.
> 
> Something is badly wrong.
> 
> 
> 	-Heikki
> bala ~/src/bioperl/core> perl -w t/protgraph.t
> 1..66
> ok 1
> ok 2
> ok 3
> ok 4
> ok 5
> ok 6
> ok 7
> ok 8
> ok 9
> not ok 10
> # Failed test 10 in t/protgraph.t at line 85
> not ok 11
> # Test 11 got: '5' (t/protgraph.t at line 86)
> #    Expected: '13'
> not ok 12
> # Failed test 12 in t/protgraph.t at line 94
> not ok 13
> # Test 13 got: '5' (t/protgraph.t at line 95)
> #    Expected: '13'
> ok 14
> ok 15
> ok 16
> ok 17
> ok 18
> ok 19
> not ok 20
> # Test 20 got: '0.013' (t/protgraph.t at line 113)
> #    Expected: '0.027'
> .not ok 21
> # Test 21 got: '1' (t/protgraph.t at line 114)
> #    Expected: ''
> ..ok 22
> .ok 23
> ok 24
> ..ok 25
> .not ok 26
> # Test 26 got: '1' (t/protgraph.t at line 122)
> #    Expected: '5'
> ok 27
> ok 28
> ok 29
> ok 30
> ok 31
> ok 32
> not ok 33
> # Test 33 got: '139' (t/protgraph.t at line 150)
> #    Expected: '71'
> ok 34
> ok 35
> not ok 36
> # Test 36 got: '126' (t/protgraph.t at line 158)
> #    Expected: '58'
> .not ok 37
> # Test 37 got: '1' (t/protgraph.t at line 163)
> #    Expected: '15'
> ok 38
> ok 39
> ok 40
> ok 41
> ok 42
> ok 43
> ok 44
> not ok 45
> # Failed test 45 in t/protgraph.t at line 187
> ok 46
> ok 47
> not ok 48
> # Test 48 got: '75' (t/protgraph.t at line 212)
> #    Expected: '72'
> not ok 49
> # Test 49 got: '343' (t/protgraph.t at line 228)
> #    Expected: '72'
> not ok 50
> # Test 50 got: '368' (t/protgraph.t at line 229)
> #    Expected: '74'
> not ok 51
> # Test 51 got: '344' (t/protgraph.t at line 233)
> #    Expected: '73'
> not ok 52
> # Test 52 got: '368' (t/protgraph.t at line 234)
> #    Expected: '74'
> not ok 53
> # Test 53 got: '432' (t/protgraph.t at line 248)
> #    Expected: '72'
> not ok 54
> # Test 54 got: '461' (t/protgraph.t at line 249)
> #    Expected: '74'
> not ok 55
> # Test 55 got: '434' (t/protgraph.t at line 253)
> #    Expected: '74'
> not ok 56
> # Test 56 got: '463' (t/protgraph.t at line 254)
> #    Expected: '76'
> ok 57
> ok 58
> not ok 59
> # Test 59 got: '437' (t/protgraph.t at line 263)
> #    Expected: '3'
> not ok 60
> # Test 60 got: '467' (t/protgraph.t at line 264)
> #    Expected: '4'
> ok 61
> ok 62
> ok 63
> ok 64
> not ok 65
> # Test 65 got: '440' (t/protgraph.t at line 275)
> #    Expected: '3'
> not ok 66
> # Test 66 got: '472' (t/protgraph.t at line 276)
> #    Expected: '5'
> 
> 
> On Friday 09 June 2006 04:30, Chris Fields wrote:
> > Yes; using ActiveState's PPM:
> >
> > ppm> query CLone
> > Querying target 1 (ActivePerl 5.8.7.815)
> >   1. Clone [0.20] recursively copy Perl datatypes
> > ppm>
> >
> > v. 0.20 is the latest in CPAN.
> >
> > I can try some additional tests with the relevant modules to see what
> the
> > problem is.
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > > Sent: Thursday, June 08, 2006 2:42 PM
> > > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> > > with"returnundef"
> > >
> > > Chris,
> > >
> > > Odd. protgraph.t passes all of its tests on my computer. Do you have
> the
> > > Clone module installed?
> > >
> > > Brian O.
> > >
> > > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > > Failed 22/66 tests, 66.67% okay
> > >
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From sdavis2 at mail.nih.gov  Fri Jun  9 14:29:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 09 Jun 2006 14:29:53 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002b01c68bee$6e3237e0$15327e82@pyrimidine>
Message-ID: <C0AF3661.CD0A%sdavis2@mail.nih.gov>


On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> No; I saw the same thing here.  It's not FASTA in the traditional sense:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> though he did get it to build a database successfully.  Well, 'success' in
> the sense that no errors were thrown.  I've learned the absence of error
> messages does not necessarily mean that everything went as planned; it
> depends on how much error handling has been added to the module by the
> submitting author.
> 
> It's possible that the second annotation line was ignored completely.  I
> suppose it's also possible that two sequences are entered into the database,
> an empty sequence for the first '>' line and the full sequence for the
> second.  It's all dependent on how the parser handles this.

I think that Senthil was pointing out that even though >Antisense looks to
be on its own line, it isn't, but is simply a continuation of the FASTA
header.  Judging from the context, that is the only interpretation that
makes sense.

Sean

>> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>> |> >Antisense;
>> |> TGGCTCCTGCTGAGGTCCCCTTTCC
>> |
>> |Unfortunately that's not Fasta format (which only has a single header
>> |line starting with a '>'.  I'd imagine that most programs which deal
>> |with fasta which read that entry would see it as two sequences, the
>> |first of which is empty.
>> |
>> 
>> [snipped]
>> 
>> hi,
>> 
>> I think the file is in fasta format and probably you might have seen it
>> differently because of your mail transport agent.


From cjfields at uiuc.edu  Fri Jun  9 15:05:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:05:44 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <002e01c68bf7$b594d210$15327e82@pyrimidine>

There's information in the HOWTOs:

http://www.bioperl.org/wiki/HOWTO:Flat_databases

http://www.bioperl.org/wiki/HOWTO:OBDA

Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
('fasta' format I/O) and this is what I got as output:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;


i.e. an empty sequence, which is what I guessed might happen, though I
thought it might pick up the second '>' and the full sequence there.  Since
the sequence is tossed you'll have to prescreen your sequence input stream
by either concatenating the two '>' lines together or screening for the
relevant information you want to retain.  You can try maybe getting this
info into Bio::Seq objects and writing to a Bio::SeqIO stream (to file or
file handle).
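
If a file really did contain two consecutive '>' lines per record, the
"concatenate the two '>' lines" prescreen could be sketched roughly as below.
This is plain Perl rather than BioPerl, the name merge_headers is made up,
and it assumes that any run of adjacent '>' lines belongs to a single record.

```perl
use strict;
use warnings;

# Join runs of consecutive '>' lines into a single FASTA header line.
sub merge_headers {
    my ($in, $out) = @_;
    my $pending;    # header being accumulated, without its trailing newline
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;
        if ($line =~ /^>(.*)/) {
            # The first '>' line starts a header; later ones are appended to it.
            $pending = defined $pending ? "$pending $1" : $line;
            next;
        }
        if (defined $pending) {
            print {$out} "$pending\n";
            undef $pending;
        }
        print {$out} "$line\n";
    }
    print {$out} "$pending\n" if defined $pending;
    return;
}

# Demo on the example record, read from and written to in-memory strings.
my $raw = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;\n"
        . ">Antisense;\n"
        . "TGGCTCCTGCTGAGGTCCCCTTTCC\n";
open my $in, '<', \$raw or die "Can't read demo data: $!";
my $merged = '';
open my $out, '>', \$merged or die "Can't write demo output: $!";
merge_headers($in, $out);
close $out;
print $merged;
```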

Once you have that set up, the HOWTO tells you how to set up custom or
secondary namespaces, so you can use a regex to parse out the information
for primary or secondary keys:

http://www.bioperl.org/wiki/HOWTO:Flat_databases#Secondary_or_custom_namespaces

then you could select specific sequences this way (per the HOWTO):

$db->secondary_namespaces("GI");
my $acc_seq = $db->get_Seq_by_id("P84139");
my $gi_seq = $db->get_Seq_by_secondary("GI",443893);

or for multiple sequences (judging from the POD):

my $acc_seqio = $db->get_Stream_by_id(@ids);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Oldham
> Sent: Thursday, June 08, 2006 9:08 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a single large
> file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes (both the header and the sequence) and output this subset into a
> new
> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
> ("1138_at" is the probe set ID in the header listed above); I have these
> 8,175 IDs listed in a separate file.  I *think* that I managed to create
> an
> index of all 200,000 probes in the original fasta file using the following
> script:
> 
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
>  use strict;
>  my $Index_File_Name = shift;
>  my $inx = Bio::Index::Fasta->new(
>      -filename => $Index_File_Name,
>      -write_flag => 1);
>  $inx->make_index(@ARGV);
> 
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!
> 
> Many thanks,
> Mike O.
> 
> 
> 
> 


From cjfields at uiuc.edu  Fri Jun  9 15:49:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:49:51 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <C0AF3661.CD0A%sdavis2@mail.nih.gov>
Message-ID: <002f01c68bfd$e1111e20$15327e82@pyrimidine>

> On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > No; I saw the same thing here.  It's not FASTA in the traditional sense:
> >
> > http://www.bioperl.org/wiki/FASTA_sequence_format
> >
> > though he did get it to build a database successfully.  Well, 'success'
> in
> > the sense that no errors were thrown.  I've learned the absence of error
> > messages does not necessarily mean that everything went as planned; it
> > depends on how much error handling has been added to the module by the
> > submitting author.
> >
> > It's possible that the second annotation line was ignored completely.  I
> > suppose it's also possible that two sequences are entered into the
> database,
> > an empty sequence for the first '>' line and the full sequence for the
> > second.  It's all dependent on how the parser handles this.
> 
> I think that Senthil was pointing out that even though >Antisense looks to
> be on its own line, it isn't, but is simply a continutation of the FASTA
> header.  Judging from the context, that is the only interpretation that
> makes sense.
> 
> Sean

Sorry.  Just checked through another mail client and you're right.  That's
what I get for trusting Mr. Gates (stupid Outlook).  I have seen a few funky
FASTA derivations, so I thought that's what was going on here.  My bad!

My point, though erroneous, was that the fasta format parser may not parse
this data correctly if he did have two description lines, but may not
indicate there are problems by throwing an exception.  I demonstrated that
using Bio::SeqIO as an example (you get empty sequences).  Bio::Index::Fasta
parses the file itself using this loop to index:

	# Main indexing loop
	while (<FASTA>) {
		if (/^>/) {
			# $begin is the position of the first character after the '>'
			my $begin = tell(FASTA) - length( $_ ) + 1;

			foreach my $id (&$id_parser($_)) {
				$self->add_record($id, $i, $begin);
			}
		}
	}

This simply looks for '>'.  That's fine for the vast majority of sequences.
I thought it would be nice to have something that's a little more rigorous
in verifying the format rather than trusting it implicitly, maybe by using
an eval{} block to make sure the format is FASTA-like and looks like
DNA/RNA/protein.
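
As a rough idea of what such a check might look like (plain Perl, not
anything that exists in Bio::Index::Fasta; the name check_fasta and the
accepted character set are assumptions), one could scan the file once and
report headers with no sequence and lines with unexpected characters:

```perl
use strict;
use warnings;

# Return a list of problems found in FASTA-like text read from $in.
# Assumed rules: sequence lines contain only letters, '*' or '-',
# and every header must be followed by at least one sequence character.
sub check_fasta {
    my ($in) = @_;
    my (@problems, $header, $seq_len);
    my $line_no = 0;
    my $finish = sub {
        push @problems, "empty sequence for '$header'"
            if defined $header && !$seq_len;
    };
    while (my $line = <$in>) {
        $line_no++;
        $line =~ s/\s+\z//;
        next unless length $line;
        if ($line =~ /^>(.*)/) {
            $finish->();                      # close the previous record
            ($header, $seq_len) = ($1, 0);
        }
        elsif (!defined $header) {
            push @problems, "line $line_no: sequence data before any header";
        }
        elsif ($line =~ /[^A-Za-z*-]/) {
            push @problems, "line $line_no: unexpected character in sequence";
        }
        else {
            $seq_len += length $line;
        }
    }
    $finish->();                              # close the last record
    return @problems;
}

# Demo: two consecutive header lines, so the first record has no sequence.
my $text = ">first\n>second\nACGT\n";
open my $in, '<', \$text or die "Can't read demo data: $!";
my @problems = check_fasta($in);
close $in;
print "$_\n" for @problems;
```

A caller that wants an exception could wrap this in eval{} and die when the
returned list is not empty.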

Chris


> >> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> >> |> >Antisense;
> >> |> TGGCTCCTGCTGAGGTCCCCTTTCC
> >> |
> >> |Unfortunately that's not Fasta format (which only has a single header
> >> |line starting with a '>'.  I'd imagine that most programs which deal
> >> |with fasta which read that entry would see it as two sequences, the
> >> |first of which is empty.
> >> |
> >>
> >> [snipped]
> >>
> >> hi,
> >>
> >> I think the file is in fasta format and probably you might have seen it
> >> differently because of your mail transport agent.
> 


From bernd.web at gmail.com  Fri Jun  9 09:23:21 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 9 Jun 2006 15:23:21 +0200
Subject: [Bioperl-l] SimpleAlign
Message-ID: <716af09c0606090623v37c72bc5r1ddbcb2b8355a4a0@mail.gmail.com>

Hi,

Two queries with respect to SimpleAlign. I am using the following code
based on the POD.

my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
print $out $_ while <$in>;

1) Is it possible to set set_displayname_flat() globally without calling
$_->set_displayname_flat() per alignment?

2) My input files have an ID and description line for each seq in the
alignment. When the file is converted I lose the description line. I
know I can get the description of the sequences (e.g.
$aln->get_seq_by_pos(2)->description()).
How could I export the complete fasta defline including the
description (I realize that general clustal format has a limit on the
number of characters, but still)?

Regards,
Bernd

From oldham at ucla.edu  Fri Jun  9 21:39:45 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 9 Jun 2006 18:39:45 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>

Thanks to everyone for their helpful advice.  I think I am getting closer,
but no cigar quite yet.  The script below runs quickly with no errors--but
the output file is empty.  It seems that the problem must lie somewhere in
the 'while' loop, and I'm sure it's quite obvious to a more experienced
eye--but not to mine!  Any suggestions?  Thanks again for your help.

--Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
		}
	}
exit;


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Friday, June 09, 2006 7:58 AM
To: Michael Oldham; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single large
file


I wouldn't bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work:

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
<idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
mybigfile.fa

good luck

--Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Michael Oldham
>Sent: Thursday, June 08, 2006 9:08 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Output a subset of FASTA data from a
>single large file
>
>Dear all,
>
>I am a total Bioperl newbie struggling to accomplish a
>conceptually simple
>task.  I have a single large fasta file containing about 200,000 probe
>sequences (from an Affymetrix microarray), each of which looks
>like this:
>
>>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>Antisense;
>TGGCTCCTGCTGAGGTCCCCTTTCC
>
>What I would like to do is extract from this file a subset of ~130,800
>probes (both the header and the sequence) and output this
>subset into a new
>fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>("1138_at" is the probe set ID in the header listed above); I
>have these
>8,175 IDs listed in a separate file.  I *think* that I managed
>to create an
>index of all 200,000 probes in the original fasta file using
>the following
>script:
>
>#!/usr/bin/perl -w
>
> # script 1: create the index
>
> use Bio::Index::Fasta;
> use strict;
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fasta->new(
>     -filename => $Index_File_Name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
>I'm not sure if this is the most sensible approach, and even
>if it is, I'm
>not sure what to do next.  Any help would be greatly appreciated!
>
>Many thanks,
>Mike O.
>
>
>
>


From cjfields at uiuc.edu  Sun Jun 11 00:32:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Jun 2006 23:32:04 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <F4E1042A-CE2D-4E51-B711-BDBB6E052FEB@uiuc.edu>

What happens if you just print $idmatch or $1 (i.e. check whether the
regex matches anything)?  If nothing is printed, then either the regex
isn't working as expected or there is a logic error.  The problem may
be that the captured string must match the id exactly, since the id is
the key to the %ID hash; if the regex picks up any extra characters
beyond your id key, the lookup fails and you get nothing.  Malcolm's
regex looks like it should work just fine, but we only had one example
sequence to try here.

With your while loop set up like this, won't it print only the matched
description lines to the outfile (no sequence) even if there is a
match?  Or is this what you wanted?  If you want the sequence you
should add 'print OUT <PROBES>;' after the 'print OUT;' line.
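
A minimal sketch of that check, assuming the ids and headers look like
the examples in this thread.  The id list, the stray carriage return on
the second id, and the check_header helper are made up for illustration;
they are not taken from Michael's files.  Wrapping the captured string in
brackets makes invisible trailing characters easier to spot:

```perl
use strict;
use warnings;

# Hypothetical id list; the second entry carries a stray carriage return,
# as might happen with an id file saved with Windows line endings.
my @ID = ("1138_at", "1267_at\r");
my %ID = map { ($_, 1) } @ID;

my @headers = (
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;",
    ">probe:HG_U95Av2:1267_at:395:301; Interrogation_Position=2631; Antisense;",
);

# Report what the regex captured and whether it is a key of %ID.
sub check_header {
    my ($line) = @_;
    return "no match" unless $line =~ /^>probe:\w+:(\w+):/;
    return sprintf "[%s] %s", $1, exists $ID{$1} ? 'found' : 'NOT in %ID';
}

print check_header($_), "\n" for @headers;
```

If every header reports a capture but none is found in the hash, the keys
themselves are the next thing to inspect.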

Chris

On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:

> Thanks to everyone for their helpful advice.  I think I am getting closer,
> but no cigar quite yet.  The script below runs quickly with no errors--but
> the output file is empty.  It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>
> --Mike O.
>
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID.dat.txt';
>
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
>
> my $probes = 'HG_U95Av2_probe_fasta.txt';
>
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> chomp @ID;
> # Note: This creates a hash with keys=PSIDs and all values=1.
> my %ID = map {($_, 1)} @ID;
>
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;
>
>
> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Friday, June 09, 2006 7:58 AM
> To: Michael Oldham; bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a  
> single large
> file
>
>
>
> I wouldn't use bioperl for this, or create an index.  Perl would do fine
> and probably be faster.
>
> Assuming your ids are one per line in a file named id.dat looking like this
>
> 1138_at
> 1134_at
> etc..
>
> this should work:
>
> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
> mybigfile.fa
>
> good luck
>
> --Malcolm Cook
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 04:21:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:21:31 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002e01c68bf7$b594d210$15327e82@pyrimidine>
References: <002e01c68bf7$b594d210$15327e82@pyrimidine>
Message-ID: <448D240B.6040508@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> There's information in the HOWTOs:
> 
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
> 
> http://www.bioperl.org/wiki/HOWTO:OBDA
> 
> Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
> ('fasta' format I/O) and this is what I got as output:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> 
> 
> i.e. an empty sequence, which is what I guessed might happen
[snip]

As you later discovered, that was an Outlook problem. Just to make this 
thread relevant to bioperl, the bioperl solution is:

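use strict;
use warnings;
use Bio::SeqIO;
use Bio::Index::Fasta;

# Build the index, keyed on the probe set id returned by get_id()
my $inx = Bio::Index::Fasta->new(-write_flag => 1);
$inx->id_parser(\&get_id);
$inx->make_index(shift);

# Fetch each wanted id from the index and write it out as fasta
my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
my $wanted_ids_file = shift;
open(IDS, $wanted_ids_file) or die "Can't open $wanted_ids_file: $!";
while (<IDS>) {
   chomp;
   my $seq = $inx->fetch($_);
   $out->write_seq($seq);
}

# Use the third colon-separated field of the header as the id
sub get_id {
   my $line = shift;
   $line =~ /^>probe:\S+?:(\S+?):/;
   $1;
}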

It works for me on the sample sequence given by the OP.

From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 04:49:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:49:49 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <448D2AAD.3030601@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Thanks to everyone for their helpful advice.  I think I am getting closer,
> but no cigar quite yet.  The script below runs quickly with no errors--but
> the output file is empty.  It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
> 
> --Mike O.
> 
> 
> #!/usr/bin/perl -w
> 
> use strict;
> 
> my $IDs = 'ID.dat.txt';
> 
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
> 
> my $probes = 'HG_U95Av2_probe_fasta.txt';
> 
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
> 
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
> 
> my @ID = <IDFILE>;
> chomp @ID;
> # Note: This creates a hash with keys=PSIDs and all values=1.
> my %ID = map {($_, 1)} @ID;
> 
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;

Not sure why it would print nothing (are the ids in IDFILE the same case 
as the ids in the fasta file, and do they contain only word characters?), 
but even if it did print something you would only be printing out the 
fasta headers and not the sequences. Doing it the bioperl way gives you 
more flexibility in the future; you may want to do something with the 
sequences after printing them out, in which case do it in bioperl using 
Seq objects and skip the intermediate step of printing them.

From MEC at stowers-institute.org  Mon Jun 12 11:28:41 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:28:41 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F98D@exchkc02.stowers-institute.org>

Michael,

I don't think you can call perl's `print` on just a filehandle as you
are doing.  This is probably your problem.

If you call `select OUT` after opeining it, print will print $_ to it.
And, every line in the fasta record whose header matches on of the IDS
will get printed, not just the fasta header lines.  Read the code again
nothing that $idmatch is only getting reset when a correctly formatted
fasta header line is matched.
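
A minimal sketch of that idea in plain Perl, with the flag declared
outside the loop so that it keeps its value between lines.  The
filter_fasta helper, the 9999_at record, and the in-memory file handles
are made up for illustration; real code would open the probe and output
files instead:

```perl
use strict;
use warnings;

# Hypothetical helper: copy every record whose probe set id is a key of
# %$wanted from the $in file handle to the $out file handle.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;                          # lives outside the loop
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted->{$1};  # only header lines change the flag
        }
        print {$out} $line if $keep;       # header and sequence lines alike
    }
}

# Small in-memory demo; the second record is invented.
my $fasta = join '',
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:9999_at:1:1; Interrogation_Position=1; Antisense;\n",
    "ACGTACGTACGTACGTACGTACGTA\n";

open my $in, '<', \$fasta or die "Can't read in-memory fasta: $!";
my $result = '';
open my $out, '>', \$result or die "Can't write in-memory output: $!";
filter_fasta($in, $out, { '1138_at' => 1 });
close $out;

print $result;
```

Because the flag changes only on header lines, a multi-line sequence
would be copied in full as well.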

--Malcolm




From MEC at stowers-institute.org  Mon Jun 12 11:47:09 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:47:09 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F991@exchkc02.stowers-institute.org>

ooops, in my message 




From MEC at stowers-institute.org  Mon Jun 12 11:48:02 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:48:02 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>

oops,

s/matches on of/matches one of/
s/nothing that/noting that/ 

--Malcolm




From hubert.prielinger at gmx.at  Mon Jun 12 14:29:19 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 12:29:19 -0600
Subject: [Bioperl-l] How to use gi2taxonid
Message-ID: <448DB27F.6090107@gmx.at>

Hi,
I have downloaded the gi2taxonid file, as recommended here, to get the 
taxon ID for a GI number taken from a report, but I don't know how to 
use the file.
Jason wrote in a previous post that you have to make a DB_File out of 
it and finally tie it to a hash, but I don't know how.
Can anybody give me a hint on how to use it? My final goal is to get 
the taxonomy.

Thanks,
Hubert


From cjfields at uiuc.edu  Mon Jun 12 15:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 14:13:30 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <000f01c68e54$4d155ac0$15327e82@pyrimidine>

Michael, Malcolm et al,

I ran Michael's code (not Malcolm's one-liner), with and without the file
handle line that I suggested.  The idea behind my suggestion is to read the
file handle in scalar context, which reads just the next line, in the same
way that '$foo = <FILE>' or 'while(<FILE>) {}' advances to the next line
(with $/ = "\n") each time the file handle is read.  You could use:

$_ = <PROBES>;
print OUT;

I just chopped it down to one line.

Without the extra line I suggested I get only the description line (I used
this as a test file based on the original sequence and Michael's description
of the ID):

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;

Which I don't think Michael wants (he mentioned sequence and description, I
think).  

Modifying the loop in Michael's code to:
...

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT <PROBES>; # grabs next line and prints
	}
}

Gets:

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
AGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTATCC
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
TGGATCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTACTGCTGAGGTCCCCTTTCC

Which matches the ID's in the ID file (there are 10 sequences in the probes
file).  

I did notice one odd thing; I tried the above code on Mac OS X and it worked
fine (i.e. printed only the descriptions and sequences for the ID's in the
ID hash).  If I used Windows, I needed to use this version:

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT scalar(<PROBES>);		
	}
}

Otherwise 'print OUT <PROBES>;' prints all of the remaining lines: print
evaluates its arguments in list context, so <PROBES> returns every line left
in the file, and scalar() forces a single-line read.
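
For what it's worth, here is a minimal sketch of the same loop with the
single-line read made explicit. The sub name print_matching_probes, the
lexical filehandles and the in-memory test data are made up for illustration
and are not from Michael's script. It also avoids the 'my ... if' construct,
whose behaviour perlsyn describes as undefined.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy each wanted record,
# i.e. its header line plus the single sequence line after it.
# $wanted is a hash ref whose keys are probe set IDs such as '1138_at'.
sub print_matching_probes {
    my ($in, $out, $wanted) = @_;
    while (my $header = <$in>) {
        # Skip anything that is not a '>probe:ARRAY:ID:' header line.
        next unless $header =~ /^>probe:\w+:(\w+):/;
        next unless exists $wanted->{$1};
        print {$out} $header;
        # Assigning to a scalar reads exactly one line. Writing
        # 'print {$out} <$in>' instead would read in list context and
        # consume every remaining line of the file.
        my $sequence = <$in>;
        print {$out} $sequence if defined $sequence;
    }
    return;
}

# Small in-memory demonstration; the 9999_at record is made up.
my $fasta = ">probe:HG_U95Av2:1138_at:395:301;\nAGGCTCCTGCTGAGGTCCCCTTTCC\n"
          . ">probe:HG_U95Av2:9999_at:395:301;\nTGGCTCCTGCTGAGGTCCCCTTTCC\n";
open my $in, '<', \$fasta or die "Cannot open in-memory input: $!";
print_matching_probes($in, \*STDOUT, { '1138_at' => 1 });
close $in;
```

This assumes one sequence line per record, as in the probe file discussed
here; multi-line FASTA records would need a loop that reads up to the next
'>' line.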

Like I said, I haven't tried Malcolm's one-liner.  It's possible that it
works just as well as what I suggested.  I'm just responding to Michael's
code request.

Chris


> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 10:48 AM
> To: Cook, Malcolm; Chris Fields; Michael Oldham
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
> 
> oops,
> 
> s/matches on of/matches one of/
> s/nothing that/noting that/
> 
> --Malcolm
> 
> 
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> >Cook, Malcolm
> >Sent: Monday, June 12, 2006 10:29 AM
> >To: Chris Fields; Michael Oldham
> >Cc: bioperl-l at lists.open-bio.org
> >Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
> >single largefile
> >
> >Michael,
> >
> >I don't think you can call perl's `print` on just a filehandle as you
> >are doing.  This is probably your problem.
> >
> >If you call `select OUT` after opeining it, print will print $_ to it.
> >And, every line in the fasta record whose header matches on of the IDS
> >will get printed, not just the fasta header lines.  Read the code again
> >nothing that $idmatch is only getting reset when a correctly formatted
> >fasta header line is matched.
> >
> >--Malcolm
> >
> >
> >>-----Original Message-----
> >>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>Sent: Saturday, June 10, 2006 11:32 PM
> >>To: Michael Oldham
> >>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
> >>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
> >>single large file
> >>
> >>What happens if you just print $idmatch or $1 (i.e. check to see if
> >>the regex matches anything)?  If there is nothing printed
> >then either
> >>the regex isn't working as expected or there is something logically
> >>wrong.  The problem may be that the captured string must
> >match the id
> >>exactly, the id being the key to the %ID hash; any extra characters
> >>picked up by the regex outside of your id key and you will not get
> >>anything.  Looking at Malcolm's regex it should work just fine, but
> >>we only had one example sequence to try here.
> >>
> >>If your while loop is set up like this won't it only print only the
> >>matched description lines to the outfile (no sequence) even if there
> >>is a match?  Or is this what you wanted?   If you want the sequence
> >>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
> >>
> >>Chris
> >>
> >>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
> >>
> >>> Thanks to everyone for their helpful advice.  I think I am getting
> >>> closer,
> >>> but no cigar quite yet.  The script below runs quickly with no
> >>> errors--but
> >>> the output file is empty.  It seems that the problem must lie
> >>> somewhere in
> >>> the 'while' loop, and I'm sure it's quite obvious to a more
> >>> experienced
> >>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
> >>>
> >>> --Mike O.
> >>>
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> use strict;
> >>>
> >>> my $IDs = 'ID.dat.txt';
> >>>
> >>> unless (open(IDFILE, $IDs)) {
> >>> 	print "Could not open file $IDs!\n";
> >>> 	}
> >>>
> >>> my $probes = 'HG_U95Av2_probe_fasta.txt';
> >>>
> >>> unless (open(PROBES, $probes)) {
> >>> 	print "Could not open file $probes!\n";
> >>> 	}
> >>>
> >>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
> >>>
> >>> my @ID = <IDFILE>;
> >>> chomp @ID;
> >>> my %ID = map {($_, 1)} @ID; #Note: keys=PSIDs and all values=1.
> >>>
> >>> 	while (<PROBES>) {
> >>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> >>> 		if ($idmatch){
> >>> 			print OUT;
> >>> 		}
> >>> 	}
> >>> exit;
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> >>> Sent: Friday, June 09, 2006 7:58 AM
> >>> To: Michael Oldham; bioperl-l at lists.open-bio.org
> >>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
> >>> single large
> >>> file
> >>>
> >>>
> >>>
> >>> I wouldn't bioperl for this, or create an index.  Perl would do
> >>> fine and
> >>> probably be faster.
> >>>
> >>> Assuming your ids are one per line in a file named id.dat
> >>looking like
> >>> this
> >>>
> >>> 1138_at
> >>> 1134_at
> >>> etc..
> >>>
> >>> this should work:
> >>>
> >>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
> >>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
> >>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
> >>> mybigfile.fa
> >>>
> >>> good luck
> >>>
> >>> --Malcolm Cook
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org
> >>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> >>>> Michael Oldham
> >>>> Sent: Thursday, June 08, 2006 9:08 PM
> >>>> To: bioperl-l at lists.open-bio.org
> >>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
> >>>> single large file
> >>>>
> >>>> Dear all,
> >>>>
> >>>> I am a total Bioperl newbie struggling to accomplish a
> >>>> conceptually simple
> >>>> task.  I have a single large fasta file containing about 200,000
> >>>> probe
> >>>> sequences (from an Affymetrix microarray), each of which looks
> >>>> like this:
> >>>>
> >>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> >>>> Antisense;
> >>>> TGGCTCCTGCTGAGGTCCCCTTTCC
> >>>>
> >>>> What I would like to do is extract from this file a subset of
> >>>> ~130,800
> >>>> probes (both the header and the sequence) and output this
> >>>> subset into a new
> >>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
> >>>> ("1138_at" is the probe set ID in the header listed above); I
> >>>> have these
> >>>> 8,175 IDs listed in a separate file.  I *think* that I managed
> >>>> to create an
> >>>> index of all 200,000 probes in the original fasta file using
> >>>> the following
> >>>> script:
> >>>>
> >>>> #!/usr/bin/perl -w
> >>>>
> >>>> # script 1: create the index
> >>>>
> >>>> use Bio::Index::Fasta;
> >>>> use strict;
> >>>> my $Index_File_Name = shift;
> >>>> my $inx = Bio::Index::Fasta->new(
> >>>>     -filename => $Index_File_Name,
> >>>>     -write_flag => 1);
> >>>> $inx->make_index(@ARGV);
> >>>>
> >>>> I'm not sure if this is the most sensible approach, and even
> >>>> if it is, I'm
> >>>> not sure what to do next.  Any help would be greatly appreciated!
> >>>>
> >>>> Many thanks,
> >>>> Mike O.
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>Christopher Fields
> >>Postdoctoral Researcher
> >>Lab of Dr. Robert Switzer
> >>Dept of Biochemistry
> >>University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>
> >


From hlapp at gmx.net  Mon Jun 12 16:06:23 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 12 Jun 2006 16:06:23 -0400
Subject: [Bioperl-l] How to use gi2taxonid
In-Reply-To: <448DB27F.6090107@gmx.at>
References: <448DB27F.6090107@gmx.at>
Message-ID: <878FB829-AD31-457D-957E-210448D7F6F5@gmx.net>

Thought about typing

	$ perldoc DB_File

at the command line?

Hubert, are you trying to outsource what should be your own work to  
the bioperl list, or what motivates you to waste everybody's time? If  
you google 'how to ask good questions' this (indeed frequently cited,  
also on the bioperl list if you had paid attention) comes up as the  
first link:

http://www.catb.org/~esr/faqs/smart-questions.html

There's nothing I can add, except to read it in full before your next  
posting or you may reach the point fast at which nobody will bother  
to respond to you and do your homework for you.
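
For the archive, the pattern that perldoc DB_File describes looks roughly
like the sketch below. It assumes the gi2taxonid dump is plain text with one
"GI<TAB>taxon ID" pair per line, which this thread does not confirm; the sub
names, file paths and the GI numbers in the demonstration are made up.

```perl
use strict;
use warnings;
use Fcntl;                    # O_RDWR, O_CREAT, O_RDONLY
use DB_File;                  # exports $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical helper: load "GI<TAB>taxid" lines into an on-disk hash.
sub build_gi2taxid_db {
    my ($dump_fh, $db_path) = @_;
    tie my %gi2taxid, 'DB_File', $db_path, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $db_path: $!";
    while (my $line = <$dump_fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;      # skip malformed lines
        $gi2taxid{$gi} = $taxid;
    }
    untie %gi2taxid;
    return;
}

# Hypothetical helper: look up one GI in the previously built file.
sub lookup_taxid {
    my ($db_path, $gi) = @_;
    tie my %gi2taxid, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH
        or die "Cannot tie $db_path: $!";
    my $taxid = $gi2taxid{$gi};
    untie %gi2taxid;
    return $taxid;
}

# Demonstration with invented GI numbers.
my $dir  = tempdir(CLEANUP => 1);
my $db   = "$dir/gi2taxid.db";
my $dump = "1001\t9606\n1002\t562\n";
open my $dump_fh, '<', \$dump or die "Cannot open in-memory dump: $!";
build_gi2taxid_db($dump_fh, $db);
close $dump_fh;
print "GI 1001 maps to taxon ", lookup_taxid($db, 1001), "\n";
```

For many lookups it would be cheaper to tie the hash once and keep it open
rather than re-tying per query as lookup_taxid does here.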

On Jun 12, 2006, at 2:29 PM, Hubert Prielinger wrote:

> hi,
> I have downloaded the gi2taxonid file to get the taxonid for a GI  
> number
> taken from a report as recommended here, but I don't know how to  
> use the
> gi2taxonid file.
> Jason wrote in a previous post that you have to make a DB_File out of
> it, but I don't know how....and finally tie it to a hash....
> Can anybody give me a hint how to use it..... my final goal is to get
> the taxonomy.
>
> thanks
> Hubert
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon Jun 12 16:35:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 15:35:10 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <448D240B.6040508@mrc-dunn.cam.ac.uk>
Message-ID: <001201c68e5f$b34ec8c0$15327e82@pyrimidine>

...
> Chris Fields wrote:
> > There's information in the HOWTOs:
> >
> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
> >
> > http://www.bioperl.org/wiki/HOWTO:OBDA
> >
...
> As you later discovered, that was an Outlook problem. Just to make this
> thread relevant to bioperl, the bioperl solution is:

Agreed (stupid Outlook).  It might be much faster to use non-Bioperl-ish
ways, but it is easier to further manipulate sequences (convert format,
analyze sequences, etc) using Bioperl directly.  I haven't used flat
databases much but it should move very quickly, even in an OO environment.

The one problem with the proposed non-bioperl method is that, if you wanted
100,000 sequences (based on ID's) from a FASTA database file containing
200,000 sequences, all ID's would need to be stored (1) in an array (which
gulps the data from the ID file) and then (2) mapped to a hash;
that may be a pretty big memory footprint depending on your system.  
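
One way to trim that footprint, sketched here rather than taken from the
code posted in this thread, is to fill the hash line by line so the
intermediate array never exists. The sub name load_id_set is hypothetical,
and the hash itself still has to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lookup hash one line at a time, so the
# ID list is never held in a separate array as well.
sub load_id_set {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;      # strip LF or CRLF line endings
        next if $line eq '';       # ignore blank lines
        $ids{$line} = 1;
    }
    return \%ids;
}

# Demonstration with the two example IDs from earlier in the thread.
my $id_text = "1138_at\n1134_at\n";
open my $id_fh, '<', \$id_text or die "Cannot open in-memory ID list: $!";
my $ids = load_id_set($id_fh);
close $id_fh;
print scalar(keys %$ids), " IDs loaded\n";
```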

Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
reads the ID's in one at a time from the file, (2) retrieves the data, then
(3) prints it out.   The advantage of this approach is that the built index
can be used in other bioperl scripts as well w/o having to rebuild it again,
so if you wanted a different set of ID's later on you can access the
database using the prebuilt index.  More can be found in the
Bio::Index::Fasta POD.  

You can also use the ideas and code in the HOWTO (Flat Databases) I
mentioned, which focuses on the Bio::DB::Flat system and OBDA.  The
advantage of these is that you can use Sleepycat's Berkeley Database through
the Perl BerkeleyDB module (more functionality than DB_File) which is faster
than a standard flat database.  In the HOWTO, specifically look under
'Secondary or custom namespaces' for ideas on how to use your ID as a
primary or secondary key.

Chris

> use Bio::SeqIO;
> use Bio::Index::Fasta;
> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> $inx->id_parser(\&get_id);
> $inx->make_index(shift);
> 
> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> my $wanted_ids_file = shift;
> open(IDS, $wanted_ids_file);
> while (<IDS>) {
>    chomp;
>    my $seq = $inx->fetch($_);
>    $out->write_seq($seq);
> }
> 
> sub get_id {
>    my $line = shift;
>    $line =~ /^>probe:\S+?:(\S+?):/;
>    $1;
> }
> 
> It works for me on the sample sequence given by the OP.


From golharam at umdnj.edu  Mon Jun 12 16:23:45 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 12 Jun 2006 16:23:45 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>

I'm trying to install the bioperl-run package and am getting errors from
make test regarding PAML:

t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an
undefined value at t/PAML.t line 85, <GEN2> line 85.
t/PAML....................dubious
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Is this a legitimate error or am I missing something?

Ryan


From MEC at stowers-institute.org  Mon Jun 12 17:15:35 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 16:15:35 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>

Yeah, good points...

... my recommendation of the one-liner was motivated based on a small
number of IDs and no other applications needing to index the entire
fasta database.


--Malcolm [At which point he bowed out of this fray]



From cjfields at uiuc.edu  Mon Jun 12 17:20:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 16:20:55 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>
Message-ID: <001601c68e66$17b760a0$15327e82@pyrimidine>

Sorry Malcolm.  I didn't want to imply that your way or the bioperl way was
best, just point out advantages/disadvantages.  

Oops, didn't point out the possible Bioperl disadvantage (too many objects
generated = slow slow slow).  

Chris



From roy at colibase.bham.ac.uk  Mon Jun 12 11:46:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 12 Jun 2006 16:46:49 +0100
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <200606090935.12758.heikki@sanbi.ac.za>
References: <448850CE.1040105@colibase.bham.ac.uk>
	<200606090935.12758.heikki@sanbi.ac.za>
Message-ID: <448D8C69.4030005@colibase.bham.ac.uk>

Hi Heikki.

> Two questions come to mind:
> 
> 1. Can you parse your joint location using bioperl without errors?
Seems to work fine as far as I can tell (no errors, and to_FTstring 
reproduces the location as expected).

> 2. Is there a practical advantage in including a location which has no 
> relevance to the sequence in hand?
I think it would be misleading to imply that a location was complete 
when it is only a part of the originally annotated feature. From the FT 
definition, the other possibility would be to include the missing parts 
of the feature as remote locations; I guess that may be more satisfactory.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk

From colin.erdman at du.edu  Mon Jun 12 15:52:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Mon, 12 Jun 2006 13:52:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
Message-ID: <1150141965.2992.17.camel@localhost.localdomain>

Hello all,

I am doing a project relating to some forensic analysis of mitochondrial
DNA. 

I would like to write a script that will take a reference sequence, in
this case the Anderson sequence which is the standard mitochondrial
sequence which sample sequences are compared to, and compare it to an
unknown sequence.

I have been using this script:

use Bio::SearchIO;
use strict;
my $fh;
my @nomatches;
open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |") || die $!;

my $parser = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

if( my $result = $parser->next_result ) { 
     if( my $hit = $result->next_hit ) {   
     if( my $hsp = $hit->next_hsp ) { 
         my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
	 my ( @hitbases) = $hsp->hit_string;
	 my ( @querybases) = $hsp->query_string;
	 my $seq_string = join("", @querybases);
	 my $seq_string1 = join("", @hitbases);
         for my $base (  @qmismatches ) {
            print "base $base of the hit sequence is a mismatch: ";
	    print substr $seq_string, $base-1, 1;
	    print "->";
            print substr $seq_string1, $base-1, 1;
            print "\n";
        }
	
     }
     }
}


The problem is that some mitochondrial sequences from individuals have
insertions, deletions, etc. that cause them to be offset from the
reference sequence; this then offsets the numbering system.

To provide an example:

>Anderson Reference Sequence|HV2
ATTTGGT...
1234567

>Sample|HV2....
ATTTG|C|GT
12345,5.1,67

The |C| denotes an insertion, and traditionally in the forensics community
this would be called position 5.1C, but the program reads it as position 6.

So basically I need to figure out how to modify a Perl script so that it recognizes 
that 5.1C is an insertion and not position 6; position 6 is actually 
the G to the right of it, followed by position 7 (T).

Any ideas and suggestions would be greatly helpful, I know this could be very tricky,
or very easy - I just have come to the point where the idea flow has stopped and would 
love to gather some outside input.

Thanks
Colin Erdman
colin.erdman at du.edu
Undergraduate Research Associate
Institute For Forensic Genetic
University of Denver 
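
One possible way to get that numbering, sketched under assumptions rather
than offered as an established method: walk the two gapped alignment strings
column by column and advance the position counter only on reference bases.
The sub name forensic_differences and the "del" suffix for deletions are
choices made for this illustration. With the script above, the inputs could
be $hsp->query_string (reference), $hsp->hit_string (sample) and
$hsp->start('query').

```perl
use strict;
use warnings;

# Hypothetical helper: list differences between two aligned, gapped strings
# using reference-based numbering. Insertions relative to the reference are
# reported as "<last reference position>.<n><base>", e.g. 5.1C.
sub forensic_differences {
    my ($ref_aln, $sample_aln, $ref_start) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($sample_aln);

    my @diffs;
    my $ref_pos   = $ref_start - 1;   # last reference base consumed so far
    my $ins_count = 0;                # inserted bases since that position
    for my $col (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln,    $col, 1);
        my $s = uc substr($sample_aln, $col, 1);
        if ($r eq '-') {              # gap in reference: insertion in sample
            next if $s eq '-';
            $ins_count++;
            push @diffs, "$ref_pos.$ins_count$s";
            next;
        }
        $ref_pos++;                   # only reference bases advance numbering
        $ins_count = 0;
        if    ($s eq '-') { push @diffs, "${ref_pos}del" }
        elsif ($s ne $r)  { push @diffs, "$ref_pos$s" }
    }
    return @diffs;
}

# The example from this message: reference ATTTGGT, sample ATTTGCGT.
print join(', ', forensic_differences('ATTTG-GT', 'ATTTGCGT')), "\n";  # prints 5.1C
```

BLAST may not place a gap where forensic convention would (for instance
within a run of identical bases), so the gap positions may need adjusting
before the labels match published nomenclature.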


From jason at bioperl.org  Tue Jun 13 10:19:04 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 10:19:04 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
Message-ID: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>

The latest version of YN00 (3.15) doesn't work with the current code,
as the output has changed substantially: Yang now provides simple Ka
and Ks calculations from several different methods.  Downgrade
to PAML 3.14 or roll up your sleeves and figure out what is breaking,
which is the regexp at about line 363 that detects when to start
parsing for the Pairwise data, as well as the function
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the
software packages, so I am hopeful that other developers who use our
software and do molecular evolutionary studies will get involved to
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and an getting errors  
> from
> make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix"  
> on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason at bioperl.org  Tue Jun 13 11:45:27 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 11:45:27 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
	<B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <F802F582-28E4-4761-873C-2A49A60B3593@bioperl.org>

And just to say - codeml 3.15 parsing does work - yn00 parsing just
hasn't been updated.   I agree that it is bad that the test is failing, but
it is dependent on the version that is installed, and we should put
some sort of version detection and test-skipping code in there so it doesn't
cause the tests to fail.  Just need more hands on deck tracking these
sorts of things....

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Tue Jun 13 12:04:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 12:04:46 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001001c68f03$17429070$e6028a0a@GOLHARMOBILE1>

I'll take a look at it and see what I can do.  While I'm at it,
bioperl-run tests a module called Coil, but I don't have that installed.
The documentation doesn't specify where I can get this application.
Does anyone know where Coil comes from?


-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org] 
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


The latest version of YN00 (3.15) doesn't work with the current code  
as the output has changed substantially as Yang is now provided  
several different method's simple Ka and Ks calculations.  Downgrade  
to PAML 3.14 or roll up your sleeves and figure out what is breaking  
-- which is the regexp in about line 363 that detects when to start  
parsing for the Pairwise data as well as the function  
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the  
software packages so I am hopeful that other developers that use our  
software as do molecular evolutionary studies will get involved to  
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Kevin.M.Brown at asu.edu  Tue Jun 13 13:42:40 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 13 Jun 2006 10:42:40 -0700
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>

I've been tasked to write a "small" application for the lab I work at
that basically starts with an NCBI file for an organism and goes through
a number of steps to distill out the unique protein coding sequences and
then designs oligos for the building of the genes.  One of the steps is
comparing the overlap region of the oligos to all the others designed to
try and prevent mismatches in the build that might truncate a gene or
splice in another gene into it during the build step.  I tried to do
this within perl with just a looped string comparison regex, but by my
calculations comparing each half of an oligo with all the other oligos
for this organism results in well over 8 BILLION comparisons needed.
The system was still crunching at it 3 days later with no sign of
nearing completion.

So, my thought was to use something like blastall from within the
script to find other oligos with similar sequence, but that means I need
to dump out the designed oligos, create the db with formatdb, run the
blast, and finally analyze the result file to see which oligos need to
be redesigned to prevent a mismatch.
I'm just trying to figure out how to do it all without leaving the
script, but as yet I haven't noticed a way to create a db from within perl
using bioperl.

Any thoughts on directions I should look?
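
A rough sketch of the dump / formatdb / blastall round trip described above,
kept inside one script, might look like the following. The helper names
(write_fasta, formatdb_cmd, blastall_cmd, blast_oligos_against_themselves)
and the file names are made up for illustration and are not BioPerl
functions; the sketch assumes the legacy NCBI formatdb and blastall
programs are on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: write oligos (name => sequence) to a FASTA file.
# Returns the number of records written.
sub write_fasta {
    my ($path, $oligos) = @_;
    open(my $out, '>', $path) or die "cannot write $path: $!\n";
    my $n = 0;
    for my $name (sort keys %$oligos) {
        print {$out} ">$name\n$oligos->{$name}\n";
        $n++;
    }
    close($out) or die "cannot close $path: $!\n";
    return $n;
}

# Argument list for formatdb; "-p F" asks for a nucleotide database.
sub formatdb_cmd {
    my ($fasta) = @_;
    return ('formatdb', '-i', $fasta, '-p', 'F');
}

# Argument list for an all-against-all blastn; "-m 8" asks for tabular output.
sub blastall_cmd {
    my ($fasta, $report) = @_;
    return ('blastall', '-p', 'blastn', '-d', $fasta, '-i', $fasta,
            '-m', '8', '-o', $report);
}

# Dump the oligos, build the database and run the search without leaving
# the script. Dies if either external program fails.
sub blast_oligos_against_themselves {
    my ($oligos, $fasta, $report) = @_;
    write_fasta($fasta, $oligos);
    for my $cmd ([formatdb_cmd($fasta)], [blastall_cmd($fasta, $report)]) {
        system(@$cmd) == 0 or die "@$cmd failed: $?\n";
    }
    return $report;
}
```

Calling blast_oligos_against_themselves(\%oligos, 'oligos.fa', 'oligos.blast')
would leave a tabular report that could then be read back in the same script,
for example with Bio::SearchIO, to decide which oligos to redesign. Short
oligos may need a smaller word size or a looser e-value than the blastall
defaults, which this sketch does not set.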


From aaron.j.mackey at gsk.com  Tue Jun 13 08:19:11 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 13 Jun 2006 08:19:11 -0400
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <1150141965.2992.17.camel@localhost.localdomain>
Message-ID: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>

See Bio::LocatableSeq

-Aaron
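
To make the numbering idea concrete, here is a minimal sketch in plain Perl
that does not use the Bio::LocatableSeq API itself. It walks the two gapped
alignment strings column by column and only advances the position counter
when the reference has a base, so an insertion after reference base 5 is
reported as 5.1, 5.2 and so on. The function name forensic_differences and
the output style ("73G", "249del") are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two gapped, equal-length alignment strings
# and report differences using reference-based numbering.
#   $ref_aln    - aligned reference string ('-' marks a gap)
#   $sample_aln - aligned sample string
#   $ref_start  - reference coordinate of the first aligned base (default 1)
sub forensic_differences {
    my ($ref_aln, $sample_aln, $ref_start) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "aligned strings must have equal length\n"
        unless length($ref_aln) == length($sample_aln);

    my @diffs;
    my $ref_pos = $ref_start - 1;   # last reference base consumed so far
    my $ins_idx = 0;                # inserted bases seen since that base

    for my $col (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln,    $col, 1);
        my $s = uc substr($sample_aln, $col, 1);

        if ($r eq '-') {                 # gap in the reference: insertion
            next if $s eq '-';           # gap in both strings, nothing to report
            $ins_idx++;
            push @diffs, "$ref_pos.$ins_idx$s";
            next;
        }

        $ref_pos++;                      # a real reference base
        $ins_idx = 0;
        if    ($s eq '-') { push @diffs, "${ref_pos}del" }   # deletion
        elsif ($s ne $r)  { push @diffs, "$ref_pos$s" }      # substitution
    }
    return @diffs;
}

# The example from the original post: a C inserted after reference base 5.
print join(", ", forensic_differences('ATTTG-GT', 'ATTTGCGT')), "\n";   # prints "5.1C"
```

With the bl2seq script quoted below, where the reference is the query, the
inputs would presumably be $hsp->query_string, $hsp->hit_string and
$hsp->start('query'), assuming both sequences align on the plus strand.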

bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:

> Hello all,
> 
> I am doing a project relating to some forensic analysis of mitochondrial
> DNA. 
> 
> I would like to write a script that will take a reference sequence, in
> this case the Anderson sequence which is the standard mitochondrial
> sequence which sample sequences are compared to, and compare it to an
> unknown sequence.
> 
> I have been using this script:
> 
> use strict;
> use warnings;
> use Bio::SearchIO;
> 
> # Align the sample against the reference with bl2seq and read the
> # report through a pipe.
> open(my $fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |")
>     || die $!;
> 
> my $parser = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
> 
> if ( my $result = $parser->next_result ) {
>     if ( my $hit = $result->next_hit ) {
>         if ( my $hsp = $hit->next_hsp ) {
>             # Query positions that do not match the hit.
>             my @qmismatches = $hsp->seq_inds('query', 'nomatch');
>             # Aligned strings; these may contain gap characters.
>             my $seq_string  = $hsp->query_string;
>             my $seq_string1 = $hsp->hit_string;
>             for my $base (@qmismatches) {
>                 print "base $base of the hit sequence is a mismatch: ";
>                 print substr $seq_string, $base-1, 1;
>                 print "->";
>                 print substr $seq_string1, $base-1, 1;
>                 print "\n";
>             }
>         }
>     }
> }
> 
> 
> The problem is, that some mitochondrial sequences from individuals have
> insertions, deletion etc, that cause them to be offset from the
> reference sequence, this then offsets the numbering system.
> 
> To provide an example:
> 
> >Anderson Reference Sequence|HV2
> ATTTGGT...
> 1234567
> 
> >Sample|HV2....
> ATTTG|C|GT
> 12345,5.1,67
> 
> The |C| denotes an insertion, and traditionally in the forensics community
> this would be called position 5.1C, but the program reads it as position 6.
> 
> So basically I need to figure out how to modify a perl script in order to
> recognize that 5.1C is an insertion, and that it is not position 6; position 6
> is actually the G to the right of it, followed by position 7-T.
> 
> Any ideas and suggestions would be greatly helpful. I know this could be very
> tricky, or very easy - I just have come to the point where the idea flow has
> stopped and would love to gather some outside input.
> 
> Thanks
> Colin Erdman
> colin.erdman at du.edu
> Undergraduate Research Associate
> Institute For Forensic Genetic
> University of Denver 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From colin.erdman at du.edu  Tue Jun 13 11:12:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 09:12:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150211566.7034.1.camel@localhost.localdomain>

I could see how this will help... but I am not sure how to implement it
in my situation, I am not very familiar with the Bio::Range or
Bio::Location modules...

Thanks very much,
Colin E.



From colin.erdman at du.edu  Tue Jun 13 12:05:30 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 10:05:30 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150214730.12044.2.camel@localhost.localdomain>

I actually have found EMBOSS DiffSeq to work quite well for detecting
the insertions and SNPs in the "sample sequence" as compared to the
"reference sequence". 

If I get this all figured out and integrated I will post a method, I
imagine this would prove useful to others as well.

Thanks all,
Colin



From golharam at umdnj.edu  Tue Jun 13 14:59:59 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:59:59 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <002301c68f1b$917b8c80$e6028a0a@GOLHARMOBILE1>

Never mind - don't check it in yet.  There are still some other problems
not being picked up by the test suite.  I'll work on those and add to the
test suite.  Jason, I'll send you everything once I have it complete.


-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu] 
Sent: Tuesday, June 13, 2006 2:34 PM
To: 'Jason Stajich'
Cc: 'bioperl-l at bioperl.org'
Subject: RE: [Bioperl-l] Test errors in bioperl-run


It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan



From Jonathan_Epstein at nih.gov  Tue Jun 13 14:21:00 2006
From: Jonathan_Epstein at nih.gov (Jonathan_Epstein at nih.gov)
Date: Tue, 13 Jun 2006 14:21:00 -0400
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <0J0T001LE9O5M6@lswsmta04.nmcc.sprintspectrum.com>

Sounds like a job for MUMmer (from Steven Salzberg's group).

Jonathan Epstein 

----------- 
Sent from my Treo


From golharam at umdnj.edu  Tue Jun 13 14:34:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:34:00 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>

It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAML.pm
Type: application/octet-stream
Size: 43262 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060613/566881b4/attachment-0001.obj 

From cjfields at uiuc.edu  Tue Jun 13 21:41:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:41:45 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000601c68f53$b1e4b090$15327e82@pyrimidine>

I committed it.  Passes PAML.t for bioperl-live.

Chris


From cjfields at uiuc.edu  Tue Jun 13 21:42:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:42:25 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000701c68f53$c9addcb0$15327e82@pyrimidine>

Sorry, Brian beat me to it.

Chris


From osborne1 at optonline.net  Tue Jun 13 21:38:09 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 13 Jun 2006 21:38:09 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <C0B4E0C1.8D74%osborne1@optonline.net>

Checked in.



From golharam at umdnj.edu  Tue Jun 13 21:55:49 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 21:55:49 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <000601c68f53$b1e4b090$15327e82@pyrimidine>
Message-ID: <000101c68f55$a9fa8ec0$2f01a8c0@GOLHARMOBILE1>

Okay, that's fine.  It does pass the bioperl-live tests.  However, when I
ran the bp_pairwise_kaks script it didn't work with 3.15, so it looks like
the current test suite is not exhaustive.

Looking into the code further, I see that codeml 3.15 generates some files
slightly differently than 3.14, which needs to be accounted for.
I'll work on that and post it here...shouldn't be too long.



From ULNJUJERYDIX at spammotel.com  Tue Jun 13 21:10:04 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 14 Jun 2006 09:10:04 +0800
Subject: [Bioperl-l] SimpleAlign /Bio::AlignIO; POD code doesn't work for me
Message-ID: <5b6410e0606131810k495d8f55mc6dc73f0cd5a6df5@mail.gmail.com>

>
> Hi,
>
> Two queries with respect to SimpleAlign. I am using the following code
> based on the POD.
>
> my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
> my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
> print $out $_ while <$in>;
>
> 1) is it possible to set set_displayname_flat() globally without doing
> $_->set_displayname_flat() per alignment.
>
> 2) My input files have an ID and description line for each seq in the
> alignment. When the file is converted I lose the description line. I
> know I can get the description of the sequences (e.g.
> $aln->get_seq_by_pos(2)->description()).
> How could I export the complete fasta defline including the
> description (I realize that general clustal format has a limit on the
> number of characters, but still).
>
> Regards,
> Bernd
> _______________________________________________
>
I might be totally wrong here, but my understanding of the FASTA format is
that only the first word (i.e. up to the first space) is the true name of
the seq, so anything after the first word is discarded. Replacing the spaces
with underscores works for me.

On a side note, does your 3rd line work?
It doesn't on my 1.5rc1.
I had to add the bold line, which was missing in the POD doc.
I don't think it was the use strict pragma.
    open MYIN,"<$file" or die "Can't open input alignment";
    open MYOUT, ">$file2" or die "can't write to output";
    my $in  = Bio::AlignIO->newFh(-fh     => \*MYIN,
                               -format => 'fasta');
    my $out = Bio::AlignIO->newFh(-fh     =>  \*MYOUT,
                               -format => 'clustalw');
    print $out $_ while <$in>;

Cheers
kevin

From sb at mrc-dunn.cam.ac.uk  Wed Jun 14 03:49:10 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 14 Jun 2006 08:49:10 +0100
Subject: [Bioperl-l] Blast or blat against custom db?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
Message-ID: <448FBF76.1090505@mrc-dunn.cam.ac.uk>

Kevin Brown wrote:
[snip]
> So, my thought was to utilize something like blastall from within the
> script to find other oligos of similar match, but it means that I need
> to dump out the oligos designed, create the db with formatdb. [snip]
> I'm just trying to figure out how to do it all without leaving the
> script, but as yet haven't noticed a way to create a db from within perl
> using bioperl?
> 
> Any thoughts on directions I should look?

AFAIK there's no bioperl interface onto formatdb, but the way to do it 
is make a fasta file (perhaps using bioperl) with all the oligos (what 
you want to become the db), then use a perl system call (or similar) to 
run formatdb. Still in the same script you'd then run and analyse the 
blast with bioperl calls (presumably starting with StandAloneBlast - 
http://bioperl.org/wiki/HOWTO:Beginners#BLAST if you need it).

Just be sure to carefully craft your blast parameters so they're 
suitable for oligo-sized matches, and check that the 3' base of each hit 
is identical.
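
The first two steps above (dump the oligos to FASTA, then call formatdb) can be sketched in plain Perl as follows. This is only a minimal sketch: the file name oligos.fa, the oligo IDs and the helper names write_fasta and formatdb_command are made up for illustration, and the system call is left commented out so the script runs even where formatdb is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write the designed oligos to a FASTA file that formatdb can index.
# %$oligos maps an ID to its sequence; both are made-up example data.
sub write_fasta {
    my ($path, $oligos) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    for my $id (sort keys %$oligos) {
        print {$fh} ">$id\n$oligos->{$id}\n";
    }
    close($fh) or die "Can't close $path: $!";
    return $path;
}

# Build the formatdb command: -i names the input file,
# -p F marks the sequences as nucleotide rather than protein.
sub formatdb_command {
    my ($fasta) = @_;
    return ('formatdb', '-i', $fasta, '-p', 'F');
}

my %oligos = (
    oligo_1 => 'GCGCAGCAGCGAGAATTTCGACGAG',
    oligo_2 => 'GAATTTCGACGAGCTGCTGAAGGCA',
);
my $fasta = write_fasta('oligos.fa', \%oligos);
my @cmd   = formatdb_command($fasta);
print "Would run: @cmd\n";

# Uncomment once formatdb is on the PATH:
# system(@cmd) == 0 or die "formatdb failed: $?";
```

Passing the command to system as a list avoids the shell, so a file name with spaces does not need quoting. The resulting database name (oligos.fa here) is what you would then hand to StandAloneBlast.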

From MEC at stowers-institute.org  Wed Jun 14 09:47:59 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 14 Jun 2006 08:47:59 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>

 
Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

# Stop immediately if an input file cannot be opened.
open(IDFILE, '<', $IDs) or die "Could not open file $IDs: $!\n";

my $probes = 'HG_U95Av2_probe_fasta.txt';

open(PROBES, '<', $probes) or die "Could not open file $probes: $!\n";

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT; # a bare `print` now writes $_ to OUT

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; # Note: This creates a hash with keys=PSIDs and all values=1.
my $idmatch; # declared once, so it keeps its value across sequence lines
while (<PROBES>) {
	# Only a header line resets $idmatch; the sequence lines that follow inherit it.
	$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	print if $idmatch;
}
exit;
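
One more thing that may be worth checking (a guess on my part, not something confirmed in this thread): if the ID file has Windows-style line endings, chomp removes the "\n" but leaves a "\r" on each key, so only an ID on a final line without a line ending would match. That would fit the "only the last ID works" symptom. A minimal sketch that strips all trailing whitespace instead; the helper names load_ids and filter_probes are made up, and the data is held in memory so the script runs without any files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the lookup hash from ID lines, stripping "\r\n" or "\n" and any
# other trailing whitespace so that keys compare cleanly.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        $ids{$line} = 1 if length $line;
    }
    return \%ids;
}

# Copy every FASTA record whose probe set ID is in %$ids from $in to $out.
sub filter_probes {
    my ($in, $out, $ids) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        # Only header lines change $keep; sequence lines inherit it.
        $keep = exists $ids->{$1} if $line =~ /^>probe:\w+:(\w+):/;
        print {$out} $line if $keep;
    }
}

# In-memory demonstration with a CRLF-terminated ID list.
my $id_text    = "542_at\r\n31799_at\r\n";
my $probe_text = ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n"
               . "GCGCAGCAGCGAGAATTTCGACGAG\n"
               . ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
               . "TGGCTCCTGCTGAGGTCCCCTTTCC\n";
my $subset = '';
open(my $id_fh,    '<', \$id_text)    or die "Can't read ID list: $!";
open(my $probe_fh, '<', \$probe_text) or die "Can't read probes: $!";
open(my $out_fh,   '>', \$subset)     or die "Can't open output: $!";
filter_probes($probe_fh, $out_fh, load_ids($id_fh));
close($out_fh);
print $subset;
```

With real files, the three in-memory handles would simply be replaced by handles opened on the ID list, the probe file and the output file.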


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu] 
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; 
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs 
>(only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret 
>Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
>my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs
>and all values=1.
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opeining it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the 
>code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed
>>then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must
>>match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this won't it only print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting
>>>> closer,
>>>> but no cigar quite yet.  The script below runs quickly with no
>>>> errors--but
>>>> the output file is empty.  It seems that the problem must lie
>>>> somewhere in
>>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>>> experienced
>>>> eye--but not to mine!  Any suggestions?  Thanks again for 
>your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>>> keys=PSIDs and
>>>> all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't bioperl for this, or create an index.  Perl would do
>>>> fine and
>>>> probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat
>>>looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a
>>>>> conceptually simple
>>>>> task.  I have a single large fasta file containing about 200,000
>>>>> probe
>>>>> sequences (from an Affymetrix microarray), each of which looks
>>>>> like this:
>>>>>
>>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>>> Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of
>>>>> ~130,800
>>>>> probes (both the header and the sequence) and output this
>>>>> subset into a new
>>>>> fasta file.  These 130,800 probes correspond to 8,175 
>probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>>> have these
>>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>>> to create an
>>>>> index of all 200,000 probes in the original fasta file using
>>>>> the following
>>>>> script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even
>>>>> if it is, I'm
>>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>


From oldham at ucla.edu  Tue Jun 13 22:03:04 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Tue, 13 Jun 2006 19:03:04 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDOEOLCJAA.oldham@ucla.edu>

Dear Malcolm, Chris, et al,

Thanks to everyone for your helpful suggestions.  When I run the code
below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
output file is still blank.  If I replace this list with a single ID
("542_at"), it works:

>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
GCGCAGCAGCGAGAATTTCGACGAG
>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
GAATTTCGACGAGCTGCTGAAGGCA
>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
CGACGAGCTGCTGAAGGCACTGGGT
........etc.

If I try a list of two IDs ("542_at" and "31799_at"), only the last one
is present in the output:

>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;
GTTCATCACAAATCTATTGTGCTTG
>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
Antisense;
GTCCACTAAATGTAGTAACGAAATG
>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
Antisense;
TCCACTAAATGTAGTAACGAAATGT
........etc.

The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
ID is present in the output file).  At this point I have no idea why
this is happening, and I am not sure how to interpret Malcolm's comment:

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

Any ideas?  Thanks again................!

Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
			print OUT scalar(<PROBES>);
		}
	}
exit;


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Monday, June 12, 2006 8:48 AM
To: Cook, Malcolm; Chris Fields; Michael Oldham
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


oops,

s/matches on of/matches one of/
s/nothing that/noting that/

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>the regex matches anything)?  If there is nothing printed
>then either
>>the regex isn't working as expected or there is something logically
>>wrong.  The problem may be that the captured string must
>match the id
>>exactly, the id being the key to the %ID hash; any extra characters
>>picked up by the regex outside of your id key and you will not get
>>anything.  Looking at Malcolm's regex it should work just fine, but
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this won't it only print only the
>>matched description lines to the outfile (no sequence) even if there
>>is a match?  Or is this what you wanted?   If you want the sequence
>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting
>>> closer,
>>> but no cigar quite yet.  The script below runs quickly with no
>>> errors--but
>>> the output file is empty.  It seems that the problem must lie
>>> somewhere in
>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>> experienced
>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>> keys=PSIDs and
>>> all values=1.
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>> -----Original Message-----
>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>> Sent: Friday, June 09, 2006 7:58 AM
>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>> single large
>>> file
>>>
>>>
>>>
>>> I wouldn't bioperl for this, or create an index.  Perl would do
>>> fine and
>>> probably be faster.
>>>
>>> Assuming your ids are one per line in a file named id.dat
>>looking like
>>> this
>>>
>>> 1138_at
>>> 1134_at
>>> etc..
>>>
>>> this should work:
>>>
>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>> mybigfile.fa
>>>
>>> good luck
>>>
>>> --Malcolm Cook
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>> Michael Oldham
>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>> Dear all,
>>>>
>>>> I am a total Bioperl newbie struggling to accomplish a
>>>> conceptually simple
>>>> task.  I have a single large fasta file containing about 200,000
>>>> probe
>>>> sequences (from an Affymetrix microarray), each of which looks
>>>> like this:
>>>>
>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>> Antisense;
>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>
>>>> What I would like to do is extract from this file a subset of
>>>> ~130,800
>>>> probes (both the header and the sequence) and output this
>>>> subset into a new
>>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>> have these
>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>> to create an
>>>> index of all 200,000 probes in the original fasta file using
>>>> the following
>>>> script:
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> # script 1: create the index
>>>>
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> my $Index_File_Name = shift;
>>>> my $inx = Bio::Index::Fasta->new(
>>>>     -filename => $Index_File_Name,
>>>>     -write_flag => 1);
>>>> $inx->make_index(@ARGV);
>>>>
>>>> I'm not sure if this is the most sensible approach, and even
>>>> if it is, I'm
>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>
>>>> Many thanks,
>>>> Mike O.
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From s_maheshwari84 at rediffmail.com  Thu Jun 15 07:42:24 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 Jun 2006 11:42:24 -0000
Subject: [Bioperl-l] simple problem plz look
Message-ID: <20060615114224.21669.qmail@webmail31.rediffmail.com>

I am using this statement:

my $data[0][0] = 'P_p';

What is wrong with this? I am getting a syntax error.


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

From rkulasekaran at accelrys.com  Thu Jun 15 08:06:30 2006
From: rkulasekaran at accelrys.com (rkulasekaran at accelrys.com)
Date: Thu, 15 Jun 2006 17:36:30 +0530
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <OF88050CF5.C0508A24-ON6525718E.00425D40-6525718E.00428384@accelrys.com>

Hi,

Can you declare the array (my @data) before assigning to the element?

I guess that will work fine.

- Raja


"saurabh maheshwari" <s_maheshwari84 at rediffmail.com> 
Sent by: bioperl-l-bounces at lists.open-bio.org
15/06/2006 17:12
Please respond to
saurabh maheshwari <s_maheshwari84 at rediffmail.com>


To
bioperl-l at lists.open-bio.org
cc

Subject
[Bioperl-l] simple problem plz look


I m using this statement :

my $data[0][0] = 'P_p';

what is wrong i this as i am getting syntax error>


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 08:52:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 13:52:53 +0100
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
References: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <44915825.8040902@mrc-dunn.cam.ac.uk>

saurabh maheshwari wrote:
> I m using this statement :
> 
> my $data[0][0] = 'P_p';
> 
> what is wrong i this as i am getting syntax error>

I don't think general Perl problems are appropriate for this list.
Try subscribing to the beginners mailing list via http://learn.perl.org/

But in any case, say:
my @data;
$data[0][0] = 'P_p';
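
The reason is that `my` declares a whole variable (a scalar, array or hash), never a single element of one, so `my $data[0][0]` cannot be parsed. A short sketch of the two usual ways round it; the second value 'Q_q' and the name @table are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# `my` declares a whole variable, never a single element, so declare first.
my @data;

# Assigning to a nested element creates the inner array automatically
# (autovivification); $data[0] becomes a reference to a new array.
$data[0][0] = 'P_p';
$data[0][1] = 'Q_q';    # made-up second value

# Alternatively, declare and fill the structure in one statement.
my @table = ( [ 'P_p', 'Q_q' ] );

print "$data[0][0] $table[0][1]\n";
```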


From cjfields at uiuc.edu  Thu Jun 15 11:18:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 10:18:32 -0500
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <002001c6908e$f8b11b30$15327e82@pyrimidine>

And exactly how is this applicable to BioPerl?

Start here:

http://learn.perl.org/

My guess: you need to declare 'my @data;' first.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of saurabh maheshwari
> Sent: Thursday, June 15, 2006 6:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] simple problem plz look
> 
> I m using this statement :
> 
> my $data[0][0] = 'P_p';
> 
> what is wrong i this as i am getting syntax error>
> 
> 
> with Regards
> SAURABH MAHESHWARI
> M.Sc. (BIOINFORMATICS)
> JAMIA MILLIA ISLAMIA
> NEW DELHI
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sjmiller at email.arizona.edu  Thu Jun 15 13:42:52 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 10:42:52 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
Message-ID: <44919C1C.1060901@email.arizona.edu>

We are unable to parse BLAST 2.2.14 results from the NCBI website using 
SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
bioperl-live, but when users download either plain text or HTML blast 
outputs from the NCBI page, SearchIO cannot parse them.  This used to 
work prior to BLAST 2.2.14.  Should I try installing the entire 
bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
that makes any difference.)

Thanks,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597

From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 15:00:38 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 20:00:38 +0100
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
	<44919C1C.1060901@email.arizona.edu>
Message-ID: <4491AE56.6090505@mrc-dunn.cam.ac.uk>

Susan J. Miller wrote:
> We are unable to parse BLAST 2.2.14 results from the NCBI website using 
> SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
> bioperl-live, but when users download either plain text or HTML blast 
> outputs from the NCBI page, SearchIO cannot parse them.  This used to 
> work prior to BLAST 2.2.14.  Should I try installing the entire 
> bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
> that makes any difference.)

Parsing saved results from the website works fine here. Please be more 
specific in what you mean by 'unable to parse'. What error messages do 
you get? What exact code did you use to get those errors? Exactly what 
input data did you use? Exactly how did you generate that data?

Cheers,
Sendu.

From cjfields at uiuc.edu  Thu Jun 15 17:06:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 16:06:13 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
Message-ID: <002701c690bf$8b732410$15327e82@pyrimidine>

Bio::SearchIO can't handle HTML output directly; you have to junk the tags
first, and we can't really guarantee anymore that will work either (I
haven't tried it).  The FAQ tells you how:

http://www.bioperl.org/wiki/FAQ

I would avoid HTML parsing altogether.  The only sure-fire method that will
always work, according to NCBI, is XML output, and that's parsable using
Bio::SearchIO::blastxml.  You can also try tabular format, which
Bio::SearchIO::blasttable can parse as well.
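
As a rough sketch of what I mean (untested against 2.2.14 output; the file name report.xml and the helper name print_hits are placeholders, and BioPerl has to be installed for the call to do anything):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print one tab-separated line per HSP in a BLAST report.
# $format is a Bio::SearchIO format name such as 'blastxml' or 'blasttable'.
sub print_hits {
    my ($file, $format) = @_;
    require Bio::SearchIO;    # loaded on demand; needs BioPerl installed
    my $in = Bio::SearchIO->new(-file => $file, -format => $format);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $result->query_name, $hit->name, $hsp->evalue), "\n";
            }
        }
    }
}

# 'report.xml' is a placeholder for a file saved from NCBI in XML format.
print_hits('report.xml', 'blastxml') if -e 'report.xml';
```

The same loop should work for a tabular report by passing 'blasttable' as the format, since both parsers sit behind the Bio::SearchIO interface.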

However, like Sendu, I can get BLASTP 2.2.14 output (saved from NCBI directly)
to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work
using BLASTP (and that's still set up to parse text output using SearchIO, I
believe).  Could you give us an example of the type of BLAST you were
running, the sequence you used, and the error you had?  It could be
program-specific output that is causing the problems.  The last time text
parsing broke, the cause was changes specific to BLASTN/TBLASTX output, or
something along those lines.

Chris



From sjmiller at email.arizona.edu  Thu Jun 15 17:43:59 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 14:43:59 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <002701c690bf$8b732410$15327e82@pyrimidine>
References: <002701c690bf$8b732410$15327e82@pyrimidine>
Message-ID: <4491D49F.4030208@email.arizona.edu>

Chris Fields wrote:
> 
> However, like Sendu, I can parse BLASTP 2.2.14 output (saved directly from
> NCBI) using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work
> with BLASTP (and that's still set up to parse text output using SearchIO, I
> believe).  Could you give us an example of the type of BLAST you were
> running, the sequence you used, and the error you had?  It could be
> program-specific output that is causing the problems.  The last time text
> parsing broke, the cause was changes specific to BLASTN/TBLASTX output, or
> something along those lines.

Hi Chris and Sendu,

Thanks for your replies.  I am using blastp from the NCBI BLAST page, 
with this input sequence:

MSRLLWRKVAGATVGPGPVPAPGRWVSSSVPASDPSDGQRRRQQQQQQQQQQQQQPQQPQVLSSEGGQLR
HNPLDIQMLSRGLHEQIFGQGGEMPGEAAVRRSVEHLQKHGLWGQPAVPLPDVELRLPPLYGDNLDQHFR
LLAQKQSLPYLEAANLLLQAQLPPKPPAWAWAEGWTRYGPEGEAVPVAIPEERALVFDVEVCLAEGTCPT
LAVAISPSAWYSWCSQRLVEERYSWTSQLSPADLIPLEVPTGASSPTQRDWQEQLVVGHNVSFDRAHIRE
QYLIQGSRMRFLDTMSMHMAISGLSSFQRSLWIAAKQGKHKVQPPTKQGQKSQRKARRGPAISSWDWLDI

I have tried saving HTML (with and without the graphical overview), 
plain text, and XML.  I am parsing with this script:

#!/usr/local/bin/perl -w
use strict;
use Bio::SearchIO;

while (my $fil = shift @ARGV) {

   my $srchio = Bio::SearchIO->new('-format' => 'blast', '-file' => $fil);
   while (my $result = $srchio->next_result) {

         # database searched and program that produced the report
         my $db = $result->database_name;
         my $alg = $result->algorithm;
         print "DB $db\n ALG $alg\n";

         my $qid = $result->query_name;
         print "QRY $qid\n";

         while (my $hit = $result->next_hit) {

           my $hitnam = $hit->name;
           print "\t$hitnam\n";

           # count the HSPs reported for this hit
           my $nhsp = 0;
           while ($hit->next_hsp) {
                 $nhsp++;
           }
           print "\tHSPS: $nhsp\n";
         } # end next_hit
   }
}

Interestingly, the results are different (but never correct) for the 
different types of output I've tried.  For xml, the script runs but 
produces no output, for plain text the script hangs with no output, and 
for html, I get these errors:


-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|27502689|gb|AAH42571.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 308.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 308.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|21779923|gb|AAM77583.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 333.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 333.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|1644239|dbj|BAA12223.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 358.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 358.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Positives = 270/273 (98%), Gaps = 0/273 (0%) 
Query 78
STACK: Error::throw
STACK: Bio::Root::Root::throw 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/blast.pm:1172
STACK: ./srchio.pl:8


At this point I should probably try installing all of bioperl-live, or 
at least get IteratedSearchResultEventBuilder.pm - or would you 
recommend something else?  Let me know if you need more info.

Thanks again,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597

From cjfields at uiuc.edu  Thu Jun 15 19:03:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 18:03:37 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <4491D49F.4030208@email.arizona.edu>
Message-ID: <002b01c690cf$efa05510$15327e82@pyrimidine>

...

> Hi Chris and Sendu,
> 
> Thanks for your replies.  I am using blastp from the NCBI BLAST page,
> with this input sequence:

...

> I have tried saving HTML (with and without the graphical overview),
> plain text, and XML.  I am parsing with this script:


> #!/usr/local/bin/perl -w
> 
> use Bio::SearchIO;
> ...
> }

I got this script to work.  I used your sequence and retrieved BLASTP text
output from NCBI BLASTP 2.2.14, then saved it from the web browser, and just
copied it to three separate files.  Using those files as input, they all
parse fine, with output like this:

DB All non-redundant GenBank CDStranslations+PDB+SwissProt+PIR+PRF excluding
environmental samples
 ALG BLASTP
QRY
        gi|27502689|gb|AAH42571.1|
        HSPS: 1
        gi|21779923|gb|AAM77583.1|
        HSPS: 1
...

> Interestingly, the results are different (but never correct) for the
> different types of output I've tried.  For xml, the script runs but
> produces no output, for plain text the script hangs with no output, and
> for html, I get these errors:

What's interesting is that HTML did anything at all.  You MUST strip out the
HTML tags as per the FAQ, which I pointed out before:

http://www.bioperl.org/wiki/FAQ

See the question: Does Bio::SearchIO parse the HTML output that BLAST
creates using the -T option?

Again, I would NOT attempt parsing HTML.  The only reason we have a FAQ
question about it is that it popped up on the list many times in the past
(i.e. it is a FAQ) and someone found out that HTML::Strip works.  We will
never adequately support it beyond suggesting stripping the tags out.
NCBI changes their HTML output more often than their text output.
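
For reference, tag stripping of the kind the FAQ describes amounts to
something like the sketch below.  This is a crude regex stand-in written for
illustration, not HTML::Strip itself; the sub name strip_html and the short
entity list are assumptions, and there is no guarantee the result parses,
for the reasons given above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Crude tag stripper (hypothetical helper, not part of BioPerl or HTML::Strip).
sub strip_html {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;                        # drop comments
    $html =~ s/<(script|style)\b.*?<\/\1\s*>//gis;    # drop script/style bodies
    $html =~ s/<[^>]*>//g;                            # drop remaining tags
    my %entity = (lt => '<', gt => '>', amp => '&', quot => '"', nbsp => ' ');
    $html =~ s/&(lt|gt|amp|quot|nbsp);/$entity{$1}/g; # decode common entities once
    return $html;
}

# Usage: strip_html.pl report.html > report.txt
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;                 # slurp the whole file
    my $html = <$fh>;
    close $fh;
    print strip_html($html);
}
```

The stripped text would then go to Bio::SearchIO with -format => 'blast',
but saving plain text or XML from NCBI in the first place avoids the step.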

If you tried parsing XML with the format set to 'blast' you'll get nothing
(the blast text parser looks for text output using regexes, so it just
bypasses all the XML tags).  You must set:

-format => 'blastxml' 

You'll also need to install XML::SAX, and I would suggest installing
XML::SAX::ExpatXS and the Expat XML parser for your system to speed things
up.
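
To make the format point concrete, here is a minimal sketch that picks the
-format value from the first bytes of a saved report.  The helper name
guess_blast_format and its patterns are assumptions for illustration and are
not part of BioPerl; the only BioPerl calls used are Bio::SearchIO->new,
next_result and query_name, and the blastxml parser still needs XML::SAX.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map the start of a saved BLAST report to a
# Bio::SearchIO format name, or return nothing for HTML/unknown input.
sub guess_blast_format {
    my ($head) = @_;
    return 'blastxml' if $head =~ /<\?xml\b|<BlastOutput\b/;
    return if $head =~ /<html\b|<!DOCTYPE\s+html|<pre\b/i;   # HTML: not parsed
    return 'blast' if $head =~ /^\s*T?BLAST[NPX]\b/m;         # plain-text report
    return;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $head = '';
    read $fh, $head, 2048;    # the first 2 KB is enough to tell the formats apart
    close $fh;

    my $format = guess_blast_format($head);
    unless (defined $format) {
        warn "$file: not plain-text or XML BLAST output, skipping\n";
        next;
    }

    require Bio::SearchIO;    # loaded only when there is a file to parse
    my $in = Bio::SearchIO->new(-format => $format, -file => $file);
    while (my $result = $in->next_result) {
        my $name = $result->query_name;
        print "$file ($format): ", (defined $name ? $name : ''), "\n";
    }
}
```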

The 'hanging' you mention using text parsing sounds like the old bug where
it got caught in an infinite loop.  I don't have this problem.  It could be
a couple of things:

1) You have an old version of bioperl and updated Bio::SearchIO, but you
haven't updated Bio::SearchIO::blast. That's the plugin module where the
error was (not Bio::SearchIO).  Try either updating that module or installing
the entire distribution from scratch.

2) You have two versions of Bioperl installed (an old one and bioperl-live)
and perl is using the old version of bioperl (and the old version of
SearchIO::blast).  Make sure you only have one version installed and that it
is bioperl-live.
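
One way to check for the second case is to list every copy of the module
that perl can see.  The sketch below is only an illustration under stated
assumptions: find_module_copies is a made-up helper name, and it relies on
the fact that perl loads the first match in @INC order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every file in @dirs that would satisfy
# "require $module", in search order (the first one is what perl loads).
sub find_module_copies {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my (%seen, @found);
    for my $dir (@dirs) {
        next if ref $dir;                        # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, @parts);
        next unless -f $path;
        push @found, $path unless $seen{$path}++;
    }
    return @found;
}

my @copies = find_module_copies('Bio::SearchIO::blast', @INC);
print "No copy of Bio::SearchIO::blast found in \@INC\n" unless @copies;
print "Loaded first: $copies[0]\n" if @copies;
print "Shadowed:     $_\n" for @copies[1 .. $#copies];
```

More than one line of output means two installations are visible, and the
"Loaded first" path is the one that needs to be the bioperl-live copy.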

> At this point I should probably try installing all of bioperl-live, or
> at least get IteratedSearchResultEventBuilder.pm - or would you
> recommend something else?  Let me know if you need more info.

If you have the entire distribution installed, you should have ISREB anyway.
ISREB (IteratedSearchResultEventBuilder) has nothing to do with the problems
here, though.

Chris

> Thanks again,
> -susan


From cain at cshl.edu  Thu Jun 15 11:25:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 15 Jun 2006 11:25:54 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
Message-ID: <1150385154.2622.152.camel@localhost.localdomain>

Hi all,

When running make test on a fairly new system, I got the following
failure:

t/Compatible.................No Set::Scalar. Unable to test Bio::Tree::Compatible
Can't locate Set/Scalar.pm in @INC
....
BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Tree/Compatible.pm line 138.
Compilation failed in require at t/Compatible.t line 42.
BEGIN failed--compilation aborted at t/Compatible.t line 42.
t/Compatible.................dubious                                         
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Set::Scalar is mentioned in Makefile.PL as an optional package (but not
required) and isn't mentioned in the INSTALL doc anywhere.  It looks
like the author of the test (t/Compatible.t) is trying to skip this test
if Set::Scalar isn't found, but the 'dubious' result gets marked
ultimately as a failure.
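
For illustration, the usual pattern for skipping cleanly looks something
like the sketch below: check for the optional module at run time, and only
then load the code that depends on it.  The helper name module_available is
an assumption, and this is not necessarily how t/Compatible.t is (or will
be) written.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $module can be loaded, without dying if not.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Decide at run time, before anything tries to compile the dependent module.
if (module_available('Set::Scalar')) {
    # Safe to require Bio::Tree::Compatible here and run the real tests.
    print "# Set::Scalar found, running tests\n";
} else {
    # TAP for "skip the whole file": the harness counts this as skipped.
    print "1..0 # Skip Set::Scalar not installed\n";
}
```

The key point is that a compile-time "use Bio::Tree::Compatible" runs before
any skip logic can, so the dependent module has to be loaded with require
after the check.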

What is the right thing to do here?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Fri Jun 16 00:42:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 16 Jun 2006 00:42:25 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
In-Reply-To: <1150385154.2622.152.camel@localhost.localdomain>
References: <1150385154.2622.152.camel@localhost.localdomain>
Message-ID: <D4E96C47-977E-474C-B093-82CDE775F6C1@gmx.net>

Should be fixed on the main trunk. -hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================



From rmb32 at cornell.edu  Thu Jun 15 21:37:03 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 15 Jun 2006 18:37:03 -0700
Subject: [Bioperl-l] reading and writing GFF3
Message-ID: <44920B3F.90405@cornell.edu>

There is stuff in bioperl for reading and writing GFF3.  There's 
Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which 
is the 'best' one to use?

Neither of these is working very well for me.

My proximate use case is reading in a RepeatMasker report with 
Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then 
writing those out to a GFF3 file.

Bio::Tools::GFF will take these things and write out something that 
closely resembles GFF3, but with Target attributes that don't seem to 
comply with Lincoln's GFF3 spec, since its coordinates are join()ed with 
commas instead of spaces.  I'm attaching a little script that 
illustrates this.

Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the 
features contained in them, throwing 'only Bio::SeqFeature::Annotated 
objects are writeable'.  This seems a bit silly, since one of the whole 
points of Bioperl is using polymorphism to make it easy to connect 
things together.  I've attached a little script to illustrate this one too.

So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF 
deprecated?  Why does Bio::FeatureIO::gff only accept 
Bio::SeqFeature::Annotated objects?

Thanks in advance.

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_featureio_gff_test.pl
Type: application/x-perl
Size: 1455 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_tools_gff_test.pl
Type: application/x-perl
Size: 1436 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0001.bin 

From cain at cshl.edu  Fri Jun 16 10:18:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 10:18:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <44920B3F.90405@cornell.edu>
References: <44920B3F.90405@cornell.edu>
Message-ID: <1150467493.2622.209.camel@localhost.localdomain>

Hi Rob,

I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
but that is actually a good thing.  The tighter constraints result in a
better, more consistent file format.

The reason only BSF::Annotated features are writable is that there needs
to be tight control on the 'type' of the feature, to ensure that the
type is part of the Sequence Ontology.  It also makes it much easier to
properly write out the attributes in the ninth column, particularly the
ones that are 'reserved', like Parent, Dbxref, and Ontology_term.

BTG is still usable, but the GFF3 it puts out is actually more
'GFF3-like'; that is, it looks like GFF3, but because there are no
constraints on the type and the terms that are used in the ninth column,
you have to be very careful using it to produce GFF3, by making sure
that your feature objects conform to the standard before BTG tries to
write them out.  (Of course, one way to do that would be to convert your
feature objects to BSF::Annotated objects, but then you could use
BFIO::gff :-)

[Long pause while scott goes and monkeys with Bio::Tools::GFF]

OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
this is completely valid.  (I even fixed the escaping of the stray
'=' in 'hind_R=2046'.)  The output I get is this:

##gff-version 3
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
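
The Target formatting in that output can be sketched as follows.  This is
not the code committed to Bio::Tools::GFF; the sub names gff3_escape and
gff3_target are made up for illustration, and the set of escaped characters
reflects one reading of the GFF3 spec (control characters, '%', ';', '=',
'&', ',', plus spaces inside the target ID).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode characters that are reserved in GFF3 column 9.
# A space is also escaped because the Target value uses spaces as separators.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\x00-\x1f\x7f %;=&,])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build a Target attribute: escaped ID, then start and end, space-separated.
sub gff3_target {
    my ($id, $start, $end) = @_;
    return 'Target=' . join(' ', gff3_escape($id), $start, $end);
}

print gff3_target('hind_R=2046', 59, 120), "\n";
# prints: Target=hind_R%3D2046 59 120
```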

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From rmb32 at cornell.edu  Fri Jun 16 14:36:22 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 11:36:22 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
References: <44920B3F.90405@cornell.edu>
	<1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <4492FA26.6030909@cornell.edu>

Thanks for the reply, Scott.  It's good that the BSF::Annotated features
control the type to be in the SO.  I sort of came to the "BTG is only
gff3-/like/" conclusion myself as I poked around in the two modules in
question, so I'd much rather use BFIO::gff.  So I guess the question now
is (and this will probably be a pretty common use case): how does one
take an "old" Bio::SeqFeature::Generic or similar object and make it
into a Bio::SeqFeature::Annotated?


Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 16 15:12:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Jun 2006 14:12:28 -0500
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>

Scott, 

Looks like Robert also submitted a bug report related to this.
Could you check into it (pretty-please)?  I'm still GFF3-illiterate.

http://bugzilla.open-bio.org/show_bug.cgi?id=2025

Chris



From rmb32 at cornell.edu  Fri Jun 16 15:30:23 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:30:23 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <449306CF.1030301@cornell.edu>

Whoops, I should have said something about that.  I submitted it before I
saw that Scott had already done the escaping in CVS.

Chris Fields wrote:
> Scott, 
>
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Scott Cain
>> Sent: Friday, June 16, 2006 9:18 AM
>> To: Robert Buels
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] reading and writing GFF3
>>
>> Hi Rob,
>>
>> I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
>> but that is actually a good thing.  The tighter constraints results in a
>> better, more consistent file format.
>>
>> The reason only BSF::Annotated features are writable is that there needs
>> to be tight control on the 'type' of the feature, to insure that the
>> type is part of the Sequence Ontology.  It also makes it much easier to
>> properly write out the attributes in the ninth column, particularly the
>> ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
>>
>> BTG is still usable, but the GFF3 it puts out is actually more
>> 'GFF3-like'; that is, it looks like GFF3, but because there are no
>> constraints on the type and the terms that are used in the ninth column,
>> you have to be very careful using it to produce GFF3, by making sure
>> that your feature objects conform to the standard before BTG tries to
>> write them out.  (Of course, one way to do that would be to convert your
>> feature objects to BSF::Annotated objects, but then you could use
>> BFIO::gff :-)
>>
>> [Long pause while scott goes and monkeys with Bio::Tools::GFF]
>>
>> OK, I just committed a fix Bio::Tools::GFF which produces valid GFF3 for
>> your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
>> this is completely valid.  (I even fixed the escaping the of the stray
>> '=' in 'hind_R=2046'.)  The output I get is this:
>>
>> ##gff-version 3
>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556
>> 918     -       .       Target=Contig151 325 832
>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736
>> 488     -       .       Target=Contig386 1 124
>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105
>> 1718    +       .       Target=Contig358 1 311
>> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036
>> 312     -       .       Target=hind_R%3D2046 59 120
>>
>> Scott
>>
>>
>>
>> On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
>>     
>>> There is stuff in bioperl for reading and writing GFF3.  There's
>>> Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which
>>> is the 'best' one to use?
>>>
>>> Neither of these is working very well for me.
>>>
>>> My proximate use case is reading in a RepeatMasker report with
>>> Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
>>> writing those out to a GFF3 file.
>>>
>>> Bio::Tools::GFF will take these things and write out something that
>>> closely resembles GFF3, but with Target attributes that don't seem to
>>> comply with Lincoln's GFF3 spec, since its coordinates are join()ed with
>>> commas instead of spaces.  I'm attaching a little script that
>>> illustrates this.
>>>
>>> Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
>>> features contained in them, throwing 'only Bio::SeqFeature::Annotated
>>> objects are writeable'.  This seems a bit silly, since one of the whole
>>> points of Bioperl is using polymorphism to make it easy to connect
>>> things together.  I've attached a little script to illustrate this one
>>> too.
>>>
>>> So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF
>>> deprecated?  Why does Bio::FeatureIO::gff only accept
>>> Bio::SeqFeature::Annotated objects?
>>>
>>> Thanks in advance.
>>>
>>> Rob
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                         cain at cshl.edu
>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>> Cold Spring Harbor Laboratory
>>     
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Fri Jun 16 15:34:16 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:34:16 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150486453.4412.30.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
Message-ID: <449307B8.5040802@cornell.edu>

So about that converting ye olde feature objects into 
Bio::SeqFeature::Annotated objects.  How do I do it?


Scott Cain wrote:
> That's OK--You added a few items that should be escaped that weren't, so
> I added those too.
>
> Thanks,
> Scott
>
>
> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>   
>> Woops, I should have said something about that.  I submitted it before
>> I saw that Scott had already done the escaping in CVS.
>>
>> Chris Fields wrote: 
>>     
>>> Scott, 
>>>
>>> Looks like Robert also submitted a bug report related to this as well.
>>> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
>>>
>>> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
>>>
>>> Chris
>>>
>>>   
>>>       
>> -- 
>> Robert Buels
>> SGN Bioinformatics Analyst
>> 252A Emerson Hall, Cornell University
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>>
>>     

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 15:28:52 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:28:52 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <1150486133.4412.25.camel@localhost.localdomain>

I tweaked the patch and applied it, and closed the bug.

Thanks for pointing it out--I doubt I would have noticed it in the
bioperl-guts mailing, which I generally don't look too closely at :-o

Scott


On Fri, 2006-06-16 at 14:12 -0500, Chris Fields wrote:
> Scott, 
> 
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> 
> Chris
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 15:34:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:34:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449306CF.1030301@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
Message-ID: <1150486453.4412.30.camel@localhost.localdomain>

That's OK--You added a few items that should be escaped that weren't, so
I added those too.

Thanks,
Scott
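
A minimal sketch, in plain Perl with no BioPerl dependency, of the two points discussed in this thread: Target coordinates separated by spaces rather than commas, and percent-encoding of characters that are reserved in the ninth column. The helper names gff3_escape and gff3_target are hypothetical and are not the code that was committed to Bio::Tools::GFF; the set of escaped characters is an assumption based on the reserved characters listed in the GFF3 spec.

```perl
use strict;
use warnings;

# Percent-encode characters that are reserved in GFF3 column 9.
# Assumed set: control characters, '%', ';', '=', '&' and ','.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\x00-\x1f\x7f%;=&,])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build a Target attribute: "Target=<id> <start> <end> [strand]".
# The fields are separated by single spaces, so spaces in the id
# are escaped as well.
sub gff3_target {
    my ($id, $start, $end, $strand) = @_;
    (my $safe_id = gff3_escape($id)) =~ s/ /%20/g;
    my @fields = ($safe_id, $start, $end);
    push @fields, $strand if defined $strand;
    return 'Target=' . join(' ', @fields);
}

print gff3_target('hind_R=2046', 59, 120), "\n";
# prints: Target=hind_R%3D2046 59 120
```

Keeping the escaping in one small function means the same rule is applied to every attribute value, instead of being repeated wherever a value is written out.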


On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> Woops, I should have said something about that.  I submitted it before
> I saw that Scott had already done the escaping in CVS.
> 
> Chris Fields wrote: 
> > Scott, 
> > 
> > Looks like Robert also submitted a bug report related to this as well.
> > Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> > 
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> > 
> > Chris
> > 
> >   
> 
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 15:55:31 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:55:31 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449307B8.5040802@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
Message-ID: <1150487731.4412.35.camel@localhost.localdomain>

Um, yeah, good question.  The reason I didn't answer you when you wrote
before is that I was hoping for divine inspiration for an answer (or for
somebody else to answer, which would have been really great :-)

The short answer (and easy one for me to type) is that you will probably
need an ad hoc method to do it, which is the same thing I do when I need
to convert gff2 to gff3, to make sure the things I need mapped get
mapped the 'right' way (that is, the way I want them to go).  I don't
have any sample code that does this, but if you want to start working up
an ad hoc method, I will certainly try to help you as much as I can.

Scott
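
As a possible starting point for such an ad hoc method, here is a minimal sketch in plain Perl (no BioPerl) that rewrites a GFF2 group column into GFF3 attributes using a hand-written tag map. The function name gff2_group_to_gff3, the contents of %tag_map and the sample input are all hypothetical. The parsing assumes simple tag "value" pairs separated by semicolons and does not handle semicolons inside quoted values.

```perl
use strict;
use warnings;

# Hand-written mapping from GFF2 tags to the GFF3 tags we want.
# This table is purely illustrative; fill it in for your own data.
my %tag_map = (
    Gene => 'Parent',
    Note => 'Note',
);

# Convert a GFF2 group column such as 'Gene "abc-1" ; Note "putative"'
# into a GFF3 attribute string such as 'Parent=abc-1;Note=putative'.
sub gff2_group_to_gff3 {
    my ($group) = @_;
    my @attrs;
    for my $pair (split /\s*;\s*/, $group) {
        next unless length $pair;
        # A tag, followed by an optionally double-quoted value.
        my ($tag, $value) = $pair =~ /^(\S+)\s+"?(.*?)"?$/
            or die "Cannot parse GFF2 attribute '$pair'\n";
        # Unmapped tags are lower-cased, because GFF3 reserves tags
        # that begin with an uppercase letter.
        my $new_tag = $tag_map{$tag} // lcfirst $tag;
        # Escape characters that are reserved in GFF3 column 9.
        $value =~ s/([\x00-\x1f\x7f%;=&,])/sprintf('%%%02X', ord($1))/ge;
        push @attrs, "$new_tag=$value";
    }
    return join ';', @attrs;
}

print gff2_group_to_gff3('Gene "abc-1" ; Note "putative" ; Score "x=1"'), "\n";
# prints: Parent=abc-1;Note=putative;score=x%3D1
```

The explicit table is the point of the sketch: it forces every tag to be mapped on purpose, which is what "mapped the 'right' way" amounts to in practice.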


On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> So about that converting ye olde feature objects into 
> Bio::SeqFeature::Annotated objects.  How do I do it?
> 
> 

From rmb32 at cornell.edu  Fri Jun 16 16:31:08 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 13:31:08 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150487731.4412.35.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
Message-ID: <4493150C.1080909@cornell.edu>

Rather than cobble together some ad-hoc solution, I would be interested 
in working on a good solution to this problem, because it seems like 
it's just going to get more common as more people start wanting to write 
GFF3.  What about some code in whatever customarily makes these objects 
(probably BSF::Annotated's new() method?) that could take another type 
of Feature object and attempt to shoehorn its data into a new 
BSF::Annotated?  If it failed (because the type isn't in SO or 
whatever), it could throw() some informative error message.

Then, people could write straightforward code something like:

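while (my $oldstylefeature = $features_in->next_feature) {
    # set the type to a term that is in the Sequence Ontology
    $oldstylefeature->primary_tag('something_that_is_in_so');
    # placeholder for any other change needed for compliance
    $oldstylefeature->something_else('some other something that needs to be changed for compliance');
    # proposed: build a BSF::Annotated from the old-style feature
    my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
    $gff3_out->write_feature($newfeature);
}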

Does that sound like a good idea?  I'd be more than willing to implement 
this, since I'm going to need to do this sort of thing with many more 
things than just RepeatMasker.

Rob

Scott Cain wrote:
> Um, yeah, good question.  The reason I didn't answer you when you wrote
> before is that I was hoping for divine inspiration for an answer (or for
> somebody else to answer, which would have been really great :-)
>
> The short answer (and easy one for me to type) is that you will probably
> need an ad hoc method to do it, which is the same thing I do when I need
> to convert gff2 to gff3, to make sure the things I need mapped get
> mapped the 'right' way (that is, the way I want them to go).  I don't
> have any sample code that does this, but if you want to start working up
> an ad hoc method, I will certainly try to help you as much as I can.
>
> Scott
>
>
> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>   
>> So about that converting ye olde feature objects into 
>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>
>>

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Sat Jun 17 06:36:59 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 03:36:59 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>	
	<1150486453.4412.30.camel@localhost.localdomain>	
	<449307B8.5040802@cornell.edu>	
	<1150487731.4412.35.camel@localhost.localdomain>	
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <4493DB4B.4020509@cornell.edu>

Yep.  I'm almost finished with the first draft of a function that does 
this.  I'll polish it up over the weekend then on Monday I'll submit a 
bugzilla bug and patch with it so you can take a look.

Rob

Scott Cain wrote:
> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like you suggested below.  You could allow passing in args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you know if
> brilliance strikes me.
>
> Scott
>
>
> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>   
>> Rather than cobble together some ad-hoc solution, I would be interested 
>> in working on a good solution to this problem, because it seems like 
>> it's just going to get more common as more people start wanting to write 
>> GFF3.  What about some code in whatever customarily makes these objects 
>> (probably BSF::Annotated's new() method?) that could take another type 
>> of Feature object and attempt to shoehorn its data into a new 
>> BSF::Annotated?  If it failed (because the type isn't in SO or 
>> whatever), it could throw() some informative error message.
>>
>> Then, people could write straightforward code something like:
>>
>> while(my $oldstylefeature = $features_in->next_feature) {
>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>     $oldstylefeature->something_else('some other something that needs to be changed for compliance');
>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>     $gff3_out->write_feature($newfeature);
>> }
>>
>> Does that sound like a good idea?  I'd be more than willing to implement 
>> this, since I'm going to need to do this sort of thing with many more 
>> things than just RepeatMasker.
>>
>> Rob
>>

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 23:56:44 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 23:56:44 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <4493150C.1080909@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
Message-ID: <1150516605.2600.9.camel@localhost.localdomain>

Rob,

I came to the same conclusion as well; I wrote my response as I was
heading out the door and while I was running errands, I realized the
right thing to do is to write a Bio::SeqFeature::Annotated method called
new_from_object, whose usage would be:

  my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);

where you would give it a Bio::SeqFeatureI compliant object and try to
create a BSFA like you suggested below.  You could allow passing in args
to control how different things are handled, like mapping non-SO types
to SO types.  I'll think about this over the weekend and let you know if
brilliance strikes me.

Scott
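
To make the proposal concrete, here is a minimal, self-contained sketch of what such a constructor could look like. It deliberately does not use BioPerl: My::Feature and My::AnnotatedFeature are toy stand-ins, the -type_map argument is hypothetical, and %SO_TERMS is a tiny hard-coded placeholder for a real Sequence Ontology lookup. new_from_object is only a proposal in this thread, so this illustrates the idea and is not an existing API.

```perl
use strict;
use warnings;

# Toy stand-in for an old-style feature object (not a BioPerl class).
package My::Feature;
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub primary_tag { $_[0]{primary_tag} }
sub start       { $_[0]{start} }
sub end         { $_[0]{end} }

# Toy stand-in for a feature whose type must be a Sequence Ontology term.
package My::AnnotatedFeature;

# Placeholder for a real SO lookup; only a few terms are listed here.
my %SO_TERMS = map { $_ => 1 } qw(gene exon nucleotide_motif);

sub new_from_object {
    my ($class, $feature, %args) = @_;
    my $type_map = $args{-type_map} || {};
    my $old_type = $feature->primary_tag;
    # Map a non-SO type to an SO type if the caller supplied a mapping.
    my $type = exists $type_map->{$old_type}
        ? $type_map->{$old_type}
        : $old_type;
    die "Feature type '$type' is not a known Sequence Ontology term\n"
        unless $SO_TERMS{$type};
    return bless {
        type  => $type,
        start => $feature->start,
        end   => $feature->end,
    }, $class;
}
sub type { $_[0]{type} }

package main;

my $old = My::Feature->new(primary_tag => 'repeat', start => 2095, end => 2556);
my $new = My::AnnotatedFeature->new_from_object(
    $old, -type_map => { repeat => 'nucleotide_motif' },
);
print $new->type, "\n";    # prints: nucleotide_motif

# Without a mapping the conversion fails with an informative error.
my $ok = eval { My::AnnotatedFeature->new_from_object($old); 1 };
print "conversion failed: $@" unless $ok;
```

Passing the mapping as an argument keeps the decision about non-SO types with the caller, while the constructor only enforces that the final type is a known term.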


On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> Rather than cobble together some ad-hoc solution, I would be interested 
> in working on a good solution to this problem, because it seems like 
> it's just going to get more common as more people start wanting to write 
> GFF3.  What about some code in whatever customarily makes these objects 
> (probably BSF::Annotated's new() method?) that could take another type 
> of Feature object and attempt to shoehorn its data into a new 
> BSF::Annotated?  If it failed (because the type isn't in SO or 
> whatever), it could throw() some informative error message.
> 
> Then, people could write straightforward code something like:
> 
> while(my $oldstylefeature = $features_in->next_feature) {
>     $oldstylefeature->primary_tag('something_that_is_in_so');
>     $oldstylefeature->something_else('some other something that needs to 
> be changed for compliance');
>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>     $gff3_out->write_feature($newfeature);
> }
> 
> Does that sound like a good idea?  I'd be more than willing to implement 
> this, since I'm going to need to do this sort of thing with many more 
> things than just RepeatMasker.
> 
> Rob
> 
> Scott Cain wrote:
> > Um, yeah, good question.  The reason I didn't answer you when you wrote
> > before is that I was hoping for divine inspiration for an answer (or for
> > somebody else to answer, which would have been really great :-)
> >
> > The short answer (and easy one for me to type) is that you will probably
> > need an ad hoc method to do it, which is the same thing I do when I need
> > to convert gff2 to gff3, to make sure the things I need mapped get
> > mapped the 'right' way (that is, the way I want them to go).  I don't
> > have any sample code that does this, but if you want to start working up
> > an ad hoc method, I will certainly try to help you as much as I can.
> >
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> >   
> >> So about that converting ye olde feature objects into 
> >> Bio::SeqFeature::Annotated objects.  How do I do it?
> >>
> >>
> >> Scott Cain wrote:
> >>     
> >>> That's OK--You added a few items that should be escaped that weren't, so
> >>> I added those too.
> >>>
> >>> Thanks,
> >>> Scott
> >>>
> >>>
> >>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >>>   
> >>>       
> >>>> Woops, I should have said something about that.  I submitted it before
> >>>> I saw that Scott had already done the escaping in CVS.
> >>>>
> >>>> Chris Fields wrote: 
> >>>>     
> >>>>         
> >>>>> Scott, 
> >>>>>
> >>>>> Looks like Robert also submitted a bug report related to this as well=
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Sat Jun 17 12:20:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 17 Jun 2006 12:20:08 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You don't need a new method for this. Instead, support a -feature  
argument.

	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);

This should work for any instance of Bio::SeqFeatureI. If it is a  
B::SF::Annotated already it is obviously just a deep copy (if copy is  
desired - could be another parameter). Otherwise more will be involved.

Alternatively, and possibly better, is to write a specialized  
SeqFeatureI factory (that would implement  
Bio::Factory::ObjectFactoryI) and then delegate this job to it:

	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
		-type_ontology => $sequence_ontology,
		-source_ontology => $feature_source_ontology,
		-unflatten => 1);
	my $bsfa = $feat_factory->create_object({-feature => $feature});

This is preferable because it separates business logic that isn't  
necessarily related into defined units. I.e., the logic necessary to  
convert an ordinary feature into a strongly typed one is different  
from how to represent a strongly typed feature. IMHO anyway ...
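
As a rough illustration of that separation, here is a sketch in plain Perl. It is not BioPerl API: the package names My::TypedFeature and My::TypedFeatureFactory, the -allowed_types and -type_map arguments, and the three-term type list are all made up for the example, and an ordinary hash stands in for the old-style feature. The factory owns the conversion rules; the typed feature class only represents already-validated data.

```perl
use strict;
use warnings;

# Hypothetical strongly typed feature: it only represents data.
package My::TypedFeature;

sub new {
    my ($class, %args) = @_;
    my $self = {
        type  => $args{-type},
        start => $args{-start},
        end   => $args{-end},
    };
    return bless $self, $class;
}
sub type  { return $_[0]{type} }
sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

# Hypothetical factory: it owns the rules for converting an untyped
# feature into a typed one (type mapping plus validation).
package My::TypedFeatureFactory;

sub new {
    my ($class, %args) = @_;
    my $self = {
        allowed  => { map { $_ => 1 } @{ $args{-allowed_types} || [] } },
        type_map => $args{-type_map} || {},
    };
    return bless $self, $class;
}

sub create_object {
    my ($self, $args) = @_;
    my $old  = $args->{-feature};      # plain hash standing in for an old feature
    my $tag  = $old->{primary_tag};
    my $type = exists $self->{type_map}{$tag} ? $self->{type_map}{$tag} : $tag;
    die "type '$type' is not in the ontology\n" unless $self->{allowed}{$type};
    return My::TypedFeature->new(
        -type  => $type,
        -start => $old->{start},
        -end   => $old->{end},
    );
}

package main;

my $factory = My::TypedFeatureFactory->new(
    -allowed_types => [qw(gene exon repeat_region)],
    -type_map      => { repeat => 'repeat_region' },
);
my $typed = $factory->create_object(
    { -feature => { primary_tag => 'repeat', start => 10, end => 42 } }
);
print join("\t", $typed->type, $typed->start, $typed->end), "\n";
```

With this split, changing how non-ontology types are mapped touches only the factory, while code that consumes typed features is unaffected.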

Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
started as the result of a discussion thread earlier this (or last?)  
year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
though not in concept.

Maybe we need to get together again and thrash out a strategy; or a
BOF at the GMOD meeting? I feel this needs a core group of people who
care to hash out a strategy that will also solve the backwards
compatibility problem with the current Bio::SeqFeatureI state of
limbo, and then implement the decisions with a few people in a
concentrated effort. This would also remove the only real large
stumbling block towards a 1.6 release.

Maybe we should think about a little pre-GMOD hackathon to clear up
this mess? Scott, you'll be there a day early? I'll already be back,
and Jason I believe will still be in town, although he may have other
commitments already. Nonetheless, it shouldn't take much more than
dedicated time, a whiteboard, and a few people who care enough to
thrash this out and then do it.

Thoughts?

	-hilmar


- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
ImoAXD/jrbF0gXzSr2CY4tQ=
=XfDq
-----END PGP SIGNATURE-----

From rmb32 at cornell.edu  Sat Jun 17 14:36:28 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 11:36:28 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <44944BAC.7000302@cornell.edu>

I'd love to help more with this, since with the new tomato genome coming 
in my job is going to be working more and more with annotations, but I'm 
not a core person and I can't go to the meeting in NC.  In the interests 
of getting my job done right now, I've implemented a -feature argument 
to Bio::SeqFeature::Annotated's constructor, which calls a method 
from_feature() I added.  If you guys want it, it's attached to bug 2026.

 From the perspective of a casual bioperl user, anything you guys can do 
to make the handling of features and annotations less fragmented and 
more robust would be wonderful.  I'd be happy to help with 
implementation if one of you grizzled veterans would give me marching 
orders. :-)

Rob


-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Sat Jun 17 16:21:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 17 Jun 2006 15:21:37 -0500
Subject: [Bioperl-l] OT : Re:  reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1D0C8412-3705-47EF-9AAA-1DD0B09AD6B5@uiuc.edu>


On Jun 17, 2006, at 11:20 AM, Hilmar Lapp wrote:
>
> Maybe we need to get together again and thrash out a strategy; or a
> BOF at the GMOD meeting? I feel this does need a core group of people
> who care, hash out a strategy that will also solve the backwards
> compatibility problem with the current Bio::SeqFeatureI state-of-
> limbo, and allow us to implement the decisions with a few people in a
> concentrated effort. This will then also remove the only real large
> stumbling block towards a 1.6 release.

That would be fantastic!

A bit OT, but if plans are afoot for a 1.6 release maybe the 'core  
group' that meets at NC could start drawing up a list of ideas/plans  
towards that release, even if it is still a ways off.  A roadmap of  
sorts so the community knows where to put forth the majority of their  
effort and focus.

Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Mon Jun 19 06:16:57 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 19 Jun 2006 12:16:57 +0200
Subject: [Bioperl-l] doc.bioperl
Message-ID: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>

Hi,

I just noted that it can happen that the pages at doc.bioperl.org
state "No synopsis" whereas there is one in the PM file (use perldoc
or the CVS).
An example:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Fasta.html
No synopsis, No description, but

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup

shows both.

So, if you're looking for documentation don't forget to do e.g.
"perldoc Bio::DB::Fasta"

regards,
bernd

From cjfields at uiuc.edu  Mon Jun 19 10:38:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 09:38:01 -0500
Subject: [Bioperl-l] doc.bioperl
In-Reply-To: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>
Message-ID: <001501c693ad$f7689790$15327e82@pyrimidine>

This has been reported as a bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1926

Jason mentions in the bug report that the POD may contain something that
messes with the way PDOC deals with code, so it should be rewritten.

Chris



From dmessina at wustl.edu  Mon Jun 19 10:59:23 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 19 Jun 2006 09:59:23 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <83485BEB-2457-4FD6-90B8-353228868C3A@wustl.edu>

Hi,

Just curious if any other BioPerlers will be at the YAPC conference  
in Chicago next week (http://yapcchicago.org/). Some of us from the  
WashU GSC will be there, and it might be fun to meet some other  
BioPerl people over lunch or something. If there's enough interest, I  
will organize.

By the way, if you're unfamiliar with the conference and are  
interested in attending, I think registration is still open. The fee  
is low ($100).

Dave


-- 
Dave Messina
Informatics Analyst
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1415


From ClarkeW at AGR.GC.CA  Mon Jun 19 18:34:37 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 19 Jun 2006 18:34:37 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>

Hi,

I am getting the following warning and then exception:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ACTG*] which does not look
healthy

NOTE: ACTG* represents a sequence of those 4 characters (a valid DNA
sequence)

when extracting display name and sequence from a MySQL database. My code
is as follows:

my $sql = "select Clone_Name,Sequence from tbl_bgene";
my $sth = $dbh->prepare($sql);
$sth->execute();
while (my $hash = $sth->fetchrow_hashref()) {
    # print("Name: ".$hash->{'Clone_Name'}."\n");
    my $seq = Bio::Seq->new(-display_id => $hash->{'Clone_Name'},
                            -seq        => $hash->{'Sequence'});
    $handle->write_seq($seq);
    # print("Sequence: ".$hash->{'Sequence'}."\n");
}

For some reason it is failing on a particular sequence, which is a valid
DNA sequence. If anyone has any ideas on why this is, I would appreciate
it.

Thanks, Wayne


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 19:30:19 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 09:30:19 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <4497338B.3030609@infotech.monash.edu.au>

> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 1
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.

Usually a '*' indicates a STOP codon in a protein sequence.
I don't think it is valid in a DNA sequence?

So my guess is that BioPerl is auto-detecting it as Protein sequence,
as A,C,T,G are all valid amino acids, and * is a stop codon.

So I think BioPerl is doing the right thing.

If you want to force it, try adding a " -alphabet=>'protein' " parameter to the 
Bio::Seq constructor.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From taerwin at gmail.com  Mon Jun 19 21:38:14 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 11:38:14 +1000
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>

Hi all,

Does anyone have a runnable for cap3? There seems to be some discussion
about one in the mailing archives (
http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
find any code.


Regards,

Tim

From osborne1 at optonline.net  Mon Jun 19 22:23:43 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 19 Jun 2006 22:23:43 -0400
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
Message-ID: <C0BCD46F.8EA5%osborne1@optonline.net>

Tim,

The code seems to be here, not clear if there's an executable:

http://seq.cs.iastate.edu/download.html


Brian O.




From cjfields at uiuc.edu  Mon Jun 19 23:23:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 22:23:26 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <000701c69418$e53b9110$15327e82@pyrimidine>

You really haven't given us much to work with more than "this doesn't work."
We need the following information, otherwise we can't do anything.  

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be: 

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed:

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};
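
The Data::Dumper check suggested above can be sketched with core Perl only, without touching the database. Setting $Data::Dumper::Useqq makes control characters show up as escapes in the dump, and a small helper lists any characters outside an expected set. The sample row (with a trailing carriage return), the helper name unexpected_chars, and the allowed character class are assumptions for illustration; this is not the check that validate_seq itself performs.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq    = 1;   # print "\r", "\n", etc. as visible escapes
$Data::Dumper::Sortkeys = 1;   # stable key order in the dump

# Sample row standing in for $sth->fetchrow_hashref(); note the trailing "\r".
my $hash = { Clone_Name => 'test', Sequence => "ACTGACTG\r" };

# A reference here would mean the value is not a plain string at all.
print "Sequence is a ", (ref $hash->{Sequence} || 'plain scalar'), "\n";
print Dumper($hash);

# Return the distinct characters in $seq that fall outside the assumed
# alphabet (letters, '*', '.', and '-').
sub unexpected_chars {
    my ($seq) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $seq =~ /([^A-Za-z*.\-])/g;
}

my @bad = unexpected_chars($hash->{Sequence});
printf "unexpected: %s\n", join(', ', map { sprintf('U+%04X', ord($_)) } @bad);
```

Running the same two steps on the real row that triggers the exception should show whether the value is a plain string and whether it carries any characters that are invisible in an ordinary print.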

Chris



From taerwin at gmail.com  Mon Jun 19 23:05:13 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 13:05:13 +1000
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <C0BCD46F.8EA5%osborne1@optonline.net>
References: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
	<C0BCD46F.8EA5%osborne1@optonline.net>
Message-ID: <c7d2b5330606192005o63ed5d6i608d6b2076399932@mail.gmail.com>

Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3

Regards,

Tim


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 23:07:12 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 13:07:12 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <4497338B.3030609@infotech.monash.edu.au>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
	<4497338B.3030609@infotech.monash.edu.au>
Message-ID: <44976660.7030107@infotech.monash.edu.au>

> If you want to force it, try adding a " -alphabet=>'protein' " parameter to the
> Bio::Seq constructor.

That should be -alphabet => 'dna'.
D'oh!

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From Marc.Logghe at DEVGEN.com  Tue Jun 20 03:13:22 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 20 Jun 2006 09:13:22 +0200
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6D3D60B@ANTARESIA.be.devgen.com>

The attached Cap3.pm is about 3 years old, and I have not tested it with the
current bioperl release.
Feel free to play with it.
Cheers,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tim Erwin
> Sent: Tuesday, June 20, 2006 5:05 AM
> To: Brian Osborne
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] cap3 runnable?
> 
> Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3
> 
> Regards,
> 
> Tim
> 
> On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
> >
> > Tim,
> >
> > The code seems to be here, not clear if there's an executable:
> >
> > http://seq.cs.iastate.edu/download.html
> >
> >
> > Brian O.
> >
> >
> > On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Does anyone have a runnable for cap3? There seems to be some 
> > > discussion about one in the mailing archives (
> > > 
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> > cannot
> > > find any code.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Tim
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cap3.pm
Type: application/octet-stream
Size: 3374 bytes
Desc: Cap3.pm
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/0976a7d9/attachment.obj 

From G.Tzotzos at unido.org  Tue Jun 20 05:18:48 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 11:18:48 +0200
Subject: [Bioperl-l] Error message
Message-ID: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and run the  
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D

Wagramerstrasse 5
A-1400 Vienna
Austria

Email: g.tzotzos at unido.org


From G.Tzotzos at unido.org  Tue Jun 20 07:36:18 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 13:36:18 +0200
Subject: [Bioperl-l] Error message
Message-ID: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and run the  
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D
Vienna, Austria


From s-merchant at northwestern.edu  Tue Jun 20 10:41:33 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 20 Jun 2006 09:41:33 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <002701c69477$9ffa7c10$c2987ca5@pc13>

Hey Dave,
  I am giving a talk on dictyBase at YAPC. I think it would be great to
meet for lunch.

Cheers,
Sohel Merchant.

dictyBase
Northwestern University,
Chicago

>
>Just curious if any other BioPerlers will be at the YAPC conference in
>Chicago next week (http://yapcchicago.org/). Some of us from the WashU
>GSC will be there, and it might be fun to meet some other BioPerl
>people over lunch or something. If there's enough interest, I will
>organize.
>
>By the way, if you're unfamiliar with the conference and are interested
>in attending, I think registration is still open. The fee is low
>($100).
>
>Dave


From cain at cshl.edu  Tue Jun 20 12:03:26 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 20 Jun 2006 12:03:26 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1150819406.2585.27.camel@localhost.localdomain>

Hi Hilmar,

Of course you are right--I was under the influence of a perl module that
I work with that does something similar, but both of your solutions are
better.

I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
look this week.

As for next week, I plan on spending the day at NESCent on Wednesday
(though I haven't told Todd or Jeff that I am arriving early yet) just
to make sure all the details are in place.  I imagine I'll have a fair
amount of free time to hash this stuff out.  Anyone else who is in town
(that is, in Durham, NC, USA) is welcome to come draw on a white board
too. :-)

Scott


On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> You don't need a new method for this. Instead, support a -feature  
> argument.
> 
> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
> 
> This should work for any instance of Bio::SeqFeatureI. If it is a  
> B::SF::Annotated already it is obviously just a deep copy (if copy is  
> desired - could be another parameter). Otherwise more will be involved.
> 
> Alternatively, and possibly better, is to write a specialized  
> SeqFeatureI factory (that would implement  
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
> 
> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
> 		-type_ontology => $sequence_ontology,
> 		-source_ontology => $feature_source_ontology,
> 		-unflatten => 1);
> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
> 
> This is preferable because it separates business logic that isn't  
> necessarily related into defined units. I.e., the logic necessary to  
> convert an ordinary feature into a strongly typed one is different  
> from how to represent a strongly typed feature. IMHO anyway ...
> 
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
> started as the result of a discussion thread earlier this (or last?)  
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
> though not in concept.
> 
> Maybe we need to get together again and thrash out a strategy; or a  
> BOF at the GMOD meeting? I feel this does need a core group of people  
> who care, hash out a strategy that will also solve the backwards  
> compatibility problem with the current Bio::SeqFeatureI state-of- 
> limbo, and allow us to implement the decisions with a few people in a  
> concentrated effort. This will then also remove the only real large  
> stumbling block towards a 1.6 release.
> 
> Maybe we should think about a little pre-GMOD hackathon to clear up  
> this mess? Scott, you'll be there a day early? I'll be already back  
> and Jason I believe will still be in town, although he may have other  
> commitments already. Nonetheless, it shouldn't really take that much  
> but rather dedicated time, a whiteboard, and a few people who care  
> thrashing this out and then do it.
> 
> Thoughts?
> 
> 	-hilmar
> 
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
> 
> > Rob,
> >
> > I came to the same conclusion as well; I wrote my response as I was
> > heading out the door and while I was running errands, I realized the
> > right thing to do is to write a Bio::SeqFeature::Annotated method called
> > new_from_object, whose usage would be:
> >
> >   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
> >
> > where you would give it a Bio::SeqFeatureI compliant object and try to
> > create a BSFA like you suggested below.  You could allow passing in args
> > to control how different things are handled, like mapping non-SO types
> > to SO types.  I'll think about this over the weekend and let you know if
> > brilliance strikes me.
> >
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> >> Rather than cobble together some ad-hoc solution, I would be interested
> >> in working on a good solution to this problem, because it seems like
> >> it's just going to get more common as more people start wanting to write
> >> GFF3.  What about some code in whatever customarily makes these objects
> >> (probably BSF::Annotated's new() method?) that could take another type
> >> of Feature object and attempt to shoehorn its data into a new
> >> BSF::Annotated?  If it failed (because the type isn't in SO or
> >> whatever), it could throw() some informative error message.
> >>
> >> Then, people could write straightforward code something like:
> >>
> >> while(my $oldstylefeature = $features_in->next_feature) {
> >>     $oldstylefeature->primary_tag('something_that_is_in_so');
> >>     $oldstylefeature->something_else('some other something that needs to be changed for compliance');
> >>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
> >>     $gff3_out->write_feature($newfeature);
> >> }
> >>
> >> Does that sound like a good idea?  I'd be more than willing to implement
> >> this, since I'm going to need to do this sort of thing with many more
> >> things than just RepeatMasker.
> >>
> >> Rob
> >>
> >> Scott Cain wrote:
> >>> Um, yeah, good question.  The reason I didn't answer you when you wrote
> >>> before is that I was hoping for divine inspiration for an answer (or for
> >>> somebody else to answer, which would have been really great :-)
> >>>
> >>> The short answer (and easy one for me to type) is that you will probably
> >>> need an ad hoc method to do it, which is the same thing I do when I need
> >>> to convert gff2 to gff3, to make sure the things I need mapped get
> >>> mapped the 'right' way (that is, the way I want them to go).  I don't
> >>> have any sample code that does this, but if you want to start working up
> >>> an ad hoc method, I will certainly try to help you as much as I can.
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> >>>
> >>>> So about that converting ye olde feature objects into
> >>>> Bio::SeqFeature::Annotated objects.  How do I do it?
> >>>>
> >>>>
> >>>> Scott Cain wrote:
> >>>>
> >>>>> That's OK--You added a few items that should be escaped that weren't,
> >>>>> so I added those too.
> >>>>>
> >>>>> Thanks,
> >>>>> Scott
> >>>>>
> >>>>>
> >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >>>>>
> >>>>>
> >>>>>> Woops, I should have said something about that.  I submitted it
> >>>>>> before I saw that Scott had already done the escaping in CVS.
> >>>>>>
> >>>>>> Chris Fields wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Scott,
> >>>>>>>
> >>>>>>> Looks like Robert also submitted a bug report related to this
> >>>>>>> as well
> >>>>>>> ------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
> 
> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
> ImoAXD/jrbF0gXzSr2CY4tQ=
> =XfDq
> -----END PGP SIGNATURE-----
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/4b71554e/attachment-0001.bin 

From osborne1 at optonline.net  Tue Jun 20 12:13:51 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 12:13:51 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>
Message-ID: <C0BD96FF.8EC3%osborne1@optonline.net>

George,

The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's
some other problem that may be specific to SwissProt. Can you retrieve from
GenBank? E.g.:

my $seq_object = get_sequence('genbank', 2);

Brian O.


On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:

> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
> 
> use Bio::Perl;
> use strict;
> use warnings;
> 
> my $seq_object = get_sequence('swissprot', "P09651");
> 
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
> 
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
> 
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
> 
> Thanks
> 
> George
> 
> 
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
> 
> 
> 
> George T. Tzotzos Ph.D
> Vienna, Austria
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From G.Tzotzos at unido.org  Tue Jun 20 12:21:32 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 18:21:32 +0200
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
References: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <76750E11-3BD6-42EB-832D-3A12BC6B4BEE@unido.org>

Brian

Neither <swiss> nor <swissprot> works. However, your suggestion does
work fine. So does Chandan's. Many thanks to both.

Cheers

George


On 20 Jun 2006, at 18:13, Brian Osborne wrote:

> George,
>
> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
> there's some other problem that may be specific to SwissProt. Can you
> retrieve from GenBank? E.g.:
>
> my $seq_object = get_sequence('genbank', 2);
>
> Brian O.
>
>
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>
>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>> following script to test the installation:
>>
>> use Bio::Perl;
>> use strict;
>> use warnings;
>>
>> my $seq_object = get_sequence('swissprot', "P09651");
>>
>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>
>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>> get the message below.
>>
>> Any help on the nature of the problem and how to overcome it would be
>> greatly appreciated.
>>
>> Thanks
>>
>> George
>>
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>> STACK toplevel tut2.pl:5
>>
>>
>>
>> George T. Tzotzos Ph.D
>> Vienna, Austria
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From ClarkeW at AGR.GC.CA  Tue Jun 20 12:57:34 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 12:57:34 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>


The Bioperl version is 1.4, the OS is Red Hat Linux, Fedora Core 4, and the
trace is:

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
STACK: /home/wayne/bin/mast_fasta.pl:59

And the full script is attached. 

However I would like to clarify that the actual sequence is not ACTG*,
this was a notation to represent that I had checked it to be sure that
it was a valid DNA sequence but due to confidentiality I cannot disclose
the actual sequence. I know this makes it more difficult and that I
perhaps should have been clearer about this originally. The $handle is a
Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
name  

'Clone_Name' => 'sJ1485'
        };
then the error message. I hope this is more helpful than my last
message.

Thanks, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with beyond "this doesn't work."
We need the following information; otherwise we can't do anything.

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in
the sequence.  The problem is, I don't think validate_seq (from PrimarySeq
and where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If
this caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array;
hence the mismatch is 1.  What do you get printing $hash using
Data::Dumper?  I tried using this anon hash and it worked fine when a new
Bio::Seq was constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};
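
The "mismatch is 1" text can be reproduced without BioPerl. A minimal
sketch, assuming the warning is built by concatenating a string with a
global regex match; that is a guess about how validate_seq composes its
message, not verified against the 1.4 source, and the character class and
sub names below are only illustrative:

```perl
use strict;
use warnings;

# Illustrative pattern only; not copied from Bio::PrimarySeq.
my $allowed = 'A-Za-z\*\-\.\?';

# Concatenation forces scalar context, so m//g returns a success flag
# (1 when something matched) rather than the captured text.
sub mismatch_scalar {
    my ($seq) = @_;
    return "mismatch is " . ($seq =~ /([^$allowed]+)/g);
}

# Capturing in list context returns the offending characters themselves.
sub mismatch_list {
    my ($seq) = @_;
    my @bad = $seq =~ /([^$allowed]+)/g;
    return "mismatch is [" . join('', @bad) . "]";
}

my $seq = "ACTG\nACTG";
print mismatch_scalar($seq), "\n";
print mismatch_list($seq), "\n";
```

If the message really is built this way, the "1" only says that some
disallowed character was found, not which one.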

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MYSQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> 
>      my $sth = $dbh->prepare($sql);
> 
>      $sth->execute();
> 
>      while (my $hash = $sth->fetchrow_hashref()) {
> 
>           # print("Name: ".$hash->{'Clone_Name'}."\n");
> 
>           my $seq = new Bio::Seq(  -display_id => $hash->{'Clone_Name'},
> 
>                                    -seq        => $hash->{'Sequence'});
> 
>           $handle->write_seq($seq);
> 
>           # print("Sequence: ".$hash->{'Sequence'}."\n");
> 
>      }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mast_fasta.pl
Type: application/octet-stream
Size: 1998 bytes
Desc: mast_fasta.pl
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/53770697/attachment.obj 

From cjfields at uiuc.edu  Tue Jun 20 13:16:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:16:32 -0500
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <000c01c6948d$46e992d0$15327e82@pyrimidine>

Brian,

Looks like EBI switched the URL parameter for SwissProt from 'swall' to
'UniProtKB'.  I committed a change to Bio::DB::SwissProt in CVS which fixes
the issue.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Tuesday, June 20, 2006 11:14 AM
> To: George Tzotzos; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Error message
> 
> George,
> 
> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
> there's some other problem that may be specific to SwissProt. Can you
> retrieve from GenBank? E.g.:
> 
> my $seq_object = get_sequence('genbank', 2);
> 
> Brian O.
> 
> 
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
> 
> > I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> > following script to test the installation:
> >
> > use Bio::Perl;
> > use strict;
> > use warnings;
> >
> > my $seq_object = get_sequence('swissprot', "P09651");
> >
> > write_sequence(">roa1.fasta", 'fasta', $seq_object);
> >
> > I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> > get the message below.
> >
> > Any help on the nature of the problem and how to overcome it would be
> > greatly appreciated.
> >
> > Thanks
> >
> > George
> >
> >
> > ------------- EXCEPTION  -------------
> > MSG: swissprot stream with no ID. Not swissprot in my book
> > STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
> > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
> > STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> > STACK toplevel tut2.pl:5
> >
> >
> >
> > George T. Tzotzos Ph.D
> > Vienna, Austria
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From chandan.kr.singh at gmail.com  Tue Jun 20 10:46:01 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Tue, 20 Jun 2006 20:16:01 +0530
Subject: [Bioperl-l] Error message
In-Reply-To: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
References: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
Message-ID: <2d4f320606200746ja53cebs73923c510b535c44@mail.gmail.com>

Hi
It seems the 'swall' servertype on EBI no longer exists. Maybe this has
already been reported and debugged. I hope somebody can shed light on it.

As for George, if you are in a hurry you can use the Bio::DB::SwissProt module
directly. Here is typical code to do this:

use strict;
use warnings;
use Bio::DB::SwissProt;
use Bio::Perl;

# Query the ExPASy server instead of EBI.
my $db = Bio::DB::SwissProt->new(-servertype   => 'expasy',
                                 -hostlocation => 'us');
my $seq = $db->get_Seq_by_id('ROA1_HUMAN');

# Write the retrieved sequence to roa.sp in FASTA format.
write_sequence(">roa.sp", 'fasta', $seq);


See the module documentation for help.

cheers
Chandan


On 6/20/06, George Tzotzos <G.Tzotzos at unido.org> wrote:
>
> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
>
> use Bio::Perl;
> use strict;
> use warnings;
>
> my $seq_object = get_sequence('swissprot', "P09651");
>
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
>
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
>
> Thanks
>
> George
>
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
>
>
>
>
>
> George T. Tzotzos Ph.D
>
> Wagramerstrasse 5
> A-1400 Vienna
> Austria
>
> Email: g.tzotzos at unido.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From osborne1 at optonline.net  Tue Jun 20 13:33:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 13:33:07 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <000c01c6948d$46e992d0$15327e82@pyrimidine>
Message-ID: <C0BDA993.8ED3%osborne1@optonline.net>

Chris,

You beat me to it!

Brian O.


On 6/20/06 1:16 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Brian,
> 
> Looks like EBI switched the url parameter for swissprot 'swall' to
> 'UniProtKB'.  I committed a change to Bio::DB::SwissProt in CVS which fixes
> this and solves the issue.
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Tuesday, June 20, 2006 11:14 AM
>> To: George Tzotzos; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Error message
>> 
>> George,
>> 
>> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
>> there's some other problem that may be specific to SwissProt. Can you
>> retrieve from GenBank? E.g.:
>> 
>> my $seq_object = get_sequence('genbank', 2);
>> 
>> Brian O.
>> 
>> 
>> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>> 
>>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>>> following script to test the installation:
>>> 
>>> use Bio::Perl;
>>> use strict;
>>> use warnings;
>>> 
>>> my $seq_object = get_sequence('swissprot', "P09651");
>>> 
>>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>> 
>>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>>> get the message below.
>>> 
>>> Any help on the nature of the problem and how to overcome it would be
>>> greatly appreciated.
>>> 
>>> Thanks
>>> 
>>> George
>>> 
>>> 
>>> ------------- EXCEPTION  -------------
>>> MSG: swissprot stream with no ID. Not swissprot in my book
>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>>> STACK toplevel tut2.pl:5
>>> 
>>> 
>>> 
>>> George T. Tzotzos Ph.D
>>> Vienna, Austria
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From ClarkeW at AGR.GC.CA  Tue Jun 20 13:44:42 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 13:44:42 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A66@onncrxms4.agr.gc.ca>

Hi all, 

It seems that there is a newline character which is causing the problem.
This wasn't obvious at first due to the size of my shell window, but that
is what is giving the mismatch error. Thanks to Chris and Torsten for
the help and for pointing me in the direction of validate_seq, which was
helpful in finding the problem.
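
For anyone hitting the same thing, here is a minimal sketch of the kind of
cleanup that avoids it, assuming the stray characters are whitespace picked
up in the database field. The sub names clean_sequence and bad_characters
are made up for illustration; the cleaned string would then be passed as
-seq to Bio::Seq as in the original loop.

```perl
use strict;
use warnings;

# Strip all whitespace (including embedded newlines) from a raw sequence.
sub clean_sequence {
    my ($raw) = @_;
    (my $clean = $raw) =~ s/\s+//g;
    return $clean;
}

# Return the distinct characters that are not IUPAC nucleotide codes,
# written as hex escapes so that invisible ones show up when printed.
sub bad_characters {
    my ($seq) = @_;
    my %seen;
    my @bad = grep { !$seen{$_}++ }
              $seq =~ /([^ACGTUNRYKMSWBDHVacgtunrykmswbdhv])/g;
    return map { sprintf('\\x%02X', ord $_) } @bad;
}

my $raw = "ACTG\nACTG\n";
print "suspect characters: ", join(' ', bad_characters($raw)), "\n";
print "cleaned sequence:   ", clean_sequence($raw), "\n";
```

Printing the offending characters as hex escapes makes a newline or
carriage return visible regardless of the terminal width.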

Cheers, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with beyond "this doesn't work."
We need the following information; otherwise we can't do anything.

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in
the sequence.  The problem is, I don't think validate_seq (from PrimarySeq
and where the warning came from) distinguishes between nucleotides and
amino acids, and it allows for '*' and various gap symbols in sequences.
If this caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but it looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get when printing $hash using Data::Dumper?
I tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MySQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> my $sth = $dbh->prepare($sql);
> $sth->execute();
> while (my $hash = $sth->fetchrow_hashref()) {
>      # print("Name: ".$hash->{'Clone_Name'}."\n");
>      my $seq = new Bio::Seq(  -display_id => $hash->{'Clone_Name'},
>                               -seq        => $hash->{'Sequence'});
>      $handle->write_seq($seq);
>      # print("Sequence: ".$hash->{'Sequence'}."\n");
> }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne


From cjfields at uiuc.edu  Tue Jun 20 13:55:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:55:28 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>
Message-ID: <000e01c69492$b74e0ec0$15327e82@pyrimidine>

> -----Original Message-----
> From: Clarke, Wayne [mailto:ClarkeW at AGR.GC.CA]
> Sent: Tuesday, June 20, 2006 11:58 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> 
> The Bioperl version is 1.4, the OS is Redhat linux fedora core 4, the
> trace is
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
> STACK: /home/wayne/bin/mast_fasta.pl:59
> 
> And the full script is attached.

Have you tried a newer version of Bioperl to see if it fixed the issue?  v.
1.5.1 has been out for a bit now and it's pretty stable.

> However I would like to clarify that the actual sequence is not ACTG*,
> this was a notation to represent that I had checked it to be sure that
> it was a valid DNA sequence but due to confidentiality I cannot disclose
> the actual sequence. I know this makes it more difficult and that I
> perhaps should have been clearer about this originally. 

That's not a problem.  We run into that here a bit.  Example data is fine.

> The $handle is a
> Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
> name
> 
> 'Clone_Name' => 'sJ1485'
>         };
> then the error message. I hope this is more helpful than my last
> message.
> 
> Thanks, Wayne

Make sure you aren't using bioperl-specific methods when you run
Data::Dumper on your hash or the script crashes.

Okay, I was able to reproduce your error using PrimarySeq from v. 1.4 (BTW,
the error message changes if you use a newer version of Bioperl but it is
still there).  See if you can follow me here...

I used this script:
-------------------------
use Bio::Seq;
use Bio::SeqIO;
use Data::Dumper;

my $hash = {'Clone'     => 'test',
            'Sequence'  => 'ACTG*'};

my $seqout = Bio::SeqIO->new (-format   => 'fasta',
                              -fh       => \*STDOUT);

print Dumper($hash);

my $seq = Bio::Seq->new(-seq            => $hash->{'Sequence'},
                        -display_id     => $hash->{'Clone'});

$seqout->write_seq($seq);
-------------------------

And everything works fine, with this output:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => 'ACTG*'
        };
>test
ACTG*

Changing the anonymous hash to this causes the crash and error.

my $hash = {'Clone'     => 'test',
            'Sequence'  => ['ACTG*']};

Gets this:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => [
                          'ACTG*'
                        ]
        };

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ARRAY(0x2354b0)] which does not look
healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\core/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:268
STACK: Bio::PrimarySeq::new C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new C:\Perl\src\bioperl\core/Bio/Seq.pm:497
STACK: C:\Perl\Scripts\seq-test\test.pl:17
-----------------------------------------------------------

It could be that the sequence data is stored in another complex data type
(object, hash) that's causing the problem.  Looks like you retrieve your
hash from another method ('my $hash = $sth->fetchrow_hashref()'); you might
want to check that method to make sure you're getting the right kind of data
into your hash.
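
A quick way to run that check is sketched below. It uses only core Perl;
describe_value() is a hypothetical helper, not a Bioperl method, and the two
hashes are the same example data as above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): report what kind of value
# a hash slot holds before it is handed to a constructor.
sub describe_value {
    my ($value) = @_;
    return 'undef' unless defined $value;
    my $type = ref $value;
    return $type ? "$type reference" : 'plain string';
}

my $good = { Clone => 'test', Sequence => 'ACTG*'   };
my $bad  = { Clone => 'test', Sequence => ['ACTG*'] };

for my $row ($good, $bad) {
    my $kind = describe_value($row->{Sequence});
    print "Sequence is a $kind\n";
    # A reference stringifies to something like ARRAY(0x...), which is
    # the form that showed up inside the exception message.
    print "  stringifies to: $row->{Sequence}\n";
}
```

If the value turns out to be a reference, the fix belongs in whatever fills
the hash rather than in the Bio::Seq call.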
 
Chris


From rmb32 at cornell.edu  Tue Jun 20 14:09:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 20 Jun 2006 12:09:38 -0600
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150819406.2585.27.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
Message-ID: <449839E2.5080402@cornell.edu>

Getting to know this code a little better, I notice a couple of little 
things: 

1.) my patch attached to bug 2026 draws unnecessary distinctions between 
feature types that use tags, and those that use annotations, since all 
features are now Bio::AnnotatableI's and the *_tags_* methods are 
implemented in AnnotatableI in terms of annotation objects now.  You 
guys should probably just ignore it, since from the sound of it you're 
going to be changing all of this around anyway.  Wish I could be there 
to help and learn more.

2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar 
accessors to use when translating Bio::Annotation::* objects to and from 
scalar tags.  Seems to me, this would be much better accomplished by 
using polymorphism of some sort, probably adding a multipurpose as_tag() 
accessor in Bio::AnnotationI and the objects that implement it, then 
using that in Bio::AnnotatableI instead of %tag2text.  Does this make 
sense, or am I misinterpreting something here?  Reason I've noticed this 
is because I've been wrestling with how to translate  
Bio::Annotation::Target objects to and from scalar tag values, since a 
Target is being represented as an ordered list of 3 or 4 scalar tags in 
old things that were designed to interoperate with gff2, and I can't 
figure out a nice way to do it using the rather inflexible %tag2text 
mechanism.
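
To make point 2 concrete, here is a rough sketch of the polymorphic approach,
under some loud assumptions: as_tag() is the hypothetical method proposed
above, the Sketch::* packages are stand-ins rather than real Bioperl classes,
and joining the Target fields with spaces is just one possible flattening.

```perl
use strict;
use warnings;

# Stand-in for a simple annotation that holds one scalar value.
package Sketch::SimpleValue;
sub new    { my ($class, %args) = @_; return bless { value => $args{value} }, $class }
sub as_tag { my ($self) = @_; return $self->{value} }

# Stand-in for a Target-like annotation with 3 or 4 ordered fields.
package Sketch::Target;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub as_tag {
    my ($self) = @_;
    my @parts = ($self->{target_id}, $self->{start}, $self->{end});
    push @parts, $self->{strand} if defined $self->{strand};
    return join ' ', @parts;
}

package main;

# The caller no longer needs a lookup table keyed on class name:
# every annotation object knows how to flatten itself.
sub tag_values {
    my (@annotations) = @_;
    return map { $_->as_tag } @annotations;
}

my @values = tag_values(
    Sketch::SimpleValue->new(value => 'repeat_region'),
    Sketch::Target->new(target_id => 'EST23', start => 1, end => 21, strand => '+'),
);
print "$_\n" for @values;
```

The point of the sketch is only that each class owns its own flattening
logic, so a multi-field type like Target does not have to be squeezed into a
single accessor name.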

Sorry to be a pain, just wanted to get that in there before you guys 
start your jam session in Durham.

Rob

Scott Cain wrote:
> Hi Hilmar,
>
> Of course you are right--I was under the influence of a perl module that
> I work with that does something similar, but both of your solutions are
> better.
>
> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
> look this week.
>
> As for next week, I plan on spending the day at NESCent on Wednesday
> (though I haven't told Todd or Jeff that I am arriving early yet) just
> to make sure all the details are in place.  I imagine I'll have a fair
> amount of free time to hash this stuff out.  Anyone else who is in town
> (that is, in Durham, NC, USA) is welcome to come draw on a white board
> too. :-)
>
> Scott
>
>
> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>   
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> You don't need a new method for this. Instead, support a -feature  
>> argument.
>>
>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>
>> This should work for any instance of Bio::SeqFeatureI. If it is a  
>> B::SF::Annotated already it is obviously just a deep copy (if copy is  
>> desired - could be another parameter). Otherwise more will be involved.
>>
>> Alternatively, and possibly better, is to write a specialized  
>> SeqFeatureI factory (that would implement  
>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>
>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>> 		-type_ontology => $sequence_ontology,
>> 		-source_ontology => $feature_source_ontology,
>> 		-unflatten => 1);
>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>
>> This is preferable because it separates business logic that isn't  
>> necessarily related into defined units. I.e., the logic necessary to  
>> convert an ordinary feature into a strongly typed one is different  
>> from how to represent a strongly typed feature. IMHO anyway ...
>>
>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
>> started as the result of a discussion thread earlier this (or last?)  
>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
>> though not in concept.
>>
>> Maybe we need to get together again and thrash out a strategy; or a  
>> BOF at the GMOD meeting? I feel this does need a core group of people  
>> who care, hash out a strategy that will also solve the backwards  
>> compatibility problem with the current Bio::SeqFeatureI state-of- 
>> limbo, and allow us to implement the decisions with a few people in a  
>> concentrated effort. This will then also remove the only real large  
>> stumbling block towards a 1.6 release.
>>
>> Maybe we should think about a little pre-GMOD hackathon to clear up  
>> this mess? Scott, you'll be there a day early? I'll be already back  
>> and Jason I believe will still be in town, although he may have other  
>> commitments already. Nonetheless, it shouldn't really take that much  
>> but rather dedicated time, a whiteboard, and a few people who care  
>> thrashing this out and then do it.
>>
>> Thoughts?
>>
>> 	-hilmar
>>
>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>
>>     
>>> Rob,
>>>
>>> I came to the same conclusion as well; I wrote my response as I was
>>> heading out the door and while I was running errands, I realized the
>>> right thing to do is to write a Bio::SeqFeature::Annotated method  
>>> called
>>> new_from_object, whose usage would be:
>>>
>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
>>> ($my_BSFI, %args);
>>>
>>> where you would give it a Bio::SeqFeatureI compliant object and try to
>>> create a BSFA like use suggested below.  You could allow passing in  
>>> args
>>> to control how different things are handled, like mapping non-SO types
>>> to SO types.  I'll think about this over the weekend and let you  
>>> know if
>>> brilliance strikes me.
>>>
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>       
>>>> Rather than cobble together some ad-hoc solution, I would be  
>>>> interested
>>>> in working on a good solution to this problem, because it seems like
>>>> it's just going to get more common as more people start wanting to  
>>>> write
>>>> GFF3.  What about some code in whatever customarily makes these  
>>>> objects
>>>> (probably BSF::Annotated's new() method?) that could take another  
>>>> type
>>>> of Feature object and attempt to shoehorn its data into a new
>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>> whatever), it could throw() some informative error message.
>>>>
>>>> Then, people could write straightforward code something like:
>>>>
>>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>     $oldstylefeature->something_else('some other something that  
>>>> needs to
>>>> be changed for compliance');
>>>>     my $newfeature = Bio::SeqFeature::Annotated->new 
>>>> ($oldstylefeature);
>>>>     $gff3_out->write_feature($newfeature);
>>>> }
>>>>
>>>> Does that sound like a good idea?  I'd be more than willing to  
>>>> implement
>>>> this, since I'm going to need to do this sort of thing with many more
>>>> things than just RepeatMasker.
>>>>
>>>> Rob
>>>>
>>>> Scott Cain wrote:
>>>>         
>>>>> Um, yeah, good question.  The reason I didn't answer you when you  
>>>>> wrote
>>>>> before is that I was hoping for divine inspiration for an answer  
>>>>> (or for
>>>>> somebody else to answer, which would have been really great :-)
>>>>>
>>>>> The short answer (and easy one for me to type) is that you will  
>>>>> probably
>>>>> need an ad hoc method to do it, which is the same thing I do when  
>>>>> I need
>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>> mapped the 'right' way (that is, the way I want them to go).  I  
>>>>> don't
>>>>> have any sample code that does this, but if you want to start  
>>>>> working up
>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>
>>>>>           
>>>>>> So about that converting ye olde feature objects into
>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>
>>>>>>
>>>>>> Scott Cain wrote:
>>>>>>
>>>>>>             
>>>>>>> That's OK--You added a few items that should be escaped that  
>>>>>>> weren't, so
>>>>>>> I added those too.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Scott
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Woops, I should have said something about that.  I submitted  
>>>>>>>> it before
>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Scott,
>>>>>>>>>
>>>>>>>>> Looks like Robert also submitted a bug report related to this  
>>>>>>>>> as well=
>>> -- 
>>> ---------------------------------------------------------------------- 
>>> --
>>> Scott Cain, Ph. D.                                          
>>> cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>> - --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (Darwin)
>>
>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>> ImoAXD/jrbF0gXzSr2CY4tQ=
>> =XfDq
>> -----END PGP SIGNATURE-----

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From hlapp at gmx.net  Tue Jun 20 14:24:45 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 20 Jun 2006 14:24:45 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449839E2.5080402@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
	<449839E2.5080402@cornell.edu>
Message-ID: <A3627468-CCA4-41FD-8C09-F5E1BFCE67D0@gmx.net>

Yes, this is the sore problem area. AnnotatableI used to have only a
single method (annotation()); the *_tag_* methods are new since 1.5
(and truly a developer release feature - don't rely on them staying).

Likewise, the tag2text is an utterly ugly artifact (after all, this  
is an interface) rooted in the above addition. If we can't manage to  
remove it I'll remove my name from that module ;)

	-hilmar

On Jun 20, 2006, at 2:09 PM, Robert Buels wrote:

> Getting to know this code a little better, I notice a couple of little
> things:
>
> 1.) my patch attached to bug 2026 draws unnecessary distinctions  
> between
> feature types that use tags, and those that use annotations, since all
> features are now Bio::AnnotatableI's and the *_tags_* methods are
> implemented in AnnotatableI in terms of annotation objects now.  You
> guys should probably just ignore it, since from the sound of it you're
> going to be changing all of this around anyway.  Wish I could be there
> to help and learn more.
>
> 2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar
> accessors to use when translating Bio::Annotation::* objects to and  
> from
> scalar tags.  Seems to me, this would be much better accomplished by
> using polymorphism of some sort, probably adding a multipurpose  
> as_tag()
> accessor in Bio::AnnotationI and the objects that implement it, then
> using that in Bio::AnnotatableI instead of %tag2text.  Does this make
> sense, or am I misinterpreting something here?  Reason I've noticed  
> this
> is because I've been wrestling with how to translate
> Bio::Annotation::Target objects to and from scalar tag values, since a
> Target is being represented as an ordered list of 3 or 4 scalar  
> tags in
> old things that were designed to interoperate with gff2, and I can't
> figure out a nice way to do it using the rather inflexible %tag2text
> mechanism.
>
> Sorry to be a pain, just wanted to get that in there before you guys
> start your jam session in Durham.
>
> Rob
>
> Scott Cain wrote:
>> Hi Hilmar,
>>
>> Of course you are right--I was under the influence of a perl  
>> module that
>> I work with that does something similar, but both of your  
>> solutions are
>> better.
>>
>> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
>> look this week.
>>
>> As for next week, I plan on spending the day at NESCent on Wednesday
>> (though I haven't told Todd or Jeff that I am arriving early yet)  
>> just
>> to make sure all the details are in place.  I imagine I'll have a  
>> fair
>> amount of free time to hash this stuff out.  Anyone else who is in  
>> town
>> (that is, in Durham, NC, USA) is welcome to come draw on a white  
>> board
>> too. :-)
>>
>> Scott
>>
>>
>> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> You don't need a new method for this. Instead, support a -feature
>>> argument.
>>>
>>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>>
>>> This should work for any instance of Bio::SeqFeatureI. If it is a
>>> B::SF::Annotated already it is obviously just a deep copy (if  
>>> copy is
>>> desired - could be another parameter). Otherwise more will be  
>>> involved.
>>>
>>> Alternatively, and possibly better, is to write a specialized
>>> SeqFeatureI factory (that would implement
>>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>>
>>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>>> 		-type_ontology => $sequence_ontology,
>>> 		-source_ontology => $feature_source_ontology,
>>> 		-unflatten => 1);
>>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>>
>>> This is preferable because it separates business logic that isn't
>>> necessarily related into defined units. I.e., the logic necessary to
>>> convert an ordinary feature into a strongly typed one is different
>>> from how to represent a strongly typed feature. IMHO anyway ...
>>>
>>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan
>>> started as the result of a discussion thread earlier this (or last?)
>>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,
>>> though not in concept.
>>>
>>> Maybe we need to get together again and thrash out a strategy; or a
>>> BOF at the GMOD meeting? I feel this does need a core group of  
>>> people
>>> who care, hash out a strategy that will also solve the backwards
>>> compatibility problem with the current Bio::SeqFeatureI state-of-
>>> limbo, and allow us to implement the decisions with a few people  
>>> in a
>>> concentrated effort. This will then also remove the only real large
>>> stumbling block towards a 1.6 release.
>>>
>>> Maybe we should think about a little pre-GMOD hackathon to clear up
>>> this mess? Scott, you'll be there a day early? I'll be already back
>>> and Jason I believe will still be in town, although he may have  
>>> other
>>> commitments already. Nonetheless, it shouldn't really take that much
>>> but rather dedicated time, a whiteboard, and a few people who care
>>> thrashing this out and then do it.
>>>
>>> Thoughts?
>>>
>>> 	-hilmar
>>>
>>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>>
>>>
>>>> Rob,
>>>>
>>>> I came to the same conclusion as well; I wrote my response as I was
>>>> heading out the door and while I was running errands, I realized  
>>>> the
>>>> right thing to do is to write a Bio::SeqFeature::Annotated method
>>>> called
>>>> new_from_object, whose usage would be:
>>>>
>>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object
>>>> ($my_BSFI, %args);
>>>>
>>>> where you would give it a Bio::SeqFeatureI compliant object and  
>>>> try to
>>>> create a BSFA like use suggested below.  You could allow passing in
>>>> args
>>>> to control how different things are handled, like mapping non-SO  
>>>> types
>>>> to SO types.  I'll think about this over the weekend and let you
>>>> know if
>>>> brilliance strikes me.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>>
>>>>> Rather than cobble together some ad-hoc solution, I would be
>>>>> interested
>>>>> in working on a good solution to this problem, because it seems  
>>>>> like
>>>>> it's just going to get more common as more people start wanting to
>>>>> write
>>>>> GFF3.  What about some code in whatever customarily makes these
>>>>> objects
>>>>> (probably BSF::Annotated's new() method?) that could take another
>>>>> type
>>>>> of Feature object and attempt to shoehorn its data into a new
>>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>>> whatever), it could throw() some informative error message.
>>>>>
>>>>> Then, people could write straightforward code something like:
>>>>>
>>>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>>     $oldstylefeature->something_else('some other something that
>>>>> needs to
>>>>> be changed for compliance');
>>>>>     my $newfeature = Bio::SeqFeature::Annotated->new
>>>>> ($oldstylefeature);
>>>>>     $gff3_out->write_feature($newfeature);
>>>>> }
>>>>>
>>>>> Does that sound like a good idea?  I'd be more than willing to
>>>>> implement
>>>>> this, since I'm going to need to do this sort of thing with  
>>>>> many more
>>>>> things than just RepeatMasker.
>>>>>
>>>>> Rob
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> Um, yeah, good question.  The reason I didn't answer you when you
>>>>>> wrote
>>>>>> before is that I was hoping for divine inspiration for an answer
>>>>>> (or for
>>>>>> somebody else to answer, which would have been really great :-)
>>>>>>
>>>>>> The short answer (and easy one for me to type) is that you will
>>>>>> probably
>>>>>> need an ad hoc method to do it, which is the same thing I do when
>>>>>> I need
>>>>>> to convert gff2 to gff3, to make sure the things I need mapped  
>>>>>> get
>>>>>> mapped the 'right' way (that is, the way I want them to go).  I
>>>>>> don't
>>>>>> have any sample code that does this, but if you want to start
>>>>>> working up
>>>>>> an ad hoc method, I will certainly try to help you as much as  
>>>>>> I can.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> So about that converting ye olde feature objects into
>>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>>
>>>>>>>
>>>>>>> Scott Cain wrote:
>>>>>>>
>>>>>>>
>>>>>>>> That's OK--You added a few items that should be escaped that
>>>>>>>> weren't, so
>>>>>>>> I added those too.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Woops, I should have said something about that.  I submitted
>>>>>>>>> it before
>>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>>
>>>>>>>>> Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Looks like Robert also submitted a bug report related to this
>>>>>>>>>> as well=
>>>> -- 
>>>> ------------------------------------------------------------------- 
>>>> ---
>>>> --
>>>> Scott Cain, Ph. D.
>>>> cain at cshl.edu
>>>> GMOD Coordinator (http://www.gmod.org/)
>>>> 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>> - --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (Darwin)
>>>
>>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>>> ImoAXD/jrbF0gXzSr2CY4tQ=
>>> =XfDq
>>> -----END PGP SIGNATURE-----
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 20 16:22:45 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Jun 2006 21:22:45 +0100
Subject: [Bioperl-l] Bio::Map changes
Message-ID: <44985915.8010607@sendu.me.uk>

Some initial changes have been made to some modules in Bio::Map to allow 
Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
(see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)

Further changes are needed in some remaining Bio::Map modules for this 
addition to be complete (a number of Bio::Map related tests in the test 
suite currently fail), notably Bio::Map::Cyto* since they had 
implemented their own Range-related features.

I propose bringing all Bio::Map into line so it behaves with and makes 
good use of the RangeI nature of Position. Beyond this initial change I 
want to add relative positioning and more, but I'll describe that in a 
future post to this thread.

Can anyone see any issues with ranged positions (it's done in a backward 
compatible way)? Do any developers want to maintain control of a 
Bio::Map module or shall I just dive in?


From cjfields at uiuc.edu  Tue Jun 20 23:50:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 22:50:55 -0500
Subject: [Bioperl-l] EUtilities interface
Message-ID: <002301c694e5$e5f3a750$15327e82@pyrimidine>

I'm working on a new eutilities interface which I hope to commit by late
summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
generic web database interface, which I call Bio::DB::WebDBI, and the
EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
NCBI for any information available via Entrez Utilities (i.e. taxonomy,
pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
info like Bio::DB::WebDBSeqI.  

My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
Does anyone think this will be an issue?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From bix at sendu.me.uk  Wed Jun 21 04:20:37 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 09:20:37 +0100
Subject: [Bioperl-l] Bio::RangeI intersection proposal
Message-ID: <44990155.6050501@sendu.me.uk>

Bio::Map::PositionI (in bioperl-live) needs intersections of a list of 
ranges. It inherits from Bio::RangeI but unlike RangeI's union, 
intersection does not take a list. PositionI currently calls 
intersection repeatedly to handle a list.

If there is no particular reason for this limitation, I propose making 
RangeI intersection handle lists natively. This won't do any harm to 
existing code at the time of the change, but it's possible that someone 
has written a module that implements RangeI but overrides intersection 
(without making it accept a list), so that future code written that 
expects a RangeI to handle lists will break when getting a RangeI from 
that module.

So the question is, has anyone overridden intersection in RangeI? Is the 
small risk of possible breakage compensated by the benefit of 
intersections of a list of ranges (which is surely useful in lots of 
situations, not just for PositionI)?

I'm tempted to go ahead with this unless there are objections.
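
As a rough illustration of what a list-aware intersection involves, here
is a minimal sketch in plain Perl. It is not the Bio::RangeI
implementation: ranges are modelled as [start, end] array references with
inclusive coordinates, and intersect_ranges is a hypothetical helper name.
It narrows a running start/end pair across the whole list, which gives
the same result as calling a pairwise intersection repeatedly.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::RangeI: each range is a
# [start, end] array reference with inclusive coordinates.
sub intersect_ranges {
    my @ranges = @_;
    return unless @ranges;              # nothing to intersect

    my ($start, $end) = @{ shift @ranges };
    for my $range (@ranges) {
        my ($s, $e) = @$range;
        $start = $s if $s > $start;     # latest start wins
        $end   = $e if $e < $end;       # earliest end wins
        return if $start > $end;        # the ranges no longer overlap
    }
    return [$start, $end];
}

my $common = intersect_ranges([1, 100], [50, 150], [40, 80]);
print "$common->[0]..$common->[1]\n";   # prints 50..80
```

A caller that passes a single range gets that range back unchanged, so
code written against a one-argument intersection would presumably keep
working under this kind of scheme.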

From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 03:19:12 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 09:19:12 +0200
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <002701c69477$9ffa7c10$c2987ca5@pc13>
References: <002701c69477$9ffa7c10$c2987ca5@pc13>
Message-ID: <4498F2F0.7010203@biomax.com>

Sohel Merchant wrote:

> Just curious if any other BioPerlers will be at the YAPC conference in
> Chicago next week (http://yapcchicago.org/).

Not in Chicago, but yesterday I got the OK from Biomax management to go 
to YAPC::Europe, http://www.birmingham2006.com/. So at the end of 
August I'll be in Birmingham. Yeah!

Is anybody interested in writing parsers for Perl 6 there?

CU, Bernhard


-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************

From cjfields at uiuc.edu  Wed Jun 21 11:08:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:08:28 -0500
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <4498F2F0.7010203@biomax.com>
Message-ID: <000301c69544$8d537710$15327e82@pyrimidine>

Speaking of Perl6, there was interest here at one point in getting a
bioperl-experimental going, which at this point in the game should involve
Perl6.  If there were enough interest in it we could probably get it set up
via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
experimental modules in some way to prevent confusion (bioperl6-live???),
though I'm not up to speed Perl6-wise so I'm not sure about namespace
collisions and so on.

bioperl-experimental would be, like the name implies, a sort of testing
ground for ideas (good and bad).  It seemed like it was going to take off a
few years ago but it lost steam, I guess.

As for your parsers, would you build them from the ground up (i.e. from
Bio::Root::Root on up)?

Chris



From cjfields at uiuc.edu  Wed Jun 21 11:16:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:16:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <44990155.6050501@sendu.me.uk>
Message-ID: <000401c69545$a4a3ad30$15327e82@pyrimidine>

I personally have no objections as long as it doesn't break the API.  Don't know
how the senior guys feel (Jason, Brian, Heikki, Hilmar...); I'm not a user
of Bio::Map modules myself.

Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Chris



From hlapp at gmx.net  Wed Jun 21 11:24:47 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:24:47 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000401c69545$a4a3ad30$15327e82@pyrimidine>
References: <000401c69545$a4a3ad30$15327e82@pyrimidine>
Message-ID: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>


On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:

> Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Actually, it doesn't go by age but by the amount of hair you still  
have. ;)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 11:28:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:28:58 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>
Message-ID: <000501c69547$6a9f28b0$15327e82@pyrimidine>

Then I'm really a senior guy...

; {

Chris



From hlapp at gmx.net  Wed Jun 21 11:53:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:53:08 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000501c69547$6a9f28b0$15327e82@pyrimidine>
References: <000501c69547$6a9f28b0$15327e82@pyrimidine>
Message-ID: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>

We could run a Mr Seniority competition at BOSC with the attendees  
judging who got the weirdest looking hair loss. You'd take the  
challenge? The judging panel would need to be gender-mixed though.


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 12:08:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 11:08:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>
Message-ID: <000301c6954c$e89c7a60$15327e82@pyrimidine>

I'd love to be at BOSC but I can't go (finishing up my postdoc this year,
which is probably the primary cause of my hair loss).  Would the judges
accept a recent picture?

Chris



From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 12:25:50 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 18:25:50 +0200
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <000301c69544$8d537710$15327e82@pyrimidine>
References: <000301c69544$8d537710$15327e82@pyrimidine>
Message-ID: <4499730E.8090800@biomax.com>

Chris Fields wrote:
> Speaking of Perl6, there was interest here at one point in getting a
> bioperl-experimental going, which at this point in the game should involve
> Perl6.  If there were enough interest in it we could probably get it set up
> via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
> experimental modules in some way to prevent confusion (bioperl6-live???),
> though I'm not up to speed Perl6-wise so I'm not sure about namespace
> collisions and so on.

As far as I understood it, the plan is to have a very smooth migration 
path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When 
new stuff is coming along, or when refactoring is done, you drop in

   use v6;

or

   use v6-pugs;

and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm 
or Audrey Tang's presentation at the Nordic Perl Workshop: 
http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
So I would argue against having a completely separate Perl6 experimental
repository.

> bioperl-experimental would be, like the name implies, a sort of testing
> ground for ideas (good and bad).  It seemed like it was going to take off a
> few years ago but it lost steam, I guess.
> 
> As for your parsers, would you build them from the ground up (i.e. from
> Bio::Root::Root on up)?

I'm just a casual Bio::Perl user and never hacked on any internals. So I 
don't know whether the current Bio::Perl framework is a good fit.

The idea that is floating in my mind is to make a showcase of Perl 6 
parsing, by tackling the various sequences and alignment formats.
So this would involve shopping around for the cleanest parser 
implementations and porting that to Perl6.

Which repository to use is more a question of social engineering.
Are there more Pugs/Perl6 hackers interested in cool biological hacking,
or biologists aching to try out Perl6?

Regards,
   Bernhard Schmalhofer

-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************

From cjfields at uiuc.edu  Wed Jun 21 14:01:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 13:01:02 -0500
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <4499730E.8090800@biomax.com>
Message-ID: <000b01c6955c$ad0e6750$15327e82@pyrimidine>

> Chris Fields wrote:
> > Speaking of Perl6, there was interest here at one point in getting a
> > bioperl-experimental going, which at this point in the game should
> > involve Perl6.  If there were enough interest in it we could probably
> > get it set up via CVS and moving along.  We might need to split the
> > Perl6 stuff from Perl5 experimental modules in some way to prevent
> > confusion (bioperl6-live???), though I'm not up to speed Perl6-wise so
> > I'm not sure about namespace collisions and so on.
> 
> As far as I understood it, the plan is to have a very smooth migration
> path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When
> new stuff is coming along, or when refactoring is done, you drop in
> 
>    use v6;
> 
> or
> 
>    use v6-pugs;
> 
> and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm
> or Audrey Tang's presentation at the Nordic Perl Workshop:
> http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
> So I would argue against having a completely separate Perl6 experimental
> repository.

Makes sense.  I know Pugs is the Perl6 implementation in Haskell but I also
know eventually Parrot will be taking over as the compiler (hopefully).
Perl6 is pretty exciting since it's built to support OOP from the ground up,
unlike the bolted-on OOP for Perl5, and has several other features that make
it very useful (the new way regexes are handled).  I just haven't had time
to play around with it seriously enough.  I may try using Pugs a bit more,
though.

So, as long as Perl5-Perl6 work together a separate repository wouldn't be
necessary.  

> > bioperl-experimental would be, like the name implies, a sort of testing
> > ground for ideas (good and bad).  It seemed like it was going to take
> > off a few years ago but it lost steam, I guess.
> >
> > As for your parsers, would you build them from the ground up (i.e. from
> > Bio::Root::Root on up)?
>
> I'm just a casual Bio::Perl user and never hacked on any internals. So I
> don't know whether the current Bio::Perl framework is a good fit.
> 
> The idea that is floating in my mind is to make a showcase of Perl 6
> parsing, by tackling the various sequences and alignment formats.
> So this would involve shopping around for the cleanest parser
> implementations and porting that to Perl6.
> 
> Which repository to use is more a question of social engineering.
> Are there more Pugs/Perl6 hackers interested in cool biological hacking,
> or biologists aching to try out Perl6?

I suppose the best way is initially to use a non-bioperl approach using
Perl6, then try working the parsers in using 'use v6-pugs;'.  Bioperl is
heavily object-oriented so the code would probably need to be refactored
from the bottom up (or top down, depending on your view) to fit Perl6.
Having a perl5->perl6 translator helps, though.  And, again, having Perl5
and Perl6 work together helps as well.

Chris

> Regards,
>    Bernhard Schmalhofer
> 
> --
> **************************************************
> Dipl.-Physiker Bernhard Schmalhofer
> Senior Developer
> Biomax Informatics AG
> Lochhamer Str. 11
> 82152 Martinsried, Germany
> Tel: +49 89 895574-839
> Fax: +49 89 895574-825
> eMail: Bernhard.Schmalhofer at biomax.com
> Website: www.biomax.com
> **************************************************


From dwaner at scitegic.com  Wed Jun 21 14:14:00 2006
From: dwaner at scitegic.com (dwaner at scitegic.com)
Date: Wed, 21 Jun 2006 11:14:00 -0700
Subject: [Bioperl-l] EMBL release 87 format changes.
Message-ID: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>

With release 87 of EMBL (June 19th, 2006), there have been some minor 
changes to the flat file record format. In particular, the SV (sequence 
version) tag has been moved from its own line to a field in the ID line. 
See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.

Is someone already working on updating the SeqIO::embl parser, or should I 
volunteer?

- David

From bix at sendu.me.uk  Wed Jun 21 14:23:28 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 19:23:28 +0100
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <002301c694e5$e5f3a750$15327e82@pyrimidine>
References: <002301c694e5$e5f3a750$15327e82@pyrimidine>
Message-ID: <44998EA0.1010406@sendu.me.uk>

Chris Fields wrote:
> I'm working on a new eutilities interface which I hope to commit by late
> summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> generic web database interface, which I call Bio::DB::WebDBI, and the
> EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
> NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
> info like Bio::DB::WebDBSeqI.  
> 
> My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
> Does anyone think this will be an issue?

Well, I don't. Sounds good to me. What's the intended relationship 
between WebDBI and EUtilitiesI? Would your work end up in the removal of 
direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just 
convert the code that gets the XML to a one line statement or so?

From cjfields at uiuc.edu  Wed Jun 21 15:00:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 14:00:02 -0500
Subject: [Bioperl-l] EMBL release 87 format changes.
In-Reply-To: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>
Message-ID: <000c01c69564$e68b39b0$15327e82@pyrimidine>

That would be great!  Post a patch/fix via bugzilla:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

and we can add it and test it out.  Or if you have CVS access you can do it
yourself.  Not sure who's taking care of SeqIO::embl at the moment....

Added bit : you'll need to update both next_seq and write_seq.  next_seq
should probably handle both old and new EMBL format and write_seq should
only write new format (unless someone else disagrees???)
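
To make the "handle both old and new" idea concrete, here is a minimal
sketch in plain Perl. It is not the SeqIO::embl code; parse_embl_id_line
is a hypothetical helper, the accession XX000001 is made up, and the two
sample ID lines reflect my assumption of what the old and new layouts
look like, so check them against the EMBL release notes before relying
on the patterns.

```perl
use strict;
use warnings;

# Hypothetical helper, not the SeqIO::embl implementation. Returns
# (accession, sequence_version); the version is undef for old-style ID
# lines, where it has to come from a separate SV line instead.
sub parse_embl_id_line {
    my ($line) = @_;

    # Assumed new layout: "ID   <accession>; SV <n>; ..."
    if ($line =~ /^ID\s+([^\s;]+);\s+SV\s+(\d+);/) {
        return ($1, $2);
    }
    # Assumed old layout: "ID   <name>  <class>; ..." (no SV field)
    if ($line =~ /^ID\s+(\S+)\s+[^;]+;/) {
        return ($1, undef);
    }
    return;    # not an ID line
}

# Made-up sample lines, one per assumed layout.
for my $line ('ID   XX000001; SV 1; linear; mRNA; STD; PLN; 1859 BP.',
              'ID   XX000001   standard; RNA; PLN; 1859 BP.') {
    my ($acc, $sv) = parse_embl_id_line($line);
    printf "%s version=%s\n", $acc, defined $sv ? $sv : 'see SV line';
}
```

Trying the new pattern first and falling back to the old one keeps a
reader tolerant of both releases, while a writer would only ever emit
the first form.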

Chris



From cjfields at uiuc.edu  Wed Jun 21 17:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 16:16:38 -0500
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <44998EA0.1010406@sendu.me.uk>
Message-ID: <001b01c69577$fc7068f0$15327e82@pyrimidine>

> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Wednesday, June 21, 2006 1:23 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] EUtilities interface
> 
> Chris Fields wrote:
> > I'm working on a new eutilities interface which I hope to commit by late
> > summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> > generic web database interface, which I call Bio::DB::WebDBI, and the
> > EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can
> > query NCBI for any information available via Entrez Utilities (i.e.
> > taxonomy, pubmed, sequences, dbSNP, Gene, etc); you're not limited to
> > sequence-only info like Bio::DB::WebDBSeqI.
> >
> > My only concern is confusion over names, particularly WebDBI vs.
> > WebDBSeqI.  Does anyone think this will be an issue?
> 
> Well, I don't. Sounds good to me. What's the intended relationship
> between WebDBI and EUtilitiesI? Would your work end up in the removal of
> direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just
> convert the code that gets the XML to a one line statement or so?

Well, right now all it does is use URI to build queries, submit them to
Entrez Utilities, then grab the response; I've been hacking at it on and off
for a few months now.  It needs some error handling and added methods
(mainly for proxies and handling WebEnv/query_key), though once I have it in
decent enough shape I'll go ahead and add it to CVS.  
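
The query-building step could look roughly like the sketch below. It is
not the planned Bio::DB::EUtilitiesI code: eutil_url and
uri_escape_simple are hypothetical helpers, the base URL is an
assumption, and to stay dependency-free the sketch escapes parameters by
hand instead of using URI. It only builds the URL; submitting it (for
example with LWP::UserAgent) is left out.

```perl
use strict;
use warnings;

# Assumed base URL for NCBI's Entrez Utilities.
my $EUTILS_BASE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything outside the RFC 3986 unreserved set.
# Good enough for plain ASCII terms; not meant for wide characters.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: build a URL for one eutility (esearch, efetch, ...)
# from an ordered list of key/value pairs.
sub eutil_url {
    my ($eutil, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice(@params, 0, 2)) {
        push @pairs, uri_escape_simple($key) . '=' . uri_escape_simple($value);
    }
    return "$EUTILS_BASE/$eutil.fcgi?" . join('&', @pairs);
}

print eutil_url('esearch', db => 'taxonomy', term => 'Homo sapiens'), "\n";
```

Keeping the eutility name and the database as plain parameters is what
lets one builder serve taxonomy, pubmed, or sequence queries alike.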

Theoretically once the response is returned it can be parsed like any stream
(see WebDBSeqI/NCBIHelper for an idea of how sequences are parsed and
returned using SeqIO).  This should work as long as there is an appropriate
class to handle the data stream and the appropriate 'plugin' to parse the
data into objects; i.e. dbSNP can be handled by ClusterIO::dbSNP, sequences
by SeqIO::genbank/fasta, pubmed by Bio::Biblio::IO::pubmedxml, and so on.
If you don't have an object or want the raw data stream, you could submit a
request using the various eutilities (efetch, epost, esearch) and save the
raw response to an output file or STDOUT.  

Here's a rough diagram:

Bio::Root::Root
LWP::UserAgent ------ Bio::DB::WebDBI
                        |
                        |----> Bio::DB::DBFetch (EBI interface)
                        |        ----> plugins for Bio* classes
                        |
                        |----> Bio::DB::EUtilitiesI (NCBI interface)
                        |        ----> plugins for Bio* classes
                        |
                        |----> others?

You probably don't need a Bio::*IO::plugin for each type; tax data in
Bioperl seems to primarily use the NCBI Tax database, so
Bio::DB::Taxonomy::entrez shouldn't be too hard to adapt to act as a plugin.
Bio::DB::Taxonomy::entrez uses XML::Twig to parse everything into
Bio::Taxonomy::Node objects and is able to retrieve single and multiple ID's
using the same method, though I would probably use XML::SAX instead.  If I
remember correctly there were issues with Bio::DB::Taxonomy that you brought
up...

Chris


From bix at sendu.me.uk  Thu Jun 22 09:28:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Jun 2006 14:28:25 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <44985915.8010607@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk>
Message-ID: <449A9AF9.2000305@sendu.me.uk>

Sendu Bala wrote:
> Some initial changes have been made to some modules in Bio::Map to allow 
> Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
> (see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)
> 
> Further changes are needed in some remaining Bio::Map modules for this 
> addition to be complete

Range is now done.

The next step is to tidy up all of Bio::Map*, which involves a major 
reimplementation of the whole system (but with no significant API 
change). Basically, the current system is an awkward mix of older 'marker 
has a single position on a map' and new 'markers have multiple positions 
on multiple maps'. This gives us strange things like SimpleMap's 
add_element method which adds a reference to the element to the map 
without the element itself knowing it is now on the map (because it is 
Position that defines what maps an element is on).

The reimplementation will make Position central to the model, allowing 
for lots of other things to work properly without anything becoming 
inconsistent (as is currently the case).
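
A toy sketch of what "Position central to the model" could mean is given
below. It is not the Bio::Map API: the data layout, the helper name
related, and the marker and map names are all made up for illustration.
The point is only that if positions are the sole link between markers
and maps, both "which maps is this marker on" and "which markers are on
this map" are derived from the same records and cannot disagree.

```perl
use strict;
use warnings;

# Toy model, not the Bio::Map API. Each position is the only record that
# links a marker to a map; marker and map names are made up.
my @positions = (
    { marker => 'marker_a', map => 'genetic',  start => 10,  end => 10  },
    { marker => 'marker_a', map => 'physical', start => 500, end => 650 },
    { marker => 'marker_b', map => 'genetic',  start => 42,  end => 45  },
);

# Generic lookup: values of field $want among positions whose field
# $have equals $value, de-duplicated and sorted.
sub related {
    my ($have, $value, $want) = @_;
    my %seen;
    my @found = grep { !$seen{$_}++ }
                map  { $_->{$want} }
                grep { $_->{$have} eq $value } @positions;
    return sort { $a cmp $b } @found;
}

print join(', ', related(marker => 'marker_a', 'map')), "\n";  # genetic, physical
print join(', ', related(map => 'genetic', 'marker')), "\n";   # marker_a, marker_b
```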

The general tidy up will involve redoing and perhaps even removing 
things. For instance, OrderedPositionWithDistance has never worked so 
will be deleted (with OrderedPosition gaining the distance functionality 
its docs say it already has).

But now is the time to speak up and change my mind if necessary!

From golharam at umdnj.edu  Thu Jun 22 17:05:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 22 Jun 2006 17:05:00 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package)
Message-ID: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>

Hi all,

I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
baseml in the PAML package to measure the distances of some non-coding
regions.  

I started with the coding regions, and used the script
bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
something similar for non-coding regions.  However, when I call
Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef',
meaning the matrix was never defined.  

I wanted to find out if anyone on here has done this before or knows a
way to measure substitution frequencies of non-coding regions with the
PAML package.  The documentation with PAML is sparse so I'm not sure how
to interpret its output directly - that's why I'm using Bioperl.  

Hopefully someone can help me before I start digging into the
code...Thanks.

Ryan


From n.haigh at sheffield.ac.uk  Fri Jun 23 02:43:48 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 23 Jun 2006 07:43:48 +0100
Subject: [Bioperl-l] CVS Export
Message-ID: <000001c69690$61afb540$b07f6f58@nathan243dd61f>

I may have asked this previously, but I can't find the answer to my question
anywhere so I'll have to ask it again - sorry.

Is it possible to export files/directories from cvs that have changed
between two tags/branches/head? Specifically, I'd like to export (as I don't
want the cvs administrative directories) files that have been added to
Bioperl since the 1.4 release.

Cheers
Nath

----------------------------------------------------------------------------
Dr. Nathan S. Haigh MPharmacol. Ph.D.
Bioinformatics PostDoctoral Research Associate

Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------


From cjfields at uiuc.edu  Fri Jun 23 10:58:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Jun 2006 09:58:24 -0500
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c696d5$7da6c640$15327e82@pyrimidine>

***sounds of crickets***

Ryan,

It's a pretty good possibility that Jason and the rest are on the road to
conferences and such.  There's been mention of a Durham, NC meeting and, of
course, YAPC is happening soon as well.  I wish I could help but I know
diddly about PAML besides the HOWTO on the wiki (though I may be using it
myself soon).  Sorry, you may have to be a bit patient for a more productive
response.

Chris



From MEC at stowers-institute.org  Fri Jun 23 14:27:19 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Jun 2006 13:27:19 -0500
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Message-ID: <CED81D34E37D5043A1211565277A51E50563FC85@exchkc02.stowers-institute.org>

Guy,

I've just downloaded and installed your latest 1.1.0 version of
exonerate but unfortunately did not find any mention in the ChangeLog of
addressing this bug, though I still see in the TODO:

    o Should GFF show all coordinates on the +ve strand? (jason_p2g eg)

I was half expecting to see this fixed in this version based on this old
thread.  

Can you please confirm that it has not yet been addressed, and accept my
request that you continue to keep this change on your list for future
versions...

Also, might you elaborate on this entry from the ChangeLog.  I don't see
it mentioned in the manpage.

    o Added %tcs etc to --ryo for dumping coding sequences 

Thanks,

Malcolm Cook
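
For anyone following the coordinate question in the thread quoted below: assuming 1-based, inclusive positions (the GFF convention) and a known sequence length, an interval given on the reverse-complement copy of a sequence can be mapped back to the forward strand by subtracting from the length. That is presumably why the sequence length has to be available when post-processing the output. A rough sketch in plain Perl follows; rc_to_forward is a hypothetical helper name, not part of exonerate or BioPerl, and exonerate's own coordinate convention may differ from the 1-based assumption made here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of exonerate or BioPerl): map a 1-based,
# inclusive interval given on the reverse complement of a sequence back
# onto the forward strand. Assumes $start <= $end <= $seq_length.
sub rc_to_forward {
    my ($start, $end, $seq_length) = @_;
    return ($seq_length - $end + 1, $seq_length - $start + 1);
}

# The first ten bases of the reverse complement of a 100 bp sequence
# are the last ten bases of the forward strand.
my ($fwd_start, $fwd_end) = rc_to_forward(1, 10, 100);
print "$fwd_start..$fwd_end\n";
```

The strand column would still carry '-' for such a feature; only the start and end values change.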

>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org 
>[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Guy Slater
>Sent: Friday, September 02, 2005 11:52 AM
>To: Cook, Malcolm
>Cc: bioperl-l
>Subject: RE: [Bioperl-l] methods, etc. for Bio::SearchIO on 
>exonerate output
>
>On Fri, 2 Sep 2005, Cook, Malcolm wrote:
>
>> Hmmmm - I'd better get some clarification from Guy too.  
>>  
>> Guy, if you don't mind reading the thread below and chiming in on our
>> discussion of interpreting the output of your excellent exonerate
>> program:
>>  
>> The sections of the manpage (
>> <http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html>
>> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
>> relevant are these 2 excerpts:
>>  
>>  1) When an alignment is reported on the reverse complement of a
>> sequence, the coordinates are simply given on the reverse complement
>> copy of the sequence. Hence positions on the sequences are never
>> negative. Generally, the forward strand is indicated by '+', the reverse
>> strand by '-', and an unknown or not-applicable strand (as in the case
>> of a protein sequence) is indicated by '.' "
>>  
>> 2)  --forwardcoordinates <boolean> By default, all coordinates are
>> reported on the forward strand. Setting this option to false reverts to
>> the old behaviour (pre-0.8.3) whereby alignments on the reverse
>> complement of a sequence are reported using coordinates on the reverse
>> complement. 
>>  
>> We see GFF DUMP coordinates still reported on the reverse strand
>> regardless of the setting of --forwardcoordinates.  So these two
>> excerpts from your manpage seem contradictory to me, unless I
>> understand `--forwardcoordinates FALSE` to affect only the coordinates
>> reported in the alignment section, not in the GFF DUMP section, which is
>> what it appears to do in practice.
>>  
>> Guy, can you confirm that the --forwardcoordinates option has no effect
>> on GFF output?
>>  
>
>Hi,
>
>Yes, it has no effect, and this is a bug
>(sorry - it was due to my misinterpretation of the GFF2 spec)
>- it's on the list of things to be fixed for exonerate 1.1 (soon)
>
>> Further, can you tell us if you plan to comport more closely to the GFF
>> spec, in particular in this case by reporting forwardcoordinates in the
>> GFF DUMP section too?
>> I see in your TODO list "    o Should GFF show all coordinates on the
>> +ve strand? (jason_p2g eg)".  Hear hear!  I second the motion.
>>  
>> And TODO item " GFF3 support ? http://song.sf.net/" gets my vote too....
>> though this is more of a sticky wicket....
>>  
>
>Yup, GFF3 support is on the list,
>but probably it will not be done in time for exonerate 1.1
>Of course, I'd welcome a patch ...    ;)
>
>(I'm mainly working on getting the cdna2genome
> and genome2genome models working properly for 1.1)
>
>Cheers,
>
>Guy.
>
>> Cheers and Thanks!
>>  
>> Malcolm Cook
>>  
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Friday, September 02, 2005 9:46 AM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> I've already talked to Guy about some of this and I assume fixes will be
>> part of the next release, but it can't hurt to have more people
>> requesting.  The main problem right now is reverse strand hits in GFF
>> output are still screwed up even if you provide the --forwardcoordinates
>> option. 
>> 
>> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
>> VULGAR to a list of Bio::Search::HSP::GenericHSP).  We can also
>> reconstruct everything needed from that, I gave a stab at it once, but
>> there was something missing (or maybe it was pre --forwardcoordinates
>> option).   
>> 
>> 
>> -jason 
>> 
>> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
>> 
>> 
>> Jason,
>>  
>> Thanks for the scripts and clues (esp re: using the --ryo option to
>> inject the needed length into the exonerate output to compensate).
>>  
>> I'm considering asking exonerate author to comport with GFF spec.  Do
>> you think this is a road to take?
>>  
>> Cheers,
>>  
>> Malcolm
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Wednesday, August 31, 2005 12:35 PM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> 
>> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl
>> 
>> You may still want to massage it some, but I use the script in this
>> basic form, maybe with a few tweaks:
>> 
>> Note that it requires you to run exonerate with specific --ryo options
>> so that it includes the length of the query and hit sequences in the
>> report output. This should be covered in the perldoc in the script.
>> 
>> Without the ryo options enabled,  you'll need to modify the script more
>> to have access to the original sequence db, use Bio::DB::Fasta,  and put
>> in some $dbh->length($seqid) calls instead.
>> 
>> I don't think the part which writes HSP/match lines is actually correct
>> - it is trying to roll gapped HSPs from the similarity features. 
>> 
>> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
>> instance and/or grepping out the lines I really think I need.  
>> You may want to s/exon/CDS/ for the protein2genome output as well.
>> 
>> -jason
>> 
>> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>> 
>> 
>> Jason, 
>> 
>> This message is in regards to an old thread in which you offered to
>> share a 'script for munging over' exonerate output for loading into
>> DB::GFF (c.f.
>> <http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html>
>> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>> 
>> Would you be willing to still share that script, if you've got it
>> around? 
>> 
>> Thanks, and regards, 
>> 
>> Malcolm Cook -  <mailto:mec at stowers-institute.org>
>> mec at stowers-institute.org - 816-926-4449
>> Database Applications Manager - Bioinformatics
>> Stowers Institute for Medical Research - Kansas City, MO  USA
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>
>-- 
>%!PS % <------ Guy St.C. Slater ------> 
>http://www.ebi.ac.uk/~guy/  <------
>210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 
>12/d{exch moveto}
>a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 
>f}a/h{setlinewidth newpath dup
>g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 
>0 108 arc d e
>18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>


From oldham at ucla.edu  Fri Jun 23 12:18:39 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 23 Jun 2006 09:18:39 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>

Hello again,

I finally got it to work, using the following script.  However, it takes
about 5 hours to run on a fast computer.  Using grep (in bash), on the
other hand, takes about 5 minutes (see below if you are interested).
Thanks to everyone for your help!

SLOW perl script:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_all_X';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
print @ID;
chomp @ID;

while (my $line = <PROBES>) {
	foreach my $identifier (@ID) {
		if($line=~/^>probe:\w+:$identifier:/) {
				print OUT $line;
				print OUT scalar(<PROBES>);
		}
	}
}
exit;


FAST bash script:

#!/usr/bin/bash
exec<"ID_all_X"
while read line; do
	echo $line
	grep -A 1 :$line: HG_U95Av2_probe_fasta >>myresults.txt
done


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Wednesday, June 14, 2006 6:48 AM
To: Michael Oldham; Chris Fields
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; # Note: this creates a hash with keys=PSIDs and all values=1.
my $idmatch;
	while (<PROBES>) {
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}
exit;
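
The same idea can be tried without the real data files. What follows is only a rough sketch under a few assumptions: the IDs and FASTA lines are passed in as arrays instead of being read from ID_dat.txt and the probe file, and filter_probes is a hypothetical helper name. It shows the two points above: the flag is declared once outside the loop, and only header lines change it, so the sequence line after a matching header is kept too.

```perl
use strict;
use warnings;

# filter_probes() is a hypothetical helper used only for this sketch.
# It returns the FASTA lines (headers and sequences) whose header carries
# one of the wanted probe set IDs.
sub filter_probes {
    my ($wanted_ids, $fasta_lines) = @_;
    my %wanted = map { $_ => 1 } @$wanted_ids;   # hash lookup instead of a nested loop
    my $in_match = 0;    # declared outside the loop so it survives between lines
    my @kept;
    for my $line (@$fasta_lines) {
        # Only header lines update the flag; sequence lines inherit it.
        $in_match = (exists $wanted{$1}) ? 1 : 0
            if $line =~ /^>probe:\w+:(\w+):/;
        push @kept, $line if $in_match;
    }
    return @kept;
}

my @fasta = (
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
);
print filter_probes(['542_at'], \@fasta);
```

Each input line costs one hash lookup at most, so the running time should grow with the size of the probe file and not with the number of IDs.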


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu]
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086;
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs
>(only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret
>Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
>my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs
>and all values=1.
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opening it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the
>code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed
>>then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must
>>match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this won't it only print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting
>>>> closer,
>>>> but no cigar quite yet.  The script below runs quickly with no
>>>> errors--but
>>>> the output file is empty.  It seems that the problem must lie
>>>> somewhere in
>>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>>> experienced
>>>> eye--but not to mine!  Any suggestions?  Thanks again for
>your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>>> keys=PSIDs and
>>>> all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>>> fine and probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat
>>>looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a
>>>>> conceptually simple
>>>>> task.  I have a single large fasta file containing about 200,000
>>>>> probe
>>>>> sequences (from an Affymetrix microarray), each of which looks
>>>>> like this:
>>>>>
>>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>>> Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of
>>>>> ~130,800
>>>>> probes (both the header and the sequence) and output this
>>>>> subset into a new
>>>>> fasta file.  These 130,800 probes correspond to 8,175
>probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>>> have these
>>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>>> to create an
>>>>> index of all 200,000 probes in the original fasta file using
>>>>> the following
>>>>> script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even
>>>>> if it is, I'm
>>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> No virus found in this outgoing message.
>>>>> Checked by AVG Free Edition.
>>>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date:
>>>>> 6/8/2006
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>


From pmiguel at purdue.edu  Sat Jun 24 10:17:46 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 10:17:46 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <449D498A.9020107@purdue.edu>

Brian Osborne wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>   
I would be very disappointed to lose one part of bptutorial.pl--this was 
described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
only purpose I've ever used bptutorial.pl for--to find all the methods 
available to any given object. Eg:

bptutorial.pl 100 Bio::PrimarySeq

 ***Methods for Object Bio::PrimarySeq ********


 Methods taken from package Bio::IdentifiableI
 lsid_string   namespace_string

 Methods taken from package Bio::PrimarySeq
 accession   accession_number   alphabet   authority   can_call_new   desc
 description   direct_seq_set   display_id   display_name   id   is_circular
 length   namespace   new   object_id   primary_id   seq
 subseq   validate_seq   version

 Methods taken from package Bio::PrimarySeqI
 moltype   revcom   translate   trunc

 Methods taken from package Bio::Root::Root
 DESTROY   confess   debug   throw   verbose

 Methods taken from package Bio::Root::RootI
 carp   deprecated   stack_trace   stack_trace_dump   
throw_not_implemented   warn
 warn_not_implemented


Phillip SanMiguel
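
For what it is worth, a listing of this kind can be approximated offline with core Perl alone. The following is a rough sketch and not the code bptutorial.pl uses: methods_by_package is a hypothetical helper, and My::Base and My::Seq are toy classes standing in for a real hierarchy. It walks the inheritance chain with mro::get_linear_isa (core since Perl 5.10) and reports the subs found in each package's symbol table, which can include imported functions as well as real methods.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Toy classes standing in for a real hierarchy (package blocks need Perl 5.14+).
package My::Base { sub throw { } sub debug { } }
package My::Seq  {
    our @ISA = ('My::Base');
    sub new    { return bless {}, shift }
    sub seq    { }
    sub subseq { }
}

# methods_by_package() is a hypothetical helper: for every class in the
# inheritance chain, list the subs defined in that package's symbol table.
sub methods_by_package {
    my ($class) = @_;
    my %found;
    no strict 'refs';    # symbol-table access needs symbolic references
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        my $stash = \%{"${pkg}::"};
        $found{$pkg} = [ sort grep { defined &{"${pkg}::$_"} } keys %$stash ];
    }
    return \%found;
}

my $methods = methods_by_package('My::Seq');
for my $pkg (@{ mro::get_linear_isa('My::Seq') }) {
    print "Methods taken from package $pkg\n  @{ $methods->{$pkg} }\n";
}
```

To try it on an installed BioPerl class, the module would have to be loaded first (for example with require) before calling the helper on its name.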


From sdavis2 at mail.nih.gov  Sat Jun 24 10:45:52 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 24 Jun 2006 10:45:52 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a singlelargefile
References: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>
Message-ID: <001a01c6979c$ff576dd0$6501a8c0@WATSON>


----- Original Message ----- 
From: "Michael Oldham" <oldham at ucla.edu>
To: "Cook, Malcolm" <MEC at stowers-institute.org>; "Chris Fields" 
<cjfields at uiuc.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Friday, June 23, 2006 12:18 PM
Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
singlelargefile


> Hello again,
>
> I finally got it to work, using the following script.  However, it takes
> about 5 hours to run on a fast computer.  Using grep (in bash), on the
> other hand, takes about 5 minutes (see below if you are interested).
> Thanks to everyone for your help!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
> print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
> print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
> foreach my $identifier (@ID) {
> if($line=~/^>probe:\w+:$identifier:/) {
> print OUT $line;
> print OUT scalar(<PROBES>);
> }
> }
> }

This could probably be done MUCH faster using a hash on the sequence 
identifier.  (I have to admit that I didn't follow the first part of this 
conversation, so I could be misunderstanding some part of what you are 
trying to do.)  If you have a couple hundred-thousand sequences, my guess is 
that it could be done in under 30 seconds, but I could be wrong about the 
exact time.  The important part is to make a hash of your sequences with the 
key being the $identifier.  Then, loop through your @ID array doing 
something like (untested):

#open files as before and read in @ID as before

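my %seq_hash;

while (my $line = <PROBES>) {
    # Capture the probe set ID from the header itself; $identifier is not
    # defined at this point.
    if ($line =~ /^>probe:\w+:(\w+):/) {
        # Append rather than assign: each probe set ID has many probes.
        $seq_hash{$1} .= $line . scalar(<PROBES>);
    }
}

foreach my $id (@ID) {
    print OUT $seq_hash{$id} if exists $seq_hash{$id};
}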


From arareko at campus.iztacala.unam.mx  Sat Jun 24 11:27:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 10:27:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D498A.9020107@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>
	<449D498A.9020107@purdue.edu>
Message-ID: <449D59C7.4030008@campus.iztacala.unam.mx>

Hi Phillip,

Have you tried the Deobfuscator interface? It's a newer and better way 
to browse all the methods available in BioPerl:

http://bioperl.org/wiki/Deobfuscator
http://bioperl.org/cgi-bin/deob_interface.cgi

Regards,
Mauricio.

Phillip SanMiguel wrote:
> Brian Osborne wrote:
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
>>   
> I would be very disappointed to lose one part of bptutorial.pl--this was 
> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
> only purpose I've ever used bptutorial.pl for--to find all the methods 
> available to any given object. Eg:
> 
> bptutorial.pl 100 Bio::PrimarySeq
> 
>  ***Methods for Object Bio::PrimarySeq ********
> 
> 
>  Methods taken from package Bio::IdentifiableI
>  lsid_string   namespace_string
> 
>  Methods taken from package Bio::PrimarySeq
>  accession   accession_number   alphabet   authority   can_call_new   desc
>  description   direct_seq_set   display_id   display_name   id   is_circular
>  length   namespace   new   object_id   primary_id   seq
>  subseq   validate_seq   version
> 
>  Methods taken from package Bio::PrimarySeqI
>  moltype   revcom   translate   trunc
> 
>  Methods taken from package Bio::Root::Root
>  DESTROY   confess   debug   throw   verbose
> 
>  Methods taken from package Bio::Root::RootI
>  carp   deprecated   stack_trace   stack_trace_dump   
> throw_not_implemented   warn
>  warn_not_implemented
> 
> 
> Phillip SanMiguel
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From golharam at umdnj.edu  Sat Jun 24 10:43:29 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 10:43:29 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>
Message-ID: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>

I've managed to code three methods to calculate K into a perl script
using the algorithms as described in "Molecular Evolution" by Wen-Hsiung
Li.   I'd be happy to contribute it as a script...
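
The message does not say which three methods were coded, and the script itself is not shown. Purely as an illustration of this kind of calculation, here is a rough sketch of the Jukes-Cantor one-parameter correction, K = -(3/4) ln(1 - 4p/3), where p is the proportion of aligned sites that differ. The function name jukes_cantor_k is hypothetical, the two sequences are assumed to be already aligned, and sites with gaps or ambiguity codes are simply skipped.

```perl
use strict;
use warnings;

# jukes_cantor_k() is a hypothetical name for this sketch. It returns the
# estimated number of substitutions per site (K) between two aligned
# sequences, or undef when the estimate is not defined.
sub jukes_cantor_k {
    my ($seq1, $seq2) = @_;
    die "sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);

    my ($compared, $diff) = (0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc(substr($seq1, $i, 1));
        my $y = uc(substr($seq2, $i, 1));
        # Skip gaps and ambiguity codes; only A, C, G, T are compared.
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $compared++;
        $diff++ if $x ne $y;
    }
    return undef if $compared == 0;

    my $p = $diff / $compared;      # observed proportion of differing sites
    return undef if $p >= 0.75;     # the correction is undefined at saturation
    return -0.75 * log(1 - 4 * $p / 3);
}

my $k = jukes_cantor_k('ACGTACGT', 'ACGTACGA');
printf "K = %.4f\n", $k if defined $k;
```

Other corrections (for example ones that treat transitions and transversions separately) follow the same pattern of counting differences and then applying a formula.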


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 9:40 AM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


baseml is not well-supported to my knowledge - I think I started with an
attempt to capture a small amount of the data in the file.  There are
some people who have made modifications to possibly parse it in-house
but I know of no submitted patches.   Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.   It depends on how much
of that information you really want to capture, or just the
substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the 
> code...Thanks.
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 12:59:21 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 12:59:21 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D59C7.4030008@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>
	<449D59C7.4030008@campus.iztacala.unam.mx>
Message-ID: <449D6F69.1090104@purdue.edu>

Yes I have. It is very useful.
But in situations where I don't have web access? Or I am working with 
Bioperl 1.5?

Mauricio Herrera Cuadra wrote:
> Hi Phillip,
>
> Have you tried the Deobfuscator interface? It's a newer and better way 
> to browse all the methods available in BioPerl:
>
> http://bioperl.org/wiki/Deobfuscator
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Regards,
> Mauricio.
>
> Phillip SanMiguel wrote:
>   
>> Brian Osborne wrote:
>>     
>>> Jay,
>>>
>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>
>>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>>> don't want to have to maintain two bptutorials.
>>>   
>>>       
>> I would be very disappointed to lose one part of bptutorial.pl--this was 
>> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
>> only purpose I've ever used bptutorial.pl for--to find all the methods 
>> available to any given object. Eg:
>>
>> bptutorial.pl 100 Bio::PrimarySeq
>>
>>  ***Methods for Object Bio::PrimarySeq ********
>>
>>
>>  Methods taken from package Bio::IdentifiableI
>>  lsid_string   namespace_string
>>
>>  Methods taken from package Bio::PrimarySeq
>>  accession   accession_number   alphabet   authority   can_call_new   desc
>>  description   direct_seq_set   display_id   display_name   id   is_circular
>>  length   namespace   new   object_id   primary_id   seq
>>  subseq   validate_seq   version
>>
>>  Methods taken from package Bio::PrimarySeqI
>>  moltype   revcom   translate   trunc
>>
>>  Methods taken from package Bio::Root::Root
>>  DESTROY   confess   debug   throw   verbose
>>
>>  Methods taken from package Bio::Root::RootI
>>  carp   deprecated   stack_trace   stack_trace_dump   
>> throw_not_implemented   warn
>>  warn_not_implemented
>>
>>
>> Phillip SanMiguel
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     
>
>   


From arareko at campus.iztacala.unam.mx  Sat Jun 24 13:35:54 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 12:35:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D6F69.1090104@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
Message-ID: <449D77FA.70103@campus.iztacala.unam.mx>

Currently I'm modifying the Deobfuscator so it'd be capable of browsing 
the different BioPerl packages as well as their respective releases, but 
haven't got much spare time to finish it :(

Dave and I committed the Deobfuscator into the bioperl-live source tree 
(in /doc directory), so it'd be included in future releases of BioPerl. 
I'm also working on a command line version which won't need a CGI 
environment to have the same functionality; this would address the web 
access situation that you mention.

Phillip SanMiguel wrote:
> Yes I have. It is very useful.
> But in situations where I don't have web access? Or I am working with 
> Bioperl 1.5?
> 
> Mauricio Herrera Cuadra wrote:
>> Hi Philip,
>>
>> Have you tried the Deobfuscator interface? It's a newer and better way 
>> to browse all the methods available in BioPerl:
>>
>> http://bioperl.org/wiki/Deobfuscator
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> Regards,
>> Mauricio.
>>
>> Phillip SanMiguel wrote:
>>   
>>> Brian Osborne wrote:
>>>     
>>>> Jay,
>>>>
>>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>>
>>>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>>>> don't want to have to maintain two bptutorials.
>>>>   
>>>>       
>>> I would be very disappointed to lose one part of bptutorial.pl--this was 
>>> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
>>> only purpose I've ever used bptutorial.pl for--to find all the methods 
>>> available to any given object. Eg:
>>>
>>> bptutorial.pl 100 Bio::PrimarySeq
>>>
>>>  ***Methods for Object Bio::PrimarySeq ********
>>>
>>>
>>>  Methods taken from package Bio::IdentifiableI
>>>  lsid_string   namespace_string
>>>
>>>  Methods taken from package Bio::PrimarySeq
>>>  accession   accession_number   alphabet   authority   can_call_new   desc
>>>  description   direct_seq_set   display_id   display_name   id   is_circular
>>>  length   namespace   new   object_id   primary_id   seq
>>>  subseq   validate_seq   version
>>>
>>>  Methods taken from package Bio::PrimarySeqI
>>>  moltype   revcom   translate   trunc
>>>
>>>  Methods taken from package Bio::Root::Root
>>>  DESTROY   confess   debug   throw   verbose
>>>
>>>  Methods taken from package Bio::Root::RootI
>>>  carp   deprecated   stack_trace   stack_trace_dump   
>>> throw_not_implemented   warn
>>>  warn_not_implemented
>>>
>>>
>>> Phillip SanMiguel
>>>
>>>
>>>     
>>   
> 
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Sat Jun 24 09:39:56 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 09:39:56 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
References: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>

baseml is not well-supported to my knowledge - I think I started with
an attempt to capture a small amount of the data in the file.  There are
some people who have made modifications in-house to possibly parse it,
but I know of no submitted patches.   Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.   It depends on how much
of that information you really want to capture, or just the
substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the
> code...Thanks.
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 13:48:15 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 13:48:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D77FA.70103@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
Message-ID: <449D7ADF.3030604@purdue.edu>


Yes, that would be better than bptutorial.pl 100 then. For some modules 
bptutorial.pl 100 doesn't seem to give any of the methods they have 
access to, whereas the Deobfuscator does.

Mauricio Herrera Cuadra wrote:
> Currently I'm modifying the Deobfuscator so it'd be capable of 
> browsing the different BioPerl packages as well as their respective 
> releases, but haven't got much spare time to finish it :(
>
> Dave and I committed the Deobfuscator into the bioperl-live source 
> tree (in /doc directory), so it'd be included in future releases of 
> BioPerl. I'm also working on a command line version which won't need a 
> CGI environment to have the same functionality; this would address the 
> web access situation that you mention.
>
> Phillip SanMiguel wrote:
>> Yes I have. It is very useful.
>> But in situations where I don't have web access? Or I am working with 
>> Bioperl 1.5?
>>
>> Mauricio Herrera Cuadra wrote:
>>> Hi Philip,
>>>
>>> Have you tried the Deobfuscator interface? It's a newer and better 
>>> way to browse all the methods available in BioPerl:
>>>
>>> http://bioperl.org/wiki/Deobfuscator
>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>
>>> Regards,
>>> Mauricio.
>>>
>>> Phillip SanMiguel wrote:
>>>  
>>>> Brian Osborne wrote:
>>>>    
>>>>> Jay,
>>>>>
>>>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>>>
>>>>> - Do we remove the file bptutorial.pl from the package now? I'd say
>>>>> yes, we don't want to have to maintain two bptutorials.
>>>>>         
>>>> I would be very disappointed to lose one part of 
>>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for 
>>>> Bioinformatics_. It is the only purpose I've ever used 
>>>> bptutorial.pl for--to find all the methods available to any given 
>>>> object. Eg:
>>>>
>>>> bptutorial.pl 100 Bio::PrimarySeq
>>>>
>>>>  ***Methods for Object Bio::PrimarySeq ********
>>>>
>>>>
>>>>  Methods taken from package Bio::IdentifiableI
>>>>  lsid_string   namespace_string
>>>>
>>>>  Methods taken from package Bio::PrimarySeq
>>>>  accession   accession_number   alphabet   authority   
>>>> can_call_new   desc
>>>>  description   direct_seq_set   display_id   display_name   id   
>>>> is_circular
>>>>  length   namespace   new   object_id   primary_id   seq
>>>>  subseq   validate_seq   version
>>>>
>>>>  Methods taken from package Bio::PrimarySeqI
>>>>  moltype   revcom   translate   trunc
>>>>
>>>>  Methods taken from package Bio::Root::Root
>>>>  DESTROY   confess   debug   throw   verbose
>>>>
>>>>  Methods taken from package Bio::Root::RootI
>>>>  carp   deprecated   stack_trace   stack_trace_dump   
>>>> throw_not_implemented   warn
>>>>  warn_not_implemented
>>>>
>>>>
>>>> Phillip SanMiguel
>>>>
>>>>
>>>>     
>>>   
>>
>>
>


From jason at bioperl.org  Sat Jun 24 14:42:57 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 14:42:57 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
References: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>

You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a perl script
> using the algorithms as described in "Molecular Evolution" by
> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 9:40 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> baseml is not well-supported to my knowledge - I think I started with
> an attempt to capture a small amount of the data in the file.  There are
> some people who have made modifications in-house to possibly parse it,
> but I know of no submitted patches.   Many of the knowledgeable
> people are probably at the evolution meetings this week.
>
> I have no idea about the full set of information in the report files
> without going back to the Yang papers first.   It depends on how much
> of that information you really want to capture, or just the
> substitution rates.
>
> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
> work+PAML.
>
> -jason
> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>> baseml in the PAML package to measure the distances of some non-coding
>> regions.
>>
>> I started with the coding regions, and used the script
>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>> something similar for non-coding regions.  However, when I call
>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>> meaning matrix was never defined.
>>
>> I wanted to find out if anyone on here has done this before or knows a
>> way to measure substitution frequencies of non-coding regions with the
>> PAML package.  The documentation with PAML is sparse so I'm not sure how
>> to interpret its output directly - that's why I'm using Bioperl.
>>
>> Hopefully someone can help me before I start digging into the
>> code...Thanks.
>>
>> Ryan
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 15:07:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 14:07:06 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D7ADF.3030604@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
	<449D7ADF.3030604@purdue.edu>
Message-ID: <EF5998FD-BA4F-439C-873E-71E55DBA0F4D@uiuc.edu>

As a quickie method I use the script from the FAQ; you have to  
install Class::Inspector:

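#!/usr/bin/perl -w
use strict;
use Class::Inspector;
# Class name to inspect, e.g. Bio::SeqIO
my $class = shift || die "Usage: methods perl_class_name\n";
# Load the class so Class::Inspector can see its methods
eval "require $class" or die "Could not load $class: $@";
# 'full' prints fully qualified names; 'public' skips _private methods
print join("\n", sort @{Class::Inspector->methods($class,'full','public')}), "\n";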

Works well, though it doesn't have the links and so on like  
Deobfuscator (for that I use the HTML docs generated by ActiveState):

glaciers-115 chris$ methods.pl Bio::SeqIO
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::debug
Bio::Root::Root::except
Bio::Root::Root::finally
Bio::Root::Root::otherwise
Bio::Root::Root::throw
Bio::Root::Root::try
Bio::Root::Root::verbose
Bio::Root::Root::with
Bio::Root::RootI::carp
Bio::Root::RootI::confess
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::SeqIO::DESTROY
Bio::SeqIO::PRINT
Bio::SeqIO::READLINE
Bio::SeqIO::TIEHANDLE
Bio::SeqIO::alphabet
Bio::SeqIO::fh
Bio::SeqIO::location_factory
Bio::SeqIO::new
Bio::SeqIO::newFh
Bio::SeqIO::next_seq
Bio::SeqIO::object_factory
Bio::SeqIO::sequence_builder
Bio::SeqIO::sequence_factory
Bio::SeqIO::write_seq
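
If installing Class::Inspector is not an option, something similar can be sketched in core Perl. This is only an illustration, not the FAQ script and not what bptutorial.pl does internally: methods_by_package is a made-up helper name, the Demo:: packages are stand-ins so the sketch runs without BioPerl, and it assumes perl 5.10 or later for mro::get_linear_isa. It groups subs by the package that defines them, roughly in the style of the bptutorial.pl 100 listing:

```perl
use strict;
use warnings;
use mro;    # core since perl 5.10; provides mro::get_linear_isa

# Hypothetical helper: return { package => [sorted sub names] } for $class
# and every package it inherits from. Imported functions are listed too.
sub methods_by_package {
    my ($class) = @_;
    my %found;
    no strict 'refs';    # symbol-table lookups by name need symbolic refs
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        my @subs = grep { !/::$/ && defined &{"${pkg}::$_"} }
                   keys %{"${pkg}::"};
        $found{$pkg} = [ sort @subs ] if @subs;
    }
    return \%found;
}

# Stand-in classes so the sketch runs without BioPerl installed.
{ package Demo::Base;  sub new { return bless {}, shift } sub warn_me { return 1 } }
{ package Demo::Child; our @ISA = ('Demo::Base'); sub seq { return 'ACGT' } }

my $methods = methods_by_package('Demo::Child');
for my $pkg (sort keys %$methods) {
    print "Methods taken from package $pkg\n  @{ $methods->{$pkg} }\n";
}
```

For a real class you would load it first (for example with require) and pass its name instead of Demo::Child. Unlike Class::Inspector this does not separate public from private methods.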


Chris

On Jun 24, 2006, at 12:48 PM, Phillip SanMiguel wrote:

>
> Yes, that would be better than bptutorial.pl 100 then. For some modules
> bptutorial.pl 100 doesn't seem to give any of the methods they have
> access to, whereas the Deobfuscator does.
>
> Mauricio Herrera Cuadra wrote:
>> Currently I'm modifying the Deobfuscator so it'd be capable of
>> browsing the different BioPerl packages as well as their respective
>> releases, but haven't got much spare time to finish it :(
>>
>> Dave and I committed the Deobfuscator into the bioperl-live source
>> tree (in /doc directory), so it'd be included in future releases of
>> BioPerl. I'm also working on a command line version which won't need a
>> CGI environment to have the same functionality; this would address the
>> web access situation that you mention.
>>
>> Phillip SanMiguel wrote:
>>> Yes I have. It is very useful.
>>> But in situations where I don't have web access? Or I am working with
>>> Bioperl 1.5?
>>>
>>> Mauricio Herrera Cuadra wrote:
>>>> Hi Philip,
>>>>
>>>> Have you tried the Deobfuscator interface? It's a newer and better
>>>> way to browse all the methods available in BioPerl:
>>>>
>>>> http://bioperl.org/wiki/Deobfuscator
>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>>
>>>> Regards,
>>>> Mauricio.
>>>>
>>>> Phillip SanMiguel wrote:
>>>>
>>>>> Brian Osborne wrote:
>>>>>
>>>>>> Jay,
>>>>>>
>>>>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>>>>
>>>>>> - Do we remove the file bptutorial.pl from the package now? I'd say
>>>>>> yes, we don't want to have to maintain two bptutorials.
>>>>>>
>>>>> I would be very disappointed to lose one part of
>>>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for
>>>>> Bioinformatics_. It is the only purpose I've ever used
>>>>> bptutorial.pl for--to find all the methods available to any given
>>>>> object. Eg:
>>>>>
>>>>> bptutorial.pl 100 Bio::PrimarySeq
>>>>>
>>>>>  ***Methods for Object Bio::PrimarySeq ********
>>>>>
>>>>>
>>>>>  Methods taken from package Bio::IdentifiableI
>>>>>  lsid_string   namespace_string
>>>>>
>>>>>  Methods taken from package Bio::PrimarySeq
>>>>>  accession   accession_number   alphabet   authority
>>>>> can_call_new   desc
>>>>>  description   direct_seq_set   display_id   display_name   id
>>>>> is_circular
>>>>>  length   namespace   new   object_id   primary_id   seq
>>>>>  subseq   validate_seq   version
>>>>>
>>>>>  Methods taken from package Bio::PrimarySeqI
>>>>>  moltype   revcom   translate   trunc
>>>>>
>>>>>  Methods taken from package Bio::Root::Root
>>>>>  DESTROY   confess   debug   throw   verbose
>>>>>
>>>>>  Methods taken from package Bio::Root::RootI
>>>>>  carp   deprecated   stack_trace   stack_trace_dump
>>>>> throw_not_implemented   warn
>>>>>  warn_not_implemented
>>>>>
>>>>>
>>>>> Phillip SanMiguel
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pmiguel at purdue.edu  Sat Jun 24 15:37:08 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 15:37:08 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <449D9464.6030508@purdue.edu>

Here is an example bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1682

It was a bug fixed in a module in BioPerl 1.4  back in October of 2004. 
The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the 
module. However the version of the module currently available from CPAN 
is 1.6. (That is the current "stable" release, BioPerl 1.4.0)

I've written a script that relies on that bug being fixed. How should I 
deal with this when I want to give the script to others to use? Just 
tell them "You must have BioPerl 1.5 installed"? Or give them instructions 
for patching the module code?

How long before the next "stable" release? Maybe a year? Should not a 
BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or 
would that be very difficult?

By the way, I think the revision graph viewer is great for someone, at 
best, peripherally involved in BioPerl to figure out which module 
version is associated with which BioPerl version, for example:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1

Phillip SanMiguel

From golharam at umdnj.edu  Sat Jun 24 14:57:52 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 14:57:52 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>
Message-ID: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>

Hi Jason,

It looks like DNAStatistics is only for coding sequences.  I'm trying to
calculate the Ks of exons and the K (or Ki) of introns.  All the methods
in bioperl are based on coding sequences.  Only the  PAUP package (that
I've found) does non-coding sequences.   I would have used it but you
need to pay for it and we don't have the funding to purchase much at the
moment.

I briefly looked at PHYLIP and EMBOSS but it didn't look as
straight-forward as I was hoping it would be.  Either that, or I was
getting frustrated looking for a simple solution.  

In the end, I found a molecular evolution book that talks about several
methods used for non-coding sequences so I went ahead and implemented
them.  They seem to work well.  

Ryan


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 2:43 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a perl script
> using the algorithms as described in "Molecular Evolution" by
> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 9:40 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> baseml is not well-supported to my knowledge - I think I started with
> an attempt to capture a small amount of the data in the file.  There are
> some people who have made modifications in-house to possibly parse it,
> but I know of no submitted patches.   Many of the knowledgeable
> people are probably at the evolution meetings this week.
>
> I have no idea about the full set of information in the report files
> without going back to the Yang papers first.   It depends on how much
> of that information you really want to capture, or just the
> substitution rates.
>
> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
> work+PAML.
>
> -jason
> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>> baseml in the PAML package to measure the distances of some non-coding
>> regions.
>>
>> I started with the coding regions, and used the script
>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>> something similar for non-coding regions.  However, when I call
>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>> meaning matrix was never defined.
>>
>> I wanted to find out if anyone on here has done this before or knows a
>> way to measure substitution frequencies of non-coding regions with the
>> PAML package.  The documentation with PAML is sparse so I'm not sure how
>> to interpret its output directly - that's why I'm using Bioperl.
>>
>> Hopefully someone can help me before I start digging into the
>> code...Thanks.
>>
>> Ryan
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Sat Jun 24 18:37:15 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 18:37:15 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
Message-ID: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>

I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
alignments and parsing the resulting alignments.

The ClustalW output is being sent to STDOUT.  Is there a way I can
redirect the output to STDERR instead?

Here's how I'm using it:

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

(Forgive me if it is in the docs - I've been coding for a week straight now
including Saturday)

Thanks, Ryan


From cjfields at uiuc.edu  Sat Jun 24 20:16:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 19:16:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>

On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4 back in October of 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed"? Or give them
> instructions for patching the module code?

A BioPerl module version is not the same as the distribution  
version.  All the modules have different version numbers  
corresponding to CVS commits for various code changes.  If you want  
to see the version for the distribution, read this:

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

Many 'bug fixes', you'll find, have less to do with problems/bugs in  
BioPerl code than they do with outside code changes beyond our  
control.  By that I mean changes to other programs modify output so  
parsers break (BLAST, PAML, etc), or changes to API for remote  
databases that break queries (recent changes in EBI database  
concerning Swissprot, for example).  So, the code is considered  
'stable' at the time of release, but past that point issues beyond  
our control may break certain modules parsing output, accessing  
remote databases, and so on, at any time. This link:

http://www.bioperl.org/wiki/FAQ#BioPerl_in_General

should answer a few more questions you may have.  The FAQ is very  
helpful...

In general, if there are problems with code you could look at the  
latest developer's release (1.5.1, released in Oct 2005) to see if  
any bugs have been fixed.  They may be fixed post-1.5.1 and will be  
in CVS; you can always suggest using 1.5.1 (it's pretty stable) and  
updating only the fixed modules from CVS if needed.
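
For a script that depends on a fix, one option is to have the script check versions at start-up and stop with a clear message. A minimal sketch in core Perl, with several assumptions: has_min_version is a made-up helper, it relies only on the standard $module->VERSION($min) check, and whether a given BioPerl module (or Bio::Perl itself) reports a useful $VERSION, and in what format, should be checked against your installation and the FAQ entry on installed versions. List::Util is used in the demo only because it ships with perl:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module loads and reports a version >= $min.
sub has_min_version {
    my ($module, $min) = @_;
    # Accept only plain package names, since the name goes into a string eval.
    return 0 unless $module =~ /^\w+(?:::\w+)*$/;
    return 0 unless eval "require $module; 1";
    # UNIVERSAL::VERSION dies if the module's $VERSION is missing or too low.
    return eval { $module->VERSION($min); 1 } ? 1 : 0;
}

print "List::Util is recent enough\n" if has_min_version('List::Util', '1.0');

# Assumed usage for BioPerl; the threshold string is a placeholder:
# has_min_version('Bio::Perl', '1.5')
#     or die "This script needs BioPerl 1.5 or later\n";
```

Because module versions and the distribution version are numbered separately, a check like this is only as good as the version number you test against.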

> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
> would that be very difficult?

No, it's not that easy.  BioPerl isn't like most CPAN modules with  
one or two developers.  See the wiki page for details on planning  
releases to see why:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

It takes a lot of effort and coordination, much more so than the  
average CPAN module.  I believe some of the core developers are  
meeting this weekend; maybe something will come of that and we'll get  
an idea of a next release.

Chris

> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Jun 24 21:02:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 24 Jun 2006 21:02:36 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>


On Jun 24, 2006, at 3:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4 back in October of 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed"? Or give them
> instructions for patching the module code?

Either way. If the patch is trivial you could also provide the patch
as an option. Generally we don't support that though. (Not everything
that we don't support is unsupported because it doesn't work.
Sometimes it's just a statement along the lines of
'it-probably-works-but-don't-bug-us-if-it-doesn't'.)

>
> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
> would that be very difficult?

1.5.1 fixes a number of other problems too, so even if there were a
1.4.1 there wouldn't be much reason to upgrade to it instead of to
1.5.1. We think investing time in creating 1.4.1 is not the best
investment we can make.

Our current goal is to release 1.5.2 and possibly more development
versions, all leading on a steady path to 1.6.0. There are very few (but
significant) stumbling blocks on this path; I believe they will require
some dedicated time from a couple of people, and after that
there shouldn't be any real obstacles. It's quite possible that at
BOSC, or as early as next week at the GMOD meeting, we could see a leap
forward. Typically it's those meetings that pull the respective
people away from their daily obligations (short of an actual
hackathon).

Some time back in spring, 1.6 was projected for around BOSC. That's
probably not going to happen, but it could quite possibly follow not
long afterwards.

	-hilmar

>
> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Jun 24 21:21:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:21:56 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c697f5$c08d1150$15327e82@pyrimidine>

According to the docs ( ;> ) the default behaviour is to return "a BioPerl
Bio::SimpleAlign object which can then be printed and/or saved in multiple
formats using the AlignIO.pm module"; you should be able to do something
like:

use Bio::AlignIO;
...
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

my $al_out = Bio::AlignIO->new(-format => 'clustalw',
                               -fh     => \*STDERR);

$al_out->write_aln($aa_aln);
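
If it is the output printed by the clustalw program itself (rather than the alignment) that ends up on STDOUT, one generic workaround is to point STDOUT at STDERR for the duration of the call. This is a plain-Perl sketch and not a feature of the Clustalw wrapper: with_stdout_to_stderr is a made-up helper, and the child process below just stands in for the external program. The wrapper may also offer its own option to quieten the program, so check its docs first.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with STDOUT temporarily duplicated onto
# STDERR, then restore STDOUT. External programs started inside $code
# inherit the redirected handle.
sub with_stdout_to_stderr {
    my ($code) = @_;
    open(my $saved, '>&', \*STDOUT) or die "Cannot save STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Cannot redirect STDOUT: $!";
    my @result = eval { $code->() };
    my $error  = $@;
    open(STDOUT, '>&', $saved)      or die "Cannot restore STDOUT: $!";
    close $saved;
    die $error if $error;           # re-throw only after STDOUT is restored
    return wantarray ? @result : $result[0];
}

# Stand-in for an external program that prints to its standard output.
my $status = with_stdout_to_stderr(sub {
    return system($^X, '-e', 'print "progress chatter\n"');
});
print "exit status: $status\n";     # printed on the real STDOUT again

# Assumed usage with the factory from the question:
# my $aa_aln = with_stdout_to_stderr(sub { $aln_factory->align(\@aa_seq) });
```

The callback's return value is passed through, so the alignment object would still come back to the caller.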

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Saturday, June 24, 2006 5:37 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
> 
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);
> 
> (Forgive me if it's in the docs - I've been coding for a week straight now
> including Saturday)
> 
> Thanks, Ryan


From jason at bioperl.org  Sat Jun 24 21:38:06 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 21:38:06 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
References: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>

they make no assumption about coding sequence, where do you get that  
impression?  the ka,ks are for coding but the tamura/nei, kimura,  
jukes-cantor are all for any type of sequence.

the phylip and emboss are pretty straightforward IMHO - you give it  
an alignment and you get out a matrix of pairwise numbers....

but whatever makes sense to you - we are using the same methods as  
are in Li's book (that is where I took the equations from).
-j
On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:

> Hi Jason,
>
> It looks like DNAStatistics is only for coding sequences.  I'm  
> trying to
> calculate the Ks of exons and the K (or Ki) of introns.  All the  
> methods
> in bioperl are based on coding sequences.  Only the  PAUP package  
> (that
> I've found) does non-coding sequences.   I would have used it but you
> need to pay for it and we don't have the funding to purchase much  
> at the
> moment.
>
> I briefly looked at PHYLIP and EMBOSS but it didn't look as
> straightforward as I was hoping it would be.  Either that, or I was
> getting frustrated looking for a simple solution.
>
> In the end, I found a molecular evolution book that talks about  
> several
> methods used for non-coding sequences so I went ahead and implemented
> them.  They seem to work well.
>
> Ryan
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 2:43 PM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> You should look at the Align::DNAStatistics module if you just want
> pairwise DNA distance.  I put in several different distance methods.
> Or you can use the distance methods implemented in PHYLIP or EMBOSS
> programs -- I thought you wanted the somewhat more sophisticated ML
> approaches that are implemented in PAML?
>
> --jason
>
> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>
>> I've managed to code three methods to calculate K into a perl script
>> using the algorithms as described in "Molecular Evolution" by
>> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>> Jason
>> Stajich
>> Sent: Saturday, June 24, 2006 9:40 AM
>> To: golharam at umdnj.edu
>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>> PAML
>> package)
>>
>>
>> baseml is not well-supported to my knowledge - I think I started with
>> an attempt to capture a small amount of the data in the file.  There are
>> some people who have made modifications to possibly parse it in-house
>> but I know of no submitted patches.   Many of the knowledgeable
>> people are probably at the evolution meetings  this week.
>>
>> I have no idea about the full set of information in the report files
>> without going back to the Yang papers first.   It depends on how much
>> of that information you really want to capture or just the
>> substitution rates.
>>
>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>> work+PAML.
>>
>> -jason
>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>> baseml in the PAML package to measure the distances of some non-
>>> coding
>>
>>> regions.
>>>
>>> I started with the coding regions, and used the script
>>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>> something similar for non-coding regions.  However, when I call
>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>> meaning matrix was never defined.
>>>
>>> I wanted to find out if anyone on here has done this before or
>>> knows a
>>
>>> way to measure substitution frequencies of non-coding regions with
>>> the
>>
>>> PAML package.  The documentation with PAML is sparse so I'm not
>>> sure how
>>> to interpret its output directly - that's why I'm using Bioperl.
>>>
>>> Hopefully someone can help me before I start digging into the
>>> code...Thanks.
>>>
>>> Ryan
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 21:40:49 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:40:49 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>
Message-ID: <000401c697f8$62d41e70$15327e82@pyrimidine>

...
> > I've written a script that relies on that bug being fixed. How
> > should I
> > deal with this when I want to give the script to others to use? Just
> > tell them "You must have BioPerl 1.5 installed". Give them
> > instructions
> > for patching the module code?
> 
> Either way. If the patch is trivial you could also provide the patch
> as an option. Generally we don't support that though. (Not everything
> that we don't support we don't support because it doesn't work.
> Sometimes it's just a statement along 'it-probably-works-but-don't-
> bug-us-if-it-doesn't'.)

The bug was fixed post-1.4 release according to the link, so Phillip should
use v1.5.1 or newer.

Hilmar's right.  It's hard to address every single complaint about code not
working or method not implemented w/o having patches or fixes submitted.
It's not my top priority to fix bugs in modules submitted by other authors
when I don't know the code.  I'll try if I have the free time, but that's
getting to be a precious commodity lately...

> > How long before the next "stable" release? Maybe a year? Should not a
> > BioPerl 1.4.1 be released so CPAN would get bug fixes like this
> > one? Or
> > would that be very difficult?
> 
> 1.5.1 fixes a number of other problems too, so there isn't really
> much reason to upgrade to 1.4.1 if there was one instead of to 1.5.1,
> so investing time into creating 1.4.1 we think is not the best
> investment we can make.
> 
> Our current goal is to release 1.5.2 and possibly more development
> versions all leading on a steady path to 1.6.0. There's very few (but
> significant) stumbling blocks on this path that will require I
> believe some dedicated time from a couple  of people and after that
> there shouldn't be any real obstacles. It's quite possible that at
> BOSC or as early as next week at the GMOD meeting we could see a leap
> forward, typically it's those meetings that pull the respective
> people away from their daily obligations (short of an actual
> hackathon).
> 
> Some time back in spring 1.6 was put in proximity to BOSC, but that's
> probably not going to happen, but quite possibly not that much
> afterwards.
> 
> 	-hilmar
...

Nice to know.  I guess a Release Pumpkin will be picked as well.  BOSC is
right around the corner so I guess we can expect something announced soon as
to a possible roadmap (we can't talk about 'timelines' in the States, it's
not patriotic).  

Chris


From golharam at umdnj.edu  Sat Jun 24 23:03:01 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:03:01 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000301c697f5$c08d1150$15327e82@pyrimidine>
Message-ID: <000301c69803$df899f20$2f01a8c0@GOLHARMOBILE1>

Thanks Chris.  It is in fact when you call align() that clustalw
generates the output that you see on the console.  I'm parsing the
alignment it generates right away.  Here's the output (an example) of
what I'm referring to:

-- BEGIN --
 CLUSTAL W (1.83) Multiple Sequence Alignments


Sequence format is Pearson
Sequence 1: human           271 aa
Sequence 2: mouse           264 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  90
Guide tree        file created:   [/tmp/TX4yxP9uKQ/80W87TkT5Z.dnd]
Start of Multiple Alignment
There are 1 groups
Aligning...
Group 1: Sequences:   2      Score:5469
Alignment Score 1480
GCG-Alignment file created      [/tmp/TX4yxP9uKQ/xE4GNyY7Rc]
-- END --

How do I get this to go to stderr instead of stdout?

Ryan


From golharam at umdnj.edu  Sat Jun 24 23:05:41 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:05:41 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>
Message-ID: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>

>>they make no assumption about coding sequence,
>>where do you get that impression

I get that information from the 1.5 api docs:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/

It's documented under the description section.

Oh well, I have it coded and working...might as well use it.

Ryan


From bix at sendu.me.uk  Sun Jun 25 07:33:58 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Jun 2006 12:33:58 +0100
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
References: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <449E74A6.3020709@sendu.me.uk>

Ryan Golhar wrote:
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);

You can suppress the output completely using
$aln_factory->quiet(1);

(supplying quiet => 1 to new() should also work according to the docs, 
but doesn't seem to be implemented, though I could be wrong)

If you really want the messages on STDERR you could try redirecting 
STDOUT to STDERR before calling align():
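open(my $oldout, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
open(STDOUT, '>&', \*STDERR)     or die "Can't redirect STDOUT: $!";
my $aa_aln = $aln_factory->align(\@aa_seq);
open(STDOUT, '>&', $oldout)      or die "Can't restore STDOUT: $!";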

I haven't tested either of these ideas, but I think they should both 
work - try them out and let us know.

Ideally there would be a saner way of doing this, but it isn't readily 
apparent to me.
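
For the archives, the redirect idea could be wrapped in a small helper
along these lines. This is only a sketch in core Perl:
with_stdout_to_stderr is a made-up name, not a BioPerl method, and the
system() call merely stands in for align() launching clustalw. It relies
on Perl keeping file descriptor 1 when STDOUT is reopened, which is why
output from a child process should follow the redirect as well.

```perl
use strict;
use warnings;
use IO::Handle;   # provides STDOUT->flush on older perls

# Hypothetical helper (not part of BioPerl): run $code with STDOUT
# temporarily pointing at STDERR, then restore STDOUT.
sub with_stdout_to_stderr {
    my ($code) = @_;

    # Flush first so earlier buffered output stays on the real STDOUT.
    STDOUT->flush;
    open(my $saved, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Can't redirect STDOUT: $!";

    # $code is always called in list context; the eval makes sure
    # STDOUT is restored even if $code dies.
    my @result = eval { $code->() };
    my $error  = $@;

    STDOUT->flush;
    open(STDOUT, '>&', $saved) or die "Can't restore STDOUT: $!";
    close($saved);

    die $error if $error;
    return wantarray ? @result : $result[0];
}

# Stand-in for $aln_factory->align(\@aa_seq): a child process that
# writes to its own STDOUT, as clustalw does.
my $status = with_stdout_to_stderr(sub {
    return system($^X, '-e', 'print "chatter from child\n"');
});
print "child exit status: $status\n";   # printed on the real STDOUT again
```

With BioPerl loaded the call would presumably look like
my $aa_aln = with_stdout_to_stderr(sub { $aln_factory->align(\@aa_seq) });
though I have not run that against clustalw either.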


From jason at bioperl.org  Sun Jun 25 08:37:11 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 25 Jun 2006 08:37:11 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with
	(baseml from PAML package)]
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>


On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:

>>> they make no assumption about coding sequence,
>>> where do you get that impression
>
> I get that information from the 1.5 api docs:
>
> http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
great - I would also always point people to the LIVE code  
documentation not the 1.5.0-RC1 which is +1 years old, but nothing  
particular has changed in this module since 1.5.0 that I know of.   
Someday someone will put a new ball of docs up on the site, but I  
hope that will come with the next development or stable release.

> Its documented under the description section.
>
i don't really see what you refer to since there is a lot of  
documentation, but perhaps it should be clarified - I had hoped this  
was a sufficient description:
"This object contains routines for calculating various statistics and  
distances for DNA alignments."

> Oh well, I have it coded and working...might as well use it.
>
Sounds like your best bet for your situation.

For the record and in the mailing list archives - as long as you  
don't call a method that contains "KaKs" it will work fine.  You can  
calculate distances using the currently implemented distance methods:

    JukesCantor
    Uncorrected
    F81
    Kimura
    Tamura
    F84 (Felsenstein 84)
    TajimaNei
    JinNei
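
To make the list a little more concrete, here are the two simplest
corrections written out in plain Perl. This is a sketch for illustration
only, not the DNAStatistics code, and the sub names are made up. It
assumes the textbook formulas: Jukes-Cantor d = -(3/4) ln(1 - 4p/3),
where p is the proportion of differing sites, and Kimura 2-parameter
d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the
proportions of transitions and transversions. Neither cares whether the
sequence is coding.

```perl
use strict;
use warnings;

sub is_purine { return ($_[0] eq 'A' || $_[0] eq 'G') ? 1 : 0 }

# Count comparable sites, transitions and transversions between two
# aligned sequences; columns with gaps or ambiguity codes are skipped.
sub count_differences {
    my ($seq1, $seq2) = @_;
    die "sequences must have equal (aligned) length\n"
        unless length($seq1) == length($seq2);

    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc substr($seq1, $i, 1);
        my $y = uc substr($seq2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        if (is_purine($x) == is_purine($y)) { $ts++ } else { $tv++ }
    }
    die "no comparable sites\n" unless $sites;
    return ($sites, $ts, $tv);
}

# Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)
sub jukes_cantor {
    my ($sites, $ts, $tv) = count_differences(@_);
    my $p = ($ts + $tv) / $sites;
    die "too divergent for Jukes-Cantor (p >= 0.75)\n" if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

# Kimura 2-parameter distance: d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q))
sub kimura_2p {
    my ($sites, $ts, $tv) = count_differences(@_);
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    my ($w1, $w2) = (1 - 2 * $P - $Q, 1 - 2 * $Q);
    die "too divergent for Kimura 2-parameter\n" if $w1 <= 0 || $w2 <= 0;
    return -0.5 * log($w1 * sqrt($w2));
}

# Two toy aligned sequences; the gapped column is ignored.
my ($s1, $s2) = ('ACGTACGTAC', 'GCGTAC-TAC');
printf "JC  = %.4f\n", jukes_cantor($s1, $s2);
printf "K2P = %.4f\n", kimura_2p($s1, $s2);
```

In the module itself the equivalent call is, going by its synopsis,
along the lines of $stats->distance(-align => $aln, -method => 'Kimura');
check the POD of your installed version for the exact method names.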


It will be more productive to just drop the discussion since you  
seem to be fine without all of this anyway - if you decide you  
would like to use it and contribute new distance methods or doc  
fixes I am sure we'll enjoy your contributions.


-jason
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Sun Jun 25 13:05:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 12:05:34 -0500
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
	with(baseml from PAML package)]
In-Reply-To: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>
Message-ID: <000901c69879$97b7d5b0$15327e82@pyrimidine>

> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing
> particular has changed in this module since 1.5.0 that I know of.
> Someday someone will put a new ball of docs up on the site, but I
> hope that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as it's
the most up-to-date, that fact isn't really emphasized much on the docs
page; the link is along with the other toolkits at the bottom of the page
and is listed as Bioperl Core Code (some users don't seem to get that, in
general, bioperl=bioperl core).  Could be this is causing a bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or that I
should be able to!).  Maybe the best way to go is to have a wiki page for
this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or at
least link to the Release Pumpkin page which indicates the code release date
for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.  Work
is underway to correct them, but do not expect this code to give you the
right answer currently!  Use dnadist/distmat in the PHYLIP or EMBOSS
packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some point
relatively soon myself so I may be up to some testing on one/both of these
modules if needed.

Chris

....


From golharam at umdnj.edu  Sun Jun 25 13:20:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sun, 25 Jun 2006 13:20:12 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
 with(baseml from PAML package)]
In-Reply-To: <000901c69879$97b7d5b0$15327e82@pyrimidine>
Message-ID: <000801c6987b$9e65f840$2f01a8c0@GOLHARMOBILE1>

Exactly.  Also on the page it says (in the description for
Bio::Align::DNAStatistics):

In order to use these methods there are
several pre-requisites for the alignment.

   1. DNA alignment must be based on protein alignment. Use the subroutine
      aa_to_dna_aln in Bio::Align::Utilities to achieve this.

 Etc etc etc


The rest of the pre-reqs also mention that the sequences should be
coding sequences.  Because of this, I thought DNAStatistics was only for
coding sequences and could not be used for non-coding sequences...

Anyway, I've gotten past my troubles and am on to finish this project.
I think others might run into the issues I ran into as well.  I'd be
happy to contribute what I can but need to finish this stuff first...
Thanks for all your help Jason, Chris, Sendu!

Ryan


From pmiguel at purdue.edu  Sun Jun 25 15:02:14 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 25 Jun 2006 15:02:14 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
Message-ID: <449EDDB6.8020401@purdue.edu>

Chris Fields wrote:
> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>
> [...]
>> How long before the next "stable" release? Maybe a year? Should not a
>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
>> would that be very difficult?
>
> No, it's not that easy.  BioPerl isn't like most CPAN modules with one 
> or two developers.  See the wiki page for details on planning releases 
> to see why:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> It takes a lot of effort and coordination, much more so than the 
> average CPAN module.  I believe some of the core developers are 
> meeting this weekend; maybe something will come of that and we'll get 
> an idea of a next release.
>
> Chris
Hi Chris,
   Thanks for the information--the key part being that a bug fix from a 
couple of years ago has not propagated into the current stable release. 
Below I'll try to convince you that this is a serious problem. (Not 
because it is your fault, of course. I'm just trying to deliver my take 
on the situation to the bioperl-programmer-warriors who happen to be 
listening...)
   It isn't a problem for me to edit the offending statement in the 
QualI.pm module on systems I generally use. Or even install a 
developer's release of bioperl. My problem is one of advocacy. Maybe I 
have a warped view of the world, but it seems that except for those 
directly involved in the bioperl or GMOD projects, everyone looks to 
CPAN when they install bioperl.
    I write scripts that I sometimes want to send to biologists even 
less programming-capable than I am. I can just barely envision those 
biologists pestering their sysadmin to do a CPAN install of bioperl 
modules so that my script will work. But installing a non-CPAN set of 
modules probably isn't going to happen.
    So, this being the case, how can I, with a clear conscience, advocate 
bioperl to the junior bioinformaticians with whom I happen to interact?
    My take, for what it is worth, is that 1.5 has become an unratified 
stable release. How hard would it be to take 1.5.1--as is--and deposit 
that in CPAN? What would be the downside?

Phillip SanMiguel
   

From hlapp at gmx.net  Sun Jun 25 15:42:20 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 25 Jun 2006 15:42:20 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>

We did not and will not deposit 1.5.1 into CPAN due to the API issues  
in some (rather central) interfaces. These issues are changes over  
the 1.4 API and some of those changes are going to go away. Once we  
deposit it into CPAN we would sanction the changed API as the new  
'official' API and would open a huge can of backward liability worms.  
If you just continue to use the 1.4 API on the 1.5.1 release you  
don't need to be concerned about an API method you're using going away.

As I said, the people from the core group of developers who have  
traditionally shepherded releases all think that doing a 1.4.1  
release wouldn't be the best investment of their time. You are most  
welcome to disagree and volunteer your time to coordinate the 1.4.1  
release, and a lot of people will appreciate your efforts - including  
the bioperl developers and 'core'. It shouldn't be much work  
theoretically.

	-hilmar

On Jun 25, 2006, at 3:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one
>> or two developers.  See the wiki page for details on planning  
>> releases
>> to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the
>> average CPAN module.  I believe some of the core developers are
>> meeting this weekend; maybe something will come of that and we'll get
>> an idea of a next release.
>>
>> Chris
> Hi Chris,
>    Thanks for the information--the key part being that a bug fix  
> from a
> couple of years ago has not propagated into the current stable  
> release.
> Below I'll try to convince you that this is a serious problem. (Not
> because it is your fault, of course. I'm just trying to deliver my  
> take
> on the situation to the bioperl-programmer-warriors who happen to be
> listening...)
>    It isn't a problem for me to edit the offending statement in the
> QualI.pm module on systems I generally use. Or even install a
> developer's release of bioperl. My problem is one of advocacy. Maybe I
> have a warped view of the world, but it seems that except for those
> directly involved in the bioperl or GMOD projects, everyone looks to
> CPAN when they install bioperl.
>     I write scripts that I sometimes want to send to biologists even
> less programming-capable than I am. I can just barely envision those
> biologists pestering their sysadmin to do a CPAN install of bioperl
> modules so that my script will work. But installing a non-CPAN set of
> modules probably isn't going to happen.
>     So, this being the case, how can I, with a clear conscience,  
> advocate
> bioperl to the junior bioinformaticians with whom I happen to  
> interact?
>     My take, for what it is worth, is that 1.5 has become an  
> unratified
> stable release. How hard would it be to take 1.5.1--as is--and deposit
> that in CPAN? What would be the downside?
>
> Phillip SanMiguel
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Jun 25 16:20:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 15:20:20 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <7C28EA28-031A-4B1C-9625-A643247445FD@uiuc.edu>


On Jun 25, 2006, at 2:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one or two developers.  See the wiki page for details on planning  
>> releases to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the  
>> average CPAN module.  I believe some of the core developers are  
>> meeting this weekend; maybe something will come of that and we'll  
>> get an idea of a next release.
>>
>> Chris
> Hi Chris,
>   Thanks for the information--the key part being that a bug fix  
> from a couple of years ago has not propagated into the current  
> stable release. Below I'll try to convince you that this is a  
> serious problem. (Not because it is your fault, of course. I'm just  
> trying to deliver my take on the situation to the bioperl- 
> programmer-warriors who happen to be listening...)
>   It isn't a problem for me to edit the offending statement in the  
> QualI.pm module on systems I generally use. Or even install a  
> developer's release of bioperl. My problem is one of advocacy.  
> Maybe I have a warped view of the world, but it seems that except  
> for those directly involved in the bioperl or GMOD projects,  
> everyone looks to CPAN when they install bioperl.

Again, it's not as easy as you make it seem.  The idea is to upgrade  
the CPAN version to stable releases (even numbered) and that odd- 
numbered releases would be developer versions.  Yes, it has been a  
while since the last stable version; it could be a while until the  
next as there have been suggestions of an interim 1.5.x release or so  
before that occurs (though he did say 1.6 could be soon after BOSC  
which is in August).  Hilmar has explained that there are some  
stumbling blocks to get around before the next major release (if  
those 'stumbling blocks' are what I think they are, I agree).  It's  
very likely implementation of changes that he mentions may require  
refactoring code, changing API, etc.  Not easy in a project like  
this, a large core of contributors and with the developers scattered  
all over the world, all with different priorities (we all have $jobs  
after all).

That's why we have a Release Pumpkin, akin to the Pumpkings that have  
ushered forth regular perl releases.  It requires a large,  
coordinated effort with one person acting as overseer, pushing  
everybody to meet deadlines.  Not easy and not, by a long shot, your  
typical CPAN module.

>    I write scripts that I sometimes want to send to biologists even  
> less programming-capable than I am. I can just barely envision  
> those biologists pestering their sysadmin to do a CPAN install of  
> bioperl modules so that my script will work. But installing a non- 
> CPAN set of modules probably isn't going to happen.
>    So, this being the case, how can I, with a clear conscience,  
> advocate bioperl to the junior bioinformaticians with whom I happen  
> to interact?

Give those biologists some credit. Quite frankly, I would expect any  
bioinformaticist or computational biologist, junior or otherwise, to  
know or at least learn how to install from CPAN or from CVS,  
otherwise they need to change their job title.  And, as a  
microbiologist myself (i.e. one of those biologists you mention) and  
as one who regularly interacts with biologists with little to no  
computer science experience, I believe I can speak from experience.   
I find the install documents that come with BioPerl and available on  
the wiki pretty much cover everything, from how to install to the  
dependencies required to problems one may encounter.   The web site  
has a ton of documentation, including the FAQ (*cough* which covers  
this ground *cough*).

If they are running perl scripts and using a system that requires  
sysadmin privileges they probably know what they are doing anyway.  If  
not they probably have students/employees that do know what's going  
on (and who may be the ones actually running the scripts).  You can't  
please everybody, so I think you can proceed with a clear conscience  
knowing you did the best that you can to help!

>    My take, for what it is worth, is that 1.5 has become an  
> unratified stable release. How hard would it be to take 1.5.1--as  
> is--and deposit that in CPAN? What would be the downside?

Ah I see Hilmar has responded.  I think he adequately answers this.   
API is everything; changing API suddenly is bad bad bad.

> Phillip SanMiguel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From akholloway at ucdavis.edu  Mon Jun 26 00:15:16 2006
From: akholloway at ucdavis.edu (Alisha Holloway)
Date: Sun, 25 Jun 2006 21:15:16 -0700
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
 package)
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <a06230932c0c50e71ad97@[10.0.1.2]>

Hi Ryan & Jason,

Sorry I didn't get back to you sooner.  I escaped the central valley 
heat (108!) and went to the coast for the weekend.  I do have a 
script that will call baseml and then parse the results.  Here it is 
and, Ryan, I can show you how to retrieve other parts of the data as 
well, but you may already know how to do this.  I know it's ugly, I 
got it working and didn't clean it up.  Just let me know if you need 
more info.

Alisha

At 11:05 PM -0400 6/24/06, Ryan Golhar wrote:
>>>they make no assumption about coding sequence,
>>>where do you get that impression
>
>I get that information from the 1.5 api docs:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
>Its documented under the description section. 
>
>Oh well, I have it coded and working...might as well use it.
>
>Ryan
>-----Original Message-----
>From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
>Stajich
>Sent: Saturday, June 24, 2006 9:38 PM
>To: golharam at umdnj.edu
>Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
>package)
>
>
>they make no assumption about coding sequence,where do you get that 
>impression.  the ka,ks are for coding but the tamura/nei kimura, 
>jukes-cantor are all for any type of sequence.
>
>the phylip and emboss are pretty straightforward IMHO - you give it 
>an alignment and you get out a matrix of pairwise numbers....
>
>but whatever makes sense to you - we are using the same methods as 
>are in Li's book (that is where I took the equations from).
>-j
>On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:
>
>>  Hi Jason,
>>
>>  It looks like DNAStatistics is only for coding sequences.  I'm
>>  trying to
>>  calculate the Ks of exons and the K (or Ki) of introns.  All the 
>>  methods
>>  in bioperl are based on coding sequences.  Only the  PAUP package 
>>  (that
>>  I've found) does non-coding sequences.   I would have used it but you
>>  need to pay for it and we don't have the funding to purchase much 
>>  at the
>>  moment.
>>
>>  I briefly looked at PHYLIP and EMBOSS but it didn't look as
>>  straight-forward as I was hoping it would be.  Either that, or I was
>>  getting frustrated looking for a simple solution.
>>
>>  In the end, I found a molecular evolution book that talks about
>>  several
>>  methods used for non-coding sequences so I went ahead and implemented
>>  them.  They seem to work well.
>>
>>  Ryan
>>
>>
>>  -----Original Message-----
>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>  Jason
>>  Stajich
>>  Sent: Saturday, June 24, 2006 2:43 PM
>>  To: golharam at umdnj.edu
>>  Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from 
>>  PAML
>>  package)
>>
>>
>>  You should look at the Align::DNAStatistics module if you just want
>>  pairwise DNA distance.  I put in several different distance methods.
>>  Or you can use the distance methods implemented in PHYLIP or EMBOSS
>>  programs -- I thought you wanted the somewhat more sophisticated ML
>>  approaches that are implemented in PAML?
>>
>>  --jason
>>
>>  On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>>
>>>  I've managed to code three methods to calculate K into a perl script
>>>  using the algorithms as described in "Molecular Evolution" by
>>>  Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>>
>>>
>>>
>>>  -----Original Message-----
>>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>>  Jason
>>>  Stajich
>>>  Sent: Saturday, June 24, 2006 9:40 AM
>>>  To: golharam at umdnj.edu
>>>  Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>>>  PAML
>>>  package)
>>>
>>>
>>>  baseml is not well-supported to my knowledge - I think I started with
>>>  an attempt to capture a small amount of the data in the file.  There are
>>>  some people who have made modifications to possibly parse it in-house
>>>  but I know of no submitted patches.   Many of the knowledgeable
>>>  people are probably at the evolution meetings  this week.
>>>
>>>  I have no idea about the full set of information in the report files
>>>  without going back to the Yang papers first.   It depends on how much
>>>  of that information you really want to capture or just the
>>>  substitution rates.
>>>
>>>  I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>>>  work+PAML.
>>>
>>>  -jason
>>>  On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>>
>>>>  Hi all,
>>>>
>>>>  I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>>>  baseml in the PAML package to measure the distances of some non-
>>>>  coding
>>>
>>>>  regions.
>>>>
>>>>  I started with the coding regions, and used the script
>>>>  bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>>>  something similar for non-coding regions.  However, when I call
>>>>  Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>>>  meaning matrix was never defined.
>>>>
>>>>  I wanted to find out if anyone on here has done this before or
>>>>  knows a
>>>
>>>>  way to measure substitution frequencies of non-coding regions with
>>>>  the
>>>
>>>>  PAML package.  The documentation with PAML is sparse so I'm not
>>>>  sure how
>>>>  to interpret its output directly - that's why I'm using Bioperl.
>>>>
>>>>  Hopefully someone can help me before I start digging into the
>>>>  code...Thanks.
>>>>
>>>>  Ryan
>>>>
>>>
>>>  --
>>>  Jason Stajich
>>>  Duke University
>>>  http://www.duke.edu/~jes12
>>>
>>
>>  --
>>  Jason Stajich
>>  Duke University
>>  http://www.duke.edu/~jes12
>>
>
>--
>Jason Stajich
>Duke University
>http://www.duke.edu/~jes12


-- 
Alisha Holloway

Postdoctoral Fellow
Section of Evolution & Ecology
3347 Storer Hall
University of California
Davis, CA  95616

530-754-9551 Office
512-297-3958 Cell
530-752-1449 Fax
-------------- next part --------------
A non-text attachment was scrubbed...
Name: batch_baseml_50nt.pl
Type: application/octet-stream
Size: 5395 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: baseml.ctl
Type: application/octet-stream
Size: 1699 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0001.obj 

From fernan at iib.unsam.edu.ar  Mon Jun 26 08:47:30 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 09:47:30 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
Message-ID: <20060626124730.GA53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 07:22):
|
| We did not and will not deposit 1.5.1 into CPAN due to the API issues  
| in some (rather central) interfaces. These issues are changes over  
| the 1.4 API and some of those changes are going to go away. Once we  
| deposit it into CPAN we would sanction the changed API as the new  
| 'official' API and would open a huge can of backward liability worms.  
| If you just continue to use the 1.4 API on the 1.5.1 release you  
| don't need to be concerned about an API method you're using going away.
| 
| As I said, the people from the core group of developers who have  
| traditionally shepherded releases all think that doing a 1.4.1  
| release wouldn't be the best investment of their time. You are most  
| welcome to disagree and volunteer your time to coordinate the 1.4.1  
| release, and a lot of people will appreciate your efforts - including  
| the bioperl developers and 'core'. It shouldn't be much work  
| theoretically.
| 
| 	-hilmar
|
+----]

I understand that, being a volunteer project, people can
decide where to best invest their time. If core developers
are no longer using 1.4 in their production setups, it is
reasonable to expect that they invest all of their time in
1.5 or any other bioperl version that they're using.

However, when viewed as an issue related to the setting of a
policy for the whole project, then it makes sense to have a
policy saying for how long a stable release will be
supported, and when and in which case bugfixes that are committed
to and tested in the development branch (as it should be)
will get merged back to stable. 

I'm not knowledgeable enough about the bioperl release
engineering process, nor about the internal development
process, but just guessing I'd expect that whenever anyone
submits a bugfix, it should be the responsibility of
the committer to check (against the project policy,
(written or implicit) or with the core developers in a
difficult case) whether the fix should be committed to more
than one branch.

A patch like the one that started this thread, should have
been committed to the 1.4 branch without too much thinking.
And it would have cost the committer only a few seconds more
of her/his time. 

But you only get this by setting and enforcing a policy.

After a number of these fixes have accumulated, then making a
new release shouldn't represent too much effort, nor should
it be expected that the tests that passed before would
break now. And in the worst case (no tarball release),
people can be directed to obtain the most current 'stable'
code from the repository, containing all bugfixes. 

I guess that this is what was meant by Phillip.

Fernan


From hlapp at gmx.net  Mon Jun 26 09:59:00 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 09:59:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>


On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.
>
> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.

Sure. But for some reason he or she forgot. So what do you suggest we  
do - and I mean as a community, because this is a community project.  
Come after the guy  until he commits it to the branch? Or post an  
email to the list saying what you think is the right way and then do  
it (yourself)?

>
> But you only get this by setting and enforcing a policy.

Man, this is not a company. Take a step back and think again. What do  
you suggest we - again we as a community - do to enforce a policy?  
Take increasing levels of disciplinary action if someone keeps  
forgetting to commit to the branch?

While there are clearly some rules everybody needs to follow and if  
you violate them deliberately and repeatedly you will get your CVS  
privileges withdrawn, by and large we as a community need to accept  
some responsibility for making the project what we think it should be  
- and do so not by invoking disciplinary action but by living by  
example and by taking action yourself when you think action is due.

If Bioperl were a company and you asked for a 1.4.1 release and the  
customer service rep told you nope there's a 1.5.1 that you should  
use instead and that will do just fine, what will you do? Argue with  
him about the company policies and whether they are properly enforced  
or not?

Obviously doing so will be a waste of your time. In Bioperl it is, at  
bottom, no less a waste of your time, because instead you now  
have the opportunity to make happen what you believe needs to happen.  
We have had a history of rapidly and un-bureaucratically putting  
people in power of what they wanted to do. We have also had a history  
of not listening much to people who don't want to put their feet  
where their mouth is.

I'm sorry if what I'm saying puts people off, but really this is an  
open-source project and if you ask me it's one with the least  
barriers of entry for new developers or 'activists' that you can find  
in the open source arena. This doesn't come without some degree of  
anarchy, but really IMHO that's more of an advantage than a  
disadvantage.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Mon Jun 26 10:13:00 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Jun 2006 10:13:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <BDC70861-52D3-4389-9073-07F456661B14@bioperl.org>

fair enough - we can certainly merge fixes onto the branch -  I am  
not sure why that is such a big deal.

once the changes are made to the branch, if someone then wants to  
update to the latest code on the 1.4 branch, they would need to volunteer  
to do the last step of:
cvs export -r branch-1-4 -d bioperl-1-4-1 bioperl-live

then validate it, then make a tar ball, we can submit a 1.4.x to  
CPAN, but honestly a lot of other fixes have accumulated since the  
1.4 branch and I don't think we want to keep merging back to it, we'd  
rather move forward. The not-so-compatible changes that got checked  
in after the 1.4 branch (having to do with Annotateable) have been  
part of the problem, as they have not been fully fixed to make things  
backwards compatible.
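
The export, validate, and tarball steps above could be strung together roughly as follows. This is only a sketch under assumptions, not the documented release procedure: the branch tag and module name are the ones from the command above, while the helper name make_branch_tarball, the default directory name, and the plain Makefile.PL / make test sequence are illustrative. It also assumes CVSROOT already points at the bioperl repository.

```shell
#!/bin/sh
# Sketch only: export the 1.4 branch, validate it, and build a tarball.
# make_branch_tarball is a hypothetical helper, not part of BioPerl.
make_branch_tarball() {
    tag=${1:-branch-1-4}        # CVS branch tag to export
    dir=${2:-bioperl-1-4-1}     # export directory, also the tarball base name

    # Export a clean copy of the branch (no CVS/ administrative directories).
    cvs export -r "$tag" -d "$dir" bioperl-live || return 1

    # Validate: configure, build, and run the test suite in the exported tree.
    # Makefile.PL may ask questions interactively; answer them as needed.
    ( cd "$dir" && perl Makefile.PL && make && make test ) || return 1

    # Roll the tarball only after the tests have passed.
    tar -czf "$dir.tar.gz" "$dir"
}
```

Calling make_branch_tarball with no arguments uses the defaults above; stopping at the first failing step means a tarball is never produced from a tree whose tests do not pass.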

Nathan asked earlier on the list about how to get a list of modules  
added since 1.4 and I can only say how to generate a diff to the  
current version of the code which might be more than what he is  
asking for. read the docs on cvs diff where you specify the two tags  
you want to diff between.
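
One possible way to narrow such a diff down to added files only is to filter the per-file summary that cvs rdiff -s prints. A sketch under assumptions: OLD_TAG and NEW_TAG are placeholders for whichever two tags are being compared, list_new_files is a made-up helper name, and the "is new" wording it matches is assumed from typical cvs summary output and may differ between cvs versions.

```shell
#!/bin/sh
# Sketch only: reduce `cvs rdiff -s` summary output to the files that were added.
# Assumes summary lines of the form "File <path> is new; ..." (check your cvs).
list_new_files() {
    grep ' is new' | sed -e 's/^File //' -e 's/ is new.*$//'
}

# Usage (needs CVSROOT set and access to the repository):
#   cvs -q rdiff -s -r OLD_TAG -r NEW_TAG bioperl-live | list_new_files
```

Lines for files that merely changed between the two tags are dropped by the filter, so what remains is closer to a list of additions than a full diff.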


We certainly have a problem of meeting the needs of several different  
user groups - developers who need latest code, and users who want  
stable releases.  We either get funding to support stable releases  
more deliberately, things that don't seem to be on the main radar  
screen of primary developers or people who are tied to working with  
older stable releases.  Since most of us who are coding and making  
changes are just working from a CVS checkout we don't have a lot of  
pressure to make a release -- and we don't want to dump newly buggy  
(or broken interfaces) into CPAN on purpose.  It also seems like many  
reported bugs have already been fixed on the latest branch but people  
are less interested in back-fixing on the old branch.


Our hope is that 1.6 would be a good replacement for 1.4 - presumably  
API consistent for the most part, but we are suffering from lack of  
time of people willing to do the work to make this happen.

I have mentioned in the past that I cannot be the release master for  
the project and it is time for someone else to step up and make this  
happen.  Chris Fields has done a phenomenal job answering questions,  
fixing bugs, and helping run the project as some of us have started  
to have too busy of a schedule to keep daily tabs on Bioperl.  But he  
too will probably have to cycle off as his career responsibilities  
(and job search) takes more time.   I don't have a good answer for  
anyone on how to make this happen more smoothly, I am hopeful that  
the gmod mtg will spur some more commits and a roadplan for releasing  
the next dev release and seeing what can happen with 1.6.  If we  
funded a Bioperl coordinator I am sure that would help things more  
and manage the different sets of priorities of the user groups.

I think a dedicated hackathon to bioperl work could get 1.6 out after  
one week of solid work with some bug squashing followup.

Barring that we'll have to see what everyone else wants to see done  
to get the next release out.  The person leading the release doesn't  
have to really program things they just need to organize people  
around a time-frame, a set of features that need to be tested and  
fixed, and commitments from people of what they will do.

Much of the release process is documented on the bioperl wiki site,  
if this is not clear enough please make a note on the page/talk page  
and we can start .  My hope is that the wiki can be a good repository  
of the thought process behind the project.  right now too much of it  
is floating in the minds of former and current project coordinators.

...just some of my thoughts as I get ready to be off-line starting  
next week for 4 weeks...

-jason


On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:

> +----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 07:22):
> |
> | We did not and will not deposit 1.5.1 into CPAN due to the API  
> issues
> | in some (rather central) interfaces. These issues are changes over
> | the 1.4 API and some of those changes are going to go away. Once we
> | deposit it into CPAN we would sanction the changed API as the new
> | 'official' API and would open a huge can of backward liability  
> worms.
> | If you just continue to use the 1.4 API on the 1.5.1 release you
> | don't need to be concerned about an API method you're using going  
> away.
> |
> | As I said, the people from the core group of developers who have
> | traditionally shepherded releases all think that doing a 1.4.1
> | release wouldn't be the best investment of their time. You are most
> | welcome to disagree and volunteer your time to coordinate the 1.4.1
> | release, and a lot of people will appreciate your efforts -  
> including
> | the bioperl developers and 'core'. It shouldn't be much work
> | theoretically.
> |
> | 	-hilmar
> |
> +----]
>
> I understand that, being a volunteer project, people can
> decide where to best invest their time. If core developers
> are no longer using 1.4 in their production setups, it is
> reasonable to expect that they invest all of their time in
> 1.5 or any other bioperl version that they're using.
>
> However, when viewed as an issue related to the setting of a
> policy for the whole project, then it makes sense to have a
> policy saying for how long a stable release will be
> supported, and when and in which case bugfixes that are committed
> to and tested in the development branch (as it should be)
> will get merged back to stable.
>
> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.
>
> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.
>
> But you only get this by setting and enforcing a policy.
>
> After a number of these fixes have accumulated, then making a
> new release shouldn't represent too much effort, nor should
> it be expected that the tests that passed before would
> break now. And in the worst case (no tarball release),
> people can be directed to obtain the most current 'stable'
> code from the repository, containing all bugfixes.
>
> I guess that this is what was meant by Phillip.
>
> Fernan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From bix at sendu.me.uk  Mon Jun 26 10:44:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 15:44:55 +0100
Subject: [Bioperl-l] Tests
Message-ID: <449FF2E7.3040101@sendu.me.uk>

What level of testing is expected to be done in a test file? Is there 
such a thing as too many tests? Tests for every possible (documented) 
way of achieving a result with a module's method? Tests for every 
conceivable way of misusing a method?

If I come across a test for a module that doesn't test for everything 
the module can do, should I add tests as a matter of course? Would this 
be beneficial, or a waste of time (given that the module probably is 
bug-free already)?


From cjfields at uiuc.edu  Mon Jun 26 11:24:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:24:00 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <001301c69934$909f83c0$15327e82@pyrimidine>

...
> However, when viewed as an issue related to the setting of a
> policy for the whole project, then it makes sense to have a
> policy saying for how long a stable release will be
> supported, and when and in which case bugfixes that are committed
> to and tested in the development branch (as it should be)
> will get merged back to stable.

In a project this large which relies on a lot of outside resources
maintaining API and availability at all times, having a completely bug-free
fix for any reasonable length of time is impossible.  As a small example,
almost every time NCBI changes BLAST output, it breaks our text parsers, and
though we recommend using the BLAST XML format parser (which is much more
stable), almost everybody continues using text parsing and wants that fixed.
Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
notification, so remote BLAST parsing can break at any time.  Fold into that
any software changes that change output or API (PAML comes to mind).  Fold
into that remote database changes (EBI interface to Swissprot).  Oh, let's
not forget sequence format changes (recent SwissProt and GenBank changes).
And, worst of all, we can't expect them to maintain API or output b/c
they're updating based on user input/suggestions or bug fixes which require
them to make changes.  What's 'stable' about that?

It's very easy to say you want something and then not volunteer to do it; if
you want something then put forth the time and effort to get it done.  Put
your money where your mouth is (as they say in my home state).

Again (for the third or fourth time now), putting together a release takes
some time and effort.  I actually think it takes more effort than Hilmar
suggests; either way, it requires someone to act as the leader (release
pumpkin) to handle changes, and I don't see anybody stepping forward.
Personally, if I have the time, maybe I'll handle an interim release, but
I'm looking for a job starting in the fall as well as finishing up research
for publication so that will take up almost all the time I have.  As Hilmar
says, if you want to do it, fine.  Realize, though, many many changes have
been made since 1.4 and many more will likely be made on the road to 1.6.

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.

This is a large open-source project with a ton of developers all over the
world.  Check out the AUTHORS file; it's at best incomplete and still has
about 100 contributors.  

(Hey, my name's not on there!!!)

> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.
> 
> But you only get this by setting and enforcing a policy.

You need to realize what this project is, what it is not, and how it
evolved.  A little history lesson might get you (and others) to understand
just how complex it all is (and how old some of the code is).

http://www.bioperl.org/wiki/FAQ#Can_you_explain_the_Object_Model_design_and_
rationale.3F

explains a bit on the project design.

http://www.bioperl.org/wiki/History_of_BioPerl

explains how BioPerl came to be.  

This is not a job or a company but an open-source project; its origins are
based in the scientific community.  You're probably right about the person
not committing the change to the 1.4 branch.  We probably should have a
policy for commits to stable releases.  But how can we logically rationalize
doing so now for 1.4, almost three years hence?  We're post 1.5.1 and likely
going into 1.6 as we speak.  It's too late for 1.4 changes IMHO, frankly,
but you're welcome to try.  I don't think it's worth the effort.

As for policy enforcement, what would you want us to do?  This is a
volunteer effort.  Fire him/her?  Frankly they should be commended for
getting the fix committed in the first place, and if someone points out that
it should be committed to the 1.4 branch then fine; it shouldn't be hard to
do so even long after the commit to the main branch is made.  It just
requires someone to do so.

Again, this is NOT your typical CPAN module with one or two developers or a
project that relies on doing one thing very well.  This project has over 100
developers and is supposed to do everything adequately (and many things very
well). 

> After a number of these fixes has accumulated, then making a
> new release shouldn't represent too much effort, nor it
> should be expected that the tests that passed before would
> break now. And in the worst case (no tarball release),
> people can be directed to obtain the most current 'stable'
> code from the repository, containing all bugfixes.

You can download a tarball from the latest CVS code at any time.  There is a
link for doing just that at the bottom of the anonymous CVS page:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/

Chris


From hlapp at gmx.net  Mon Jun 26 11:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 11:30:05 -0400
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
References: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>


On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:

> What level of testing is expected to be done in a test file? Is there
> such a thing as too many tests?

No, not really.

> Tests for every possible (documented)
> way of achieving a result with a module's method?

Ideally that's the minimum.

> Tests for every conceivable way of misusing a method?

If some are already known (from reports) or you think they can be  
anticipated, yes. Generally, if a method documents what are invalid  
values for its input it's a good idea to test what the method does if  
supplied with such values. The one thing it shouldn't do is silently  
ignore them, or produce a result anyway (which presumably would be a  
wrong result by definition).
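
A minimal sketch of that last point, written in Python rather than Perl
purely for illustration. The Sequence class and its subseq method are
hypothetical stand-ins, not BioPerl API; the idea is only that each
documented invalid value gets its own test, and that the method fails
loudly instead of returning something.

```python
class Sequence:
    """Hypothetical stand-in for a sequence object; not a BioPerl class."""

    def __init__(self, residues):
        self.residues = residues.upper()

    def subseq(self, start, end):
        """Return residues start..end (1-based, inclusive).

        Documented invalid input: start < 1, end > length, start > end.
        """
        if start < 1 or end > len(self.residues) or start > end:
            raise ValueError(f"bad range {start}..{end}")
        return self.residues[start - 1:end]


def raises_value_error(func, *args):
    """Return True if calling func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


seq = Sequence("acgtacgt")

# Documented, valid usage.
assert seq.subseq(2, 4) == "CGT"

# Each documented invalid value must fail loudly, never be silently
# ignored and never produce a (necessarily wrong) result.
assert raises_value_error(seq.subseq, 0, 4)    # start below 1
assert raises_value_error(seq.subseq, 5, 2)    # start after end
assert raises_value_error(seq.subseq, 1, 99)   # end past the sequence
```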

>
> If I come across a test for a module that doesn't test for everything
> the module can do, should I add tests as a matter of course? Would  
> this
> be beneficial, or a waste of time (given that the module probably is
> bug-free already)?

It would certainly be beneficial. It'd be great if you were willing  
to volunteer for this.

Note that a module being bug free now doesn't mean it always will be.  
The main point of tests is not only to weed out bugs at the time it  
is written, but also to make sure that future changes to the module  
itself, or to other modules it interacts with or inherits from, don't  
break it.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Mon Jun 26 11:39:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:39:25 +0100
Subject: [Bioperl-l] Tests
In-Reply-To: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
References: <449FF2E7.3040101@sendu.me.uk>
	<AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
Message-ID: <449FFFAD.40506@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:
>
>> If I come across a test for a module that doesn't test for everything
>> the module can do, should I add tests as a matter of course? Would this
>> be beneficial, or a waste of time (given that the module probably is
>> bug-free already)?
> 
> It would certainly be beneficial. It'd be great if you were willing to 
> volunteer for this.

I doubt I have time to do this on the global scale[*], but certainly I 
will for the modules I work on.


Cheers,
Sendu.

* Though... it would certainly be a good way of getting to know all of 
Bioperl intimately!

From bix at sendu.me.uk  Mon Jun 26 11:42:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:42:33 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00069.6010107@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).
> 
> The general tidy up will involve redoing and perhaps even removing 
> things.

Does anyone know what the intent behind splitting Bio::Map::MappableI 
and Bio::Map::MarkerI was? I somehow get the impression these started as 
one interface but then became two. The split /seems/ to be MappableI as 
a map element with one position on one map, whilst MarkerI is a map 
element with multiple positions on multiple maps. But MarkerI has no 
synopsis or description, and MappableI says it does what MarkerI does 
(but doesn't). So I'm left guessing atm.

Do we want to keep the split? If yes, what exactly should be the 
difference between the two? If no, would it be ok to just get rid of 
MarkerI (folding it back into MappableI)?


From cjfields at uiuc.edu  Mon Jun 26 11:45:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:45:51 -0500
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <001a01c69937$9a1c1320$15327e82@pyrimidine>

My opinion: tests should cover methods and expected results and are based on
what the module actually accomplishes.  Some classes (like SeqIO, SearchIO)
are normally relatively easy to build tests for b/c the expected results are
in the file being parsed.  Tests which check calculated results from modules
(Bio::Align::DNAStatistics for instance) I would think are trickier since
you should confirm the calculations are correct through independent means.
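
A rough sketch of what "independent means" could look like, in Python for
illustration only. The functions below are hypothetical and are not the
Bio::Align::DNAStatistics implementation; the Jukes-Cantor formula is
standard, and the expected value in the test was worked out by hand from
that formula rather than by running the code under test.

```python
import math


def p_distance(seq1, seq2):
    """Fraction of aligned positions that differ (no gap handling)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), valid for p < 0.75."""
    p = p_distance(seq1, seq2)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in ten sites, so p = 0.1.
a = "ACGTACGTAC"
b = "ACGTACGTAA"

# Hand calculation: -0.75 * ln(1 - 0.4/3) = -0.75 * ln(0.866667) = 0.107326
expected = 0.107326
assert math.isclose(jukes_cantor(a, b), expected, abs_tol=1e-5)

# A second, independent check: identical sequences are at distance zero.
assert jukes_cantor(a, a) == 0.0
```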

Links:

http://www.bioperl.org/wiki/Advanced_BioPerl#Designing_Good_Tests

http://search.cpan.org/~mschwern/Test-Simple-0.62/lib/Test/Tutorial.pod

The link above uses Test::Simple or Test::More; we use Test (but have
considered moving to Test::More using Devel::Cover).

My 2c

Chris



From bix at sendu.me.uk  Mon Jun 26 12:15:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 17:15:32 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00824.20002@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).

To do this I actually need to make some slightly more significant API 
changes than I had hoped. To make Position central, all maps, mappables 
and markers need to be able to add and remove Positions (and similar 
things). As I see it, we can say that such methods are fundamental to 
the coordination required between Bio::Map modules. I feel that I'm 
therefore justified in implementing these kinds of methods in the 
interfaces (which would allow all the downstream modules that implement 
those interfaces to work in the new system without much/any alteration).

Am I justified? Should I try harder to do it without implementations in 
the interfaces?
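
To make the question concrete, here is a small sketch of the pattern being
asked about, in Python for illustration only. All class and method names
are hypothetical and are not the Bio::Map API. The interface implements
the shared add/remove bookkeeping once, and implementing classes only say
where their positions are stored.

```python
from abc import ABC, abstractmethod


class Position:
    """Hypothetical position: ties one element to one map at a value."""

    def __init__(self, map_, element, value):
        self.map = map_
        self.element = element
        self.value = value


class PositionHolder(ABC):
    """Interface that also carries the shared coordination logic."""

    @abstractmethod
    def _positions(self):
        """Return the list this object stores its positions in."""

    def add_position(self, position):
        if position not in self._positions():
            self._positions().append(position)

    def remove_position(self, position):
        if position in self._positions():
            self._positions().remove(position)

    def positions(self):
        return list(self._positions())


class SimpleMap(PositionHolder):
    def __init__(self, name):
        self.name = name
        self._store = []

    def _positions(self):
        return self._store


class SimpleMarker(PositionHolder):
    def __init__(self, name):
        self.name = name
        self._store = []

    def _positions(self):
        return self._store


def place(map_, marker, value):
    """Create one Position and register it on both sides."""
    pos = Position(map_, marker, value)
    map_.add_position(pos)
    marker.add_position(pos)
    return pos


chr1 = SimpleMap("chr1")
marker = SimpleMarker("markerA")
pos = place(chr1, marker, 12.5)
assert chr1.positions() == [pos] and marker.positions() == [pos]
```

The trade-off in this sketch is that downstream classes inherit behaviour
as well as a contract, which is exactly what a pure interface avoids.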

From pmiguel at purdue.edu  Mon Jun 26 12:53:56 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon, 26 Jun 2006 12:53:56 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <001301c69934$909f83c0$15327e82@pyrimidine>
References: <001301c69934$909f83c0$15327e82@pyrimidine>
Message-ID: <44A01124.5040102@purdue.edu>

Chris Fields wrote:
> ...
>   
>> However, when viewed as an issue related to the setting of a
>> policy for the whole project, then it makes sense to have a
>> policy saying for how long a stable release will be
>> supported, and when and in which case bugfixes that are committed
>> to and tested in the development branch (as it should be)
>> will get merged back to stable.
>>     
>
> In a project this large which relies on a lot of outside resources
> maintaining API and availability at all times, having a completely bug-free
> fix for any reasonable length of time is impossible.  As a small example,
> almost every time NCBI changes BLAST output, it breaks our text parsers, and
> though we recommend using the BLAST XML format parser (which is much more
> stable), almost everybody continues using text parsing and wants that fixed.
> Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
> notification, so remote BLAST parsing can break at any time.  Fold into that
> any software changes that change output or API (PAML comes to mind).  Fold
> into that remote database changes (EBI interface to Swissprot).  Oh, let's
> not forget sequence format changes (recent SwissProt and GenBank changes).
> And, worst of all, we can't expect them to maintain API or output b/c
> they're updating based on user input/suggestions or bug fixes which require
> them to make changes.  What's 'stable' about that?
>
> It's very easy to say you want something and then not volunteer to do it; if
> you want something then put forth the time and effort to get it done.  Put
> your money where your mouth is (as they say in my home state).
>
> Again (for the third or fourth time now), putting together a release takes
> some time and effort.  I actually think it takes more effort than Hilmar
> suggests; either way, it requires someone to act as the leader (release
> pumpkin) to handle changes, and I don't see anybody stepping forward.
> Personally, if I have the time, maybe I'll handle an interim release, but
> I'm looking for a job starting in the fall as well as finishing up research
> for publication so that will take up almost all the time I have.  As Hilmar
> says, if you want to do it, fine.  Realize, though, many many changes have
> been made since 1.4 and many more will likely be made on the road to 1.6
>
>   
Hi Chris et al.,

    I was just reporting the situation from where I sit. I think this 
issue was important enough to bring to everyone's attention. I've done so 
and I'm more than satisfied with the response. I hope my emails were not 
too abrasive.
    I have now read the wiki about coordinating a release. You are 
right, that does sound hard. At least to me--I've never even used CVS, 
nor contributed a module to CPAN. I just don't see myself as being 
qualified to coordinate a 1.4.1 release. So since I'm not, for that 
reason, able to volunteer to do it myself, I'll withdraw my request for 
a new release to CPAN.
    That being said, I think Fernan's suggestion bears keeping in mind 
once 1.6 has been released and bug fixes are being committed. By that 
time, I hope I'll be savvy enough to help out in the process.
    Thanks for your attention,

Phillip SanMiguel
   

From fernan at iib.unsam.edu.ar  Mon Jun 26 15:24:51 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 16:24:51 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
	<ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
Message-ID: <20060626192451.GB53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 11:01):
| 
| On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:
| 
| >I'm not knowledgeable enough about the bioperl release
| >engineering process, nor about the internal development
| >process, but just guessing I'd expect that whenever anyone
| >submits a bugfix, it should be the responsibility of
| >the committer to check (against the project policy,
| >(written or implicit) or with the core developers in a
| >difficult case) whether the fix should be committed to more
| >than one branch.
| >
| >A patch like the one that started this thread, should have
| >been committed to the 1.4 branch without too much thinking.
| >And it would have cost the committer only a few seconds more
| >of her/his time.
| 
| Sure. But for some reason he or she forgot. So what do you suggest we  
| do - and I mean as a community, because this is a community project.  
| Come after the guy until he commits it to the branch? 

No, I never said or implied that.

| Or post an email to the list saying what you think is the
| right way and then do  it (yourself)?

Of course I could volunteer some of my time to
do that (that is, go over the commit history and see what
changes could be merged back to 1.4, if that seems to be
useful), provided I get a polite reply to my 'email
to the list saying what [I] think is the right way'.

I'm a volunteer in other open source, community projects,
and I do contribute regularly so I see no problem except the
obvious scarcity of free time in doing the same for bioperl.

| >But you only get this by setting and enforcing a policy.
| 
| Man, this is not a company. Take a step back and think again. What do  
| you suggest we - again we as a community - do to enforce a policy?  
| Take increasing levels of disciplinary action if someone keeps  
| forgetting to commit to the branch?

Seems like you were pissed off by what I said ...

What I was just trying to say is that merely by formulating
and communicating a policy you could be taking steps towards
making it a reality. Maybe 'enforcing' was an unfortunate
word to use here ... 

You don't have to punish anyone, just sending a polite email
to the list reminding people about the policy once in a
while, should be enough. It's OK if some committer doesn't
care, or just forgets about doing the right thing once in a
while ...

But of course, you might be pissed off by me talking about
something that I know nothing about (the development of
bioperl), given that I'm just a bioperl user.

Perhaps my mistake was to bring here ideas from
other projects (in which I do contribute regularly) without
realizing that, not being a contributor, I could be
punished for suggesting how things could be done better.

| While there are clearly some rules everybody needs to follow and if  
| you violate them deliberately and repeatedly you will get your CVS  
| privileges withdrawn, by and large we as a community need to accept  
| some responsibility for making the project what we think it should be  
| - and do so not by invoking disciplinary action but by living by  
| example and by taking action yourself when you think action is due.

I completely agree. When I said 'setting a policy' I just
meant something along the lines of clearly stating what are
those 'rules everybody needs to follow'. My suggestion was
to add a 'merge trivial fixes back to stable' rule to that
list.

I agree with Jason: why is that such a big deal? 

| If Bioperl were a company and you asked for a 1.4.1 release and the  
| customer service rep told you nope there's a 1.5.1 that you should  
| use instead and that will do just fine, what will you do? Argue with  
| him about the company policies and whether they are properly enforced  
| or not?
| 
| Obviously doing so will be a waste of your time. In Bioperl it is at  
| the bottom of it no less waste of your time, because instead you now  
| have the opportunity to make happen what you believe needs to happen.

Right, but first I have to realize what needs to happen. I
realized it when I read your reply to Phillip's message.

I then proceeded to write my thoughts and send them to the
list, to see what kind of feedback I get. 

Hopefully, someone with commit privileges would think that
what I said makes sense and just proceed to doing it (saving
me from the task :)

Or perhaps someone, as Jason did, would say that it's
not worth trying to merge things back to 1.4, and that we should move
forward instead. In his message he even explained what the
problems and needs are (lack of man-time, need for
volunteers) and politely asked for help.

| We have had a history of rapidly and un-bureaucratically putting  
| people in power of what they wanted to do. We have also had a history  
| of not listening much to people who don't want to put their feet  
| where their mouth is.

I would call your reply (this message) a barrier of entry
for new developers. In the above paragraph I guess you are
referring to the bioperl motto: 'whoever codes it wins'.
That is true in any open source project. But at least to me,
that doesn't say that you should not listen to people just
because they haven't contributed a single line of code.

| I'm sorry if what I'm saying puts people off, but really this is an  
| open-source project and if you ask me it's one with the least  
| barriers of entry for new developers or 'activists' that you can find  
| in the open source arena. 


Let me disagree. The barriers of entry are not just the
giving away of developer accounts and/or repository write
privileges. 

I'm a regular contributor in another open source, community
project (FreeBSD) that has more and higher barriers of entry
with respect to giving away privileges (for example for
committing changes to the repository). Nonetheless FreeBSD
has historically shown to have few and low barriers of entry
for incorporating people to the project (without the need to
give away commit privileges, making them responsible for
parts of the FreeBSD source code/documentation/ports/etc).

IMO, that comes from a very good communication of the
direction of the project, what needs to be done, how to do
it, and a tendency of privileged and older members to listen
to people's suggestions, inviting and helping people
to jump the fence and become part of the project. It's no
accident that FreeBSD has 'mentors' that
introduce new members, help them to get acquainted with how
the project works, policies, etc. and supervise their
actions.

| This doesn't come without some degree of  
| anarchy, but really IMHO that's more of an advantage than a  
| disadvantage.
| 	-hilmar
|
+----]

Fernan

PS: finally, let me just add that English is not my native
language. Although I'm quite familiar with it, once in a
while, an unfortunate choice of words might blur my intended
meaning or the strength I wanted to convey. In case that has
been the case, let me put clearly that it has not been my
intention to criticize the way the project does things, but
to suggest ideas for the future (merge back trivial changes
to a 'stable' branch as a policy) based on my experience
with other projects. Whether that fits bioperl or not was
what I would have expected as a reply.

From cjfields at uiuc.edu  Mon Jun 26 16:18:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 15:18:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626192451.GB53298@iib.unsam.edu.ar>
Message-ID: <002701c6995d$b738f790$15327e82@pyrimidine>


> | >A patch like the one that started this thread, should have
> | >been committed to the 1.4 branch without too much thinking.
> | >And it would have cost the committer only a few seconds more
> | >of her/his time.
> |
> | Sure. But for some reason he or she forgot. So what do you suggest we
> | do - and I mean as a community, because this is a community project.
> | Come after the guy until he commits it to the branch?
> 
> No, I never said or implied that.

Right, you didn't say that.  But you didn't clarify your statements either.
I think you're treading into dangerous waters when you come in and criticize
something w/o bothering to read up on how things have been done here.  As
you say yourself below, it's 'something that I know nothing about (the
development of bioperl), given that I'm just a bioperl user'.  It's akin to
"I don't think you're coding things correctly, here's the right way to do
it" w/o knowing what the code is used for.

> | Or post an email to the list saying what you think is the
> | right way and then do  it (yourself)?
> 
> Of course I could volunteer some of my time to
> do that (that is, go over the commit history and see what
> changes could be merged back to 1.4, if that seems to be
> useful), provided I get a polite reply to my 'email
> to the list saying what [I] think is the right way'.

You will get a polite email when you respond politely.  I actually agree
with many things you say, but you sure aren't making any friends here by the
way you consistently take the opposite stance and judge what other people
do.  I think you have a point about having a stable release be supported for
a period of time.  My point is, how long?  We didn't really get an idea of
that from you, did we?

> I'm a volunteer in other open source, community projects,
> and I do contribute regularly so I see no problem except the
> obvious scarcity of free time in doing the same for bioperl.

And others here also volunteer elsewhere (GMOD, DAS, Ensembl, etc).  Don't
presume we don't have experience in open-source.  That's being pretty
judgmental.  

> | >But you only get this by setting and enforcing a policy.
> |
> | Man, this is not a company. Take a step back and think again. What do
> | you suggest we - again we as a community - do to enforce a policy?
> | Take increasing levels of disciplinary action if someone keeps
> | forgetting to commit to the branch?
> 
> Seems like you were pissed off by what I said ...

????Ya think????  

You know, okay, forget it.  This is completely non-productive.  We'll all
agree to disagree, argue, whatever.  The points made here, as I see them:

1)  Commits should be made to stable releases (as well as to the main branch
in CVS) to fix bugs as long as that release is supported.  I agree with
this, but someone has to volunteer, and the length of time a release is
supported also has to be worked out.  It would almost be better to go to a
regular release schedule (once every 3-6 months or so) where the code is
given as is to CPAN, whether it passes tests or not.

2)  More communication about the direction Bioperl is heading; personally I
haven't seen a problem with this so much as a lack of information about a
roadmap.  That is being addressed soon I believe, though people out there
need to be patient.

3)  Volunteer.  If you have something you believe needs to be done and you
believe so fervently, then put up or shut up.  Make (nice polite)
suggestions otherwise.  Don't judge code or "the way things are done" and
don't presume what kind of experience people have that you don't know and
haven't met.  End of story.

Chris


From torsten.seemann at infotech.monash.edu.au  Mon Jun 26 22:57:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 27 Jun 2006 12:57:47 +1000
Subject: [Bioperl-l] Comments on new PDOC documentation
Message-ID: <44A09EAB.2030401@infotech.monash.edu.au>

Hello all,

I am very happy to see the PDOC software has been improved, as I use the 
  online web documentation frequently. Thanks to Jason, Raphael and 
Patrick for making this happen.

http://doc.bioperl.org/bioperl-live/

Now for some comments...

1. CSS

It uses CSS which is excellent, reducing HTML size and allowing easy 
tweaks to the design. However its current implementation has some issues:

A. it seems to only use ID, rather than CLASS, to specify styles.
    ID values must be unique in a page, and are for one-off styles.
    CLASS may be re-used throughout a page. eg "sub" and "subArea".
    Many browsers do not enforce this however...

B. it seems to be doing unusual, but possibly deliberate, things with
    the POD when determining what CSS ID to give it, but perhaps this is
    more to do with how Bioperl formats the POD on some subheadings
    eg.
    <a name="_pod_Reporting Bugs" id="_pod_Reporting Bugs">
    <a name="_pod_AUTHOR - Ewan Birney" id="_pod_AUTHOR - Ewan Birney">

C. the "Description" sections etc are in a proportional font, but
    I think it should be "font-family: monospace" as many authors have
    exploited the traditional monospace of most editors to format
    their comments, which are now lost
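
Points A and B can be checked mechanically. The sketch below is in Python
for illustration only and uses the standard html.parser module; the sample
markup is made up, apart from the "_pod_Reporting Bugs" anchor quoted
above. It reports id values that occur more than once and id values that
contain whitespace, both of which the HTML specifications disallow.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute seen in a document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def check_ids(html):
    """Return (duplicated ids, ids containing whitespace), both sorted."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(collector.ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    with_spaces = sorted(
        {i for i in collector.ids if any(c.isspace() for c in i)}
    )
    return duplicates, with_spaces


# Made-up page fragment: "sub" is reused, which is what CLASS is for.
sample = """
<div id="sub">new</div>
<div id="sub">next_seq</div>
<a name="_pod_Reporting Bugs" id="_pod_Reporting Bugs">Reporting Bugs</a>
<div id="subArea">body</div>
"""

duplicates, with_spaces = check_ids(sample)
assert duplicates == ["sub"]
assert with_spaces == ["_pod_Reporting Bugs"]
```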

2. FRAMES

I notice it still uses HTML Frames. Although this reduces code size 
also, it makes it impossible to LINK directly to a specific 
documentation page with all the frames intact. It may be better to use 3 
DIV elements which are part of each page, and they could be server-side 
included so there is no HTML duplication.

3. MERGING OF BIOPERL DOCS

One facet of the docs I find frustrating is that bioperl-live and 
bioperl-run (and the others) are separate! This means that you have to 
keep switching between them, and more importantly, class-names to 
classes in other packages are not present; this is particularly bad when 
browsing bioperl-run.

Is there any chance of creating a "merged" bioperl-doc page somehow?

4. STYLE

Choice of colours and layouts is such a personal thing.
I guess people can download http://doc.bioperl.org/css/perl.css
and re-edit it, and get their Browser to over-ride the supplied CSS with 
  their version.

5. CONCLUSION

Please don't get the wrong idea, I love the new PDOC, I would just like 
to love it more. And yes I understand the nightmare that is parsing 
Perl/POD and generating compatible CSS :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From bix at sendu.me.uk  Tue Jun 27 06:21:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 11:21:57 +0100
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <44A106C5.9040706@sendu.me.uk>

Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
Is the idea of a Bio::Score of interest? See bug, but basically an 
object that can handle multiple kinds of scores effectively.

I would like to use such a thing in Bioperl, but what standard needs to 
be met before Bioperl gets a new kind of object?

From hlapp at gmx.net  Tue Jun 27 08:24:16 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 08:24:16 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A106C5.9040706@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
Message-ID: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>

So you basically want to attach semantic information to a number, and  
type the number thereby?

If so, an ontology would be the more natural choice (and in the end  
more flexible one) for expressing this kind of information.

Have you looked at the concept of 'quantitation types', e.g. in MAGE  
(the XML [MAGE-ML] or the object model [MAGE-OM])?

There is no quantitation type ontology at a repository I know of. I  
have used my own ones in the past and they have been pretty useful.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 08:52:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 13:52:05 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A129F5.3030500@sendu.me.uk>

Hilmar Lapp wrote:
> So you basically want to attach semantic information to a number, and 
> type the number thereby?

Basically, I want to be able to stick a bunch of (different kinds of) 
numbers into an object, and later get the 'best' one out (of a 
particular kind), or sort multiple of those objects.
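
Roughly what I have in mind, sketched in Python for illustration only.
This is not the code attached to the bug; the class, the method names and
the two score kinds are assumptions made up for the example. Each kind of
score records whether higher or lower is better, so "best" and sorting are
well defined within one kind.

```python
class Score:
    """Hypothetical container for several kinds of scores."""

    # Assumed direction per kind: True if a larger value is better.
    HIGHER_IS_BETTER = {"bits": True, "evalue": False}

    def __init__(self):
        self._values = {}  # kind -> list of numbers

    def add(self, kind, value):
        if kind not in self.HIGHER_IS_BETTER:
            raise ValueError(f"unknown score kind: {kind}")
        self._values.setdefault(kind, []).append(value)

    def best(self, kind):
        values = self._values.get(kind)
        if not values:
            raise KeyError(f"no scores of kind {kind}")
        return max(values) if self.HIGHER_IS_BETTER[kind] else min(values)


def sort_by(scores, kind):
    """Sort Score objects best-first according to one kind of score."""
    reverse = Score.HIGHER_IS_BETTER[kind]
    return sorted(scores, key=lambda s: s.best(kind), reverse=reverse)


hit1 = Score()
hit1.add("bits", 50.0)
hit1.add("bits", 72.5)
hit1.add("evalue", 1e-5)
hit1.add("evalue", 3e-10)

hit2 = Score()
hit2.add("bits", 90.0)
hit2.add("evalue", 2e-20)

assert hit1.best("bits") == 72.5       # highest bit score wins
assert hit1.best("evalue") == 3e-10    # lowest e-value wins
assert sort_by([hit1, hit2], "bits") == [hit2, hit1]
```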


> If so, an ontology would be the more natural choice (and in the end more 
> flexible one) for expressing this kind of information.

I'm not really sure I understand 'and type the number', or what (useful) 
flexibility doing it with an ontology would provide.


> Have you looked at the concept of 'quantitation types', e.g. in MAGE 
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

I had a quick look, but not really sure what you intended to suggest here.


> There is no quantitation type ontology at a repository I know of. I have 
> used my own ones in the past and they have been pretty useful.

Can you provide a brief example of what you mean?

If it would be appropriate to implement a Bio::Score with an ontology 
that's fine. Would we want a Bio::Score implemented though? Or are you 
suggesting each module make its own quantitation type ontology when it 
wants to deal with numerous scores?

I like the idea of a Bio::Score because then you can compare complex 
scores from multiple different unrelated modules.

From cjfields at uiuc.edu  Tue Jun 27 10:08:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 09:08:57 -0500
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
Message-ID: <001e01c699f3$3b6cda50$15327e82@pyrimidine>

> Hilmar Lapp wrote:
> > So you basically want to attach semantic information to a number, and
> > type the number thereby?
> 
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.

The 'best one' might be tricky when dealing with different kinds of scores,
esp. scores calculated different ways.  For instance, I run RNA motif
programs quite frequently (RNAMotif, ERPIN, Infernal), but all generate
'scores' based on different criteria (algorithms, different parameters, how
the author slept, and so on).  RNAMotif in particular is hard to deal with
(though a great program) b/c the scores are based on criteria in the
descriptor file (the file used to describe the motif), so aren't comparable
to other descriptors, which may have their own method of generating scores,
let alone output from other programs.  Which one would be 'the best?'  It's
a bit subjective since the scores are predictive based upon your input,
various program limitations, specific program parameter implementations,
etc.  

I do like the idea of grouping together scores for comparison, such as when
a particular region of DNA has multiple hits from different programs with
different scores.  It would at least suffice as a test on how various
programs or experimental data would compare with one another.

> > If so, an ontology would be the more natural choice (and in the end more
> > flexible one) for expressing this kind of information.
> 
> I'm not really sure I understand 'and type the number', or what (useful)
> flexibility doing it with an ontology would provide.

I'm not sure, but maybe something along the lines of what the number (the
score) actually means, especially when compared to other scores.  In other
words, how you could compare one score or number versus the other.  An
ontology would allow more complex information to be included along with the
score information so one could make more informed choices based on how the
score was obtained, the algorithm used, the program involved, etc.  Hence
flexible.  Is that close, Hilmar?

To use my RNA program example above, I could include the information about
how the scores were obtained, the programs involved, parameters used, the
various raw scores, the time it took to run the program, etc. (i.e. you
could make it as specific as you wanted).  This could also be extended to
other data types as well besides program, such as wet bench experimental
data and so on, which I deal with quite a bit.  I think there are a few XML
specs out there besides MAGE that do this as well but I can't think of any
off the top of my head.

> > Have you looked at the concept of 'quantitation types', e.g. in MAGE
> > (the XML [MAGE-ML] or the object model [MAGE-OM])?
> 
> I had a quick look, but not really sure what you intended to suggest here.

I think the idea is that MAGE, strictly as an example, deals with microarray
data from different sources or different data systems for comparison.
Sounds a little like what you want to do.

> > There is no quantitation type ontology at a repository I know of. I have
> > used my own ones in the past and they have been pretty useful.
> 
> Can you provide a brief example of what you mean?
> 
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make its own quantitation type ontology when it
> wants to deal with numerous scores?
> 
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.

Which is what MAGE does in a way, but more specifically, i.e. just
microarray data from different sources.  So the array data may be calculated
in different ways based upon the specs for different machines, the way array
slides were prepared, how the experimenter slept, etc.

Chris


From hlapp at gmx.net  Tue Jun 27 10:27:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 10:27:55 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
Message-ID: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>

I would have suggested initiating a quantitation type ontology, not  
one individual per module.

An ontology would capture all your semantic information (min/max or  
range, higher or lower is better, what is a reasonable default [not  
sure there would be one], etc) and you would have a hierarchical  
structure.

You type a score by associating it with an ontology term:

	BLAST_e-value is-a expectation_value
	expectation_value has-min-value 0
	expectation_value has-max-value positive_infinity
	BLAST_p-value is-a probability_value
	probability_value has-min-value 0
	probability_value has-max-value 1
	
etc and then something being an expectation_value for instance would  
imply several attributes laid down in the ontology (probably through  
has-a statements).
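
As a rough illustration of how a typed score could inherit such attributes, here is a minimal sketch in plain Perl. It assumes a hand-built hash instead of a real Bio::Ontology object; the term names are the ones listed above, while %IS_A, %ATTR and attribute_for() are hypothetical names made up for this example.

```perl
use strict;
use warnings;

# Hypothetical in-memory ontology: each term points at its is-a parent.
my %IS_A = (
    'BLAST_e-value' => 'expectation_value',
    'BLAST_p-value' => 'probability_value',
);

# Attributes laid down on the general terms (9**9**9 is numeric infinity).
my %ATTR = (
    expectation_value => { min => 0, max => 9**9**9 },
    probability_value => { min => 0, max => 1 },
);

# Walk up the is-a chain until a term defines the requested attribute.
sub attribute_for {
    my ($term, $attr) = @_;
    while (defined $term) {
        return $ATTR{$term}{$attr}
            if exists $ATTR{$term} && exists $ATTR{$term}{$attr};
        $term = $IS_A{$term};
    }
    return;    # nothing known about this term
}

print attribute_for('BLAST_p-value', 'max'), "\n";    # prints 1
```

The specific term (BLAST_p-value) carries no attributes of its own; it picks them up from probability_value, which is the point of typing the score.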

It seems to me that essentially what you are trying to do is  
capturing knowledge for particular types of scores, which you would  
then use in more general purpose programs to sort from more to less  
significant, and possibly filter? If so, then hard-coding this into  
objects (all over the place or in a single place) is typically not  
the best practice; rather, the usual best-practice approach is using  
(and if necessary, constructing) an ontology. This is also the most  
re-usable approach.

	-hilmar

On Jun 27, 2006, at 8:52 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> So you basically want to attach semantic information to a number, and
>> type the number thereby?
>
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.
>
>
>> If so, an ontology would be the more natural choice (and in the  
>> end more
>> flexible one) for expressing this kind of information.
>
> I'm not really sure I understand 'and type the number', or what  
> (useful)
> flexibility doing it with an ontology would provide.
>
>
>> Have you looked at the concept of 'quantitation types', e.g. in MAGE
>> (the XML [MAGE-ML] or the object model [MAGE-OM])?
>
> I had a quick look, but not really sure what you intended to  
> suggest here.
>
>
>> There is no quantitation type ontology at a repository I know of.  
>> I have
>> used my own ones in the past and they have been pretty useful.
>
> Can you provide a brief example of what you mean?
>
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make its own quantitation type ontology  
> when it
> wants to deal with numerous scores?
>
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 11:25:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 16:25:06 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
Message-ID: <44A14DD2.7000402@sendu.me.uk>

Hilmar Lapp wrote:
> I would have suggested initiating a quantitation type ontology, not one 
> individual per module.

Where would such a thing 'live'? Would it be some static file somewhere 
that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology 
that can be added to by a module when it needs extra terms to describe its 
particular kind of scores?


> An ontology would capture all your semantic information [snip]

Thanks, I agree that an ontology would be the way to do it...


> It seems to me that essentially what you are trying to do is capturing 
> knowledge for particular types of scores, which you would then use in 
> more general purpose programs to sort from more to less significant, and 
> possibly filter?

Yes.


> If so, then hard-coding this into objects (all over the 
> place or in a single place) is typically not the best practice; rather, 
> the usual best-practice approach is using (and if necessary, 
> constructing) an ontology. This is also the most re-usable approach.

Not having any experience with ontologies, I can't think how this would 
all be done in practice though. Don't we need some central module 
(Bio::Score) to create the ontology (or read it in) and then present 
some suitable interface to it? For example, modules that wanted to store 
some scores might just ask Bio::Score for the ontology and type their 
scores by associating with an available ontology term, creating new 
terms if necessary (or is that something you would never do; the 
ontology needed to have been set up to cover all possible terms?). Then 
when the user has a bunch of these typed scores, surely he doesn't want 
to deal with going through the ontology himself to work out what it all 
means? Well, he could if he needs that level of control, but also he 
just wants to say Bio::Score->sort(x y z) or something.

From bix at sendu.me.uk  Tue Jun 27 12:13:46 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 17:13:46 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
Message-ID: <44A1593A.809@sendu.me.uk>

Cook, Malcolm wrote:
>
> All this semantic cruft is overkill for a moving target and will never
> settle down until your analysis results are no longer relevant.

I'm not sure what you mean by that. What moves? An evalue will always be 
an evalue. Once you know that you are in fact dealing with an evalue, 
and once your sorting algorithm knows that lower evalues are better, 
nothing changes. Likewise for other kinds of scores.

Instead of having to discover that a particular program is giving you an 
evalue, and then writing code to deal with an evalue appropriately, I 
thought it would be nicer to have a single module that knew how to deal 
with it already.

From MEC at stowers-institute.org  Tue Jun 27 12:01:45 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 27 Jun 2006 11:01:45 -0500
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>

For the use case of TFBS analysis demonstrated in the attachment to the
bug, I would expect to find potentially three scores, ala, {evalue,
bitscore, and percentmatch}.  To deal with this in existing framework
(i.e. GFF/bioperl analysis modules/TFBS), I would try to make GFFx eat
scalars as scores and pack the three values into a string and unpack
them as needed for sorting, etc.  Else put the one score I know I'm
going to 'use' in a particular analysis into 'score' and adorn column 9
with the rest.
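
A minimal sketch of the pack/unpack idea, assuming a '|' separator and a fixed field order; pack_scores() and unpack_scores() are hypothetical helper names, and the score values are made up for illustration.

```perl
use strict;
use warnings;

# Fixed field order is an assumption of this sketch.
my @FIELDS = qw(evalue bitscore percentmatch);

# Join the three scores into one string suitable for a single score slot.
sub pack_scores {
    my (%s) = @_;
    return join '|', map { $s{$_} } @FIELDS;
}

# Split the string back into a hash ref keyed by field name.
sub unpack_scores {
    my ($packed) = @_;
    my %s;
    @s{@FIELDS} = split /\|/, $packed;
    return \%s;
}

my @packed = (
    pack_scores(evalue => 3e-05, bitscore => 41.0, percentmatch => 77),
    pack_scores(evalue => 1e-20, bitscore => 98.2, percentmatch => 91),
);

# Sort best-first by e-value (lower is better).
my @sorted = sort {
    unpack_scores($a)->{evalue} <=> unpack_scores($b)->{evalue}
} @packed;
print "$_\n" for @sorted;
```

Note that the knowledge "lower e-value is better" lives in the sort block here, so every script that sorts has to repeat it.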

All this semantic cruft is overkill for a moving target and will never
settle down until your analysis results are no longer relevant.

my $.02

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>Sent: Tuesday, June 27, 2006 5:22 AM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::Score of interest?
>
>Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
>Is the idea of a Bio::Score of interest? See bug, but basically an 
>object that can handle multiple kinds of scores effectively.
>
>I would like to use such a thing in Bioperl, but what standard 
>needs to 
>be met before Bioperl gets a new kind of object?


From bix at sendu.me.uk  Tue Jun 27 14:07:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 19:07:44 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
References: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
Message-ID: <44A173F0.4040302@sendu.me.uk>

Chris Fields wrote:
>> Hilmar Lapp wrote:
>>> So you basically want to attach semantic information to a number, and
>>> type the number thereby?
>> Basically, I want to be able to stick a bunch of (different kinds of)
>> numbers into an object, and later get the 'best' one out (of a
>> particular kind), or sort multiple of those objects.
> 
> The 'best one' might be tricky when dealing with different kinds of scores,
> esp. scores calculated different ways.

I didn't make myself very clear, but you don't compare different kinds 
of scores. When you want to compare two different Score objects, each of 
which may contain multiple different kinds of scores, you pick the kind 
of score you're interested in, and for that kind of score ask which 
object has the 'best' score. I can't readily think of any exceptions to 
the rule that 'best' is either the higher score or the lower score, 
depending on what kind of score you've chosen.

I may not have made myself clear in another way. One of the ideas behind 
a Bio::Score is to have a container object for multiple different kinds 
of scores (and even multiple values per kind) all generated by one 
program in one analysis on one data set.
The container then lets you pick the kind of score you want to work with 
and compare its scores with those in other Bio::Score objects that 
contain the same kind of score (most probably, ones made by the same 
analysis program but on different data sets).

Furthermore, the kind of score you want to work with could have multiple 
values from that single analysis. So the container also lets you 
summarise these values (eg. average them) before trying to compare with 
another Score object. Often, it may be that for a certain kind of score 
it makes sense (it is intended by the score-generating program) to 
always summarise the values in a certain way. So the container needs to 
know about that and 'do the right thing' so the user can just compare 
things without having to trouble himself.

So this is why I feel that to just 'use an ontology' isn't enough. 
Certainly one ought to be used when defining the kinds, but you need 
some single interface with useful methods that lets you deal with the 
actual score values easily.
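
To make the container idea concrete, here is a minimal sketch. ScoreSet is a stand-in name and not an existing Bioperl class, and the %KIND table (which direction is 'best', how to summarise multiple values) is hard-coded here only for illustration; in the design discussed above that knowledge would come from the ontology.

```perl
use strict;
use warnings;
use List::Util qw(sum min);

# Hypothetical per-kind knowledge: which direction is 'best' and how
# multiple values from one analysis are summarised.
my %KIND = (
    evalue   => { better => 'lower',  summarise => sub { min @_ } },
    bitscore => { better => 'higher', summarise => sub { sum(@_) / @_ } },
);

package ScoreSet;    # stand-in name, not an existing Bioperl class

sub new {
    my ($class) = @_;
    return bless { values => {} }, $class;
}

# Store one or more values of a given kind; returns $self for chaining.
sub add {
    my ($self, $kind, @values) = @_;
    push @{ $self->{values}{$kind} }, @values;
    return $self;
}

# One representative number for the given kind.
sub summary {
    my ($self, $kind) = @_;
    my @v = @{ $self->{values}{$kind} || [] };
    return unless @v;
    return $KIND{$kind}{summarise}->(@v);
}

# Sort-style comparison: negative if $self is better than $other.
sub compare {
    my ($self, $other, $kind) = @_;
    my $cmp = $self->summary($kind) <=> $other->summary($kind);
    return $KIND{$kind}{better} eq 'lower' ? $cmp : -$cmp;
}

package main;

my $s1 = ScoreSet->new->add(evalue => 1e-30, 1e-10)->add(bitscore => 200, 100);
my $s2 = ScoreSet->new->add(evalue => 1e-5)->add(bitscore => 180);

# Best-first ordering for whichever kind the user picks.
my @by_evalue   = sort { $a->compare($b, 'evalue') }   ($s1, $s2);
my @by_bitscore = sort { $a->compare($b, 'bitscore') } ($s1, $s2);
```

The user only names the kind of score; the direction and the summarising rule stay inside the container.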

From cjfields at uiuc.edu  Tue Jun 27 14:56:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 13:56:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060627181439.GD51742@iib.unsam.edu.ar>
Message-ID: <000a01c69a1b$6d0338c0$15327e82@pyrimidine>

> | 1)  Commits should be made to stable releases (as well as to the main
> | branch in CVS) to fix bugs as long as that release is supported.  I agree
> | with this, but someone has to volunteer, and the length of time a release
> | is supported also needs to be worked out.
> 
> I volunteer to do that (merge approved changes/fixes back to
> a stable branch), though as said by others, 1.4 may not be
> the most appropriate 'stable' branch, as too many changes
> have accumulated, and maybe it's not worth it. But I could
> do that for the next 'stable' release, 1.6 or 2.0 whichever
> comes next.
> 
> As per the length of time, I would say that a stable release
> should be supported at least until another 'stable' release
> is made. Or until it's no longer being used in production
> setups, which is only feasible to know in small
> communities.

I'm posting this to the mail list so that others can respond.

Kevin Brown (in a response to me) made some good points about updating and
maintaining stable releases in that only bug fixes are committed (i.e. no
refactoring, no new modules or features).  I personally wouldn't have a
problem in someone doing this, releasing periodic updates to stable or
developer releases to fix bugs only but I may be in the minority here.  The
rest of the core guys and others need to also speak their thoughts.  I hate
forwarding this to Jason since he's in the middle of getting ready for a
move but I think this is important enough to do so.

I can say that I am unequivocally against updating 1.4.  Too much has
changed since then and I think it would be a mess trying to figure out what
bug fixes to include, etc.  

I also am very much against placing developer's releases in CPAN; those
releases are not intended to be completely stable as they may be
implementing new features that haven't been tested completely and may
contain various other bugs.  v 1.5.1 is remarkably stable for a developer's
release but several bug fixes have been made since.  If someone wants to try
out the developer's versions or bioperl-live they are most welcome to it;
the web site docs give all the instructions one needs to install from pretty
much any platform.

Beyond that, I'm spent on this thread.

Chris 


From lstein at cshl.edu  Tue Jun 27 18:35:08 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 27 Jun 2006 18:35:08 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <200606271835.09558.lstein@cshl.edu>

Hi All,

This is rather late, but just for future reference on the mailing list,  here 
is how I would do the task using Bio::DB::Fasta.

Script 1: index the file for future use:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	
	my $filename = shift;  # name of file to index on command line
	Bio::DB::Fasta->new($filename,-makeid=>\&make_my_id)
		or die "Indexing failed";
	print "Indexing succeeded!\n";
	exit 0;

	sub make_my_id {
		my $description_line = shift;
		$description_line =~ /(\d+_at)/ or die "malformed description line";
		return $1;
	}

Run this script once to create a reusable index of the file. The index will be 
stored in the same directory as the FASTA file.

Script 2: extract the sequences using the IDs stored in a second file:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	use Bio::SeqIO;
	use IO::File;

	my $indexed_fasta_file = shift;
	my $probe_id_file         = shift;

	# open up the indexed fasta file
	my $db = Bio::DB::Fasta->new($indexed_fasta_file) or die;
	# open up a FASTA writer
	my $out = Bio::SeqIO->new(-format=>'Fasta',-fh=>\*STDOUT) or die;
	# open the probe id file
	my $in   = IO::File->new($probe_id_file) or die;

	# do the work
	while (my $id = <$in>) {
		chomp $id;
		my $seq = $db->get_Seq_by_id($id) or die;
		$out->write_seq($seq);
	}

	exit 0;

Bio::Index::Fasta will work in almost exactly the same way. The only 
difference is that the Bio::DB::Fasta will allow you to retrieve subsequences 
efficiently.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From awitney at sgul.ac.uk  Tue Jun 27 10:08:20 2006
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 27 Jun 2006 15:08:20 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A13BD4.60802@sgul.ac.uk>


> Have you looked at the concept of 'quantitation types', e.g. in MAGE  
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

the MGED Ontology has a concept of quantitation type if that helps

http://mged.sourceforge.net/ontologies/MGEDontology.php#QuantitationType




From william.hsiao at gmail.com  Tue Jun 27 15:52:03 2006
From: william.hsiao at gmail.com (William Hsiao)
Date: Tue, 27 Jun 2006 12:52:03 -0700
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
Message-ID: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>

Hi all,
   I've encountered a strange problem while parsing a gff file from
NCBI using perl.  I'm hoping that someone on the list may have a
solution even though this is not a bioperl issue.  Maybe someone
familiar with gff3 parsing can help :)  Essentially, I'm parsing a gff
file into a nested hash structure using the following functions:

sub parse_gff {
    my $file = shift;
    my %hash_gff;
    open (INFILE, $file) or die "Cannot find file $file\n";
    while(<INFILE>){
	next if (/^\#/);
	chomp;
	my ($seqid, $source, $type, $start, $end, $score, $strand, $phase,
$attributes) = split /\t/;
	my $attri_ref = &process_attributes($attributes);
	my %record = ('seqid'     => $seqid,
		      'source'    => $source,
		      'type'      => $type,
		      'start'     => $start,
		      'end'       => $end,
		      'score'     => $score,
		      'strand'    => $strand,
		      'phase'     => $phase,
		      'attribute' => $attri_ref);
	push @{$hash_gff{$type}}, \%record;
    }
    close INFILE;
    print Dumper(\%hash_gff);   # pass a reference; needs 'use Data::Dumper;'
    return \%hash_gff;
}

sub process_attributes {
    my $attr_string = shift;
    my @attributes = split (/\;/, $attr_string);
    my %attr;
    foreach (@attributes){
	my ($key, $value) = split /=/;
	if ($value=~/\:/){
	    my ($subkey, $subvalue) = split (/:/, $value);
	    $attr{$key}{$subkey}=$subvalue;
	}
	else{
	    $attr{$key}=$value;
	}
    }
    return \%attr;
}

   It works for all the gff files we downloaded from NCBI's microbial
genomes refseq ftp repository.  However, 3 lines from one particular
file NC_005966.gff (of Acinetobacter_sp_ADP1) can not be parsed
properly.  These lines are:

NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	start_codon	636487	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	stop_codon	635833	635835	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

   They generate an error: Can't use string
("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
 The strange part is that all I have to do is replace the word
"function" in front of "=adaptation%20to%20stress;" with another word
or simply change it to functions or functio or Function, etc, then the
line parses properly.  If I retype the word "function", it doesn't
solve the problem.  For some strange reason, when the word "function"
is there, perl tried to use "adaptation%20to%20stress" as the hash key
and failed.  The word "function" is used in other lines as well so I
don't think the problem is caused by the word alone.
    Any suggestion on what might be happening would be greatly
appreciated.  Thank you.

Cheers,

Will

-- 
William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206 Fax: 604-291-5583

From bix at sendu.me.uk  Wed Jun 28 04:25:52 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 09:25:52 +0100
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
In-Reply-To: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
References: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
Message-ID: <44A23D10.1010308@sendu.me.uk>

William Hsiao wrote:
>
> sub process_attributes {
>     my $attr_string = shift;
>     my @attributes = split (/\;/, $attr_string);
>     my %attr;
>     foreach (@attributes){
> 	my ($key, $value) = split /=/;
> 	if ($value=~/\:/){
> 	    my ($subkey, $subvalue) = split (/:/, $value);
             # assign hashref to $key, assign key => value pair to that
> 	    $attr{$key}{$subkey}=$subvalue;
> 	}
> 	else{
             # assign scalar $key
> 	    $attr{$key}=$value;
> 	}
>     }
>     return \%attr;
> }

> NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

>    They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
>  The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.

The problem is that these lines contain function=x twice, where the 
second x contains a colon.
So your code first assigns $attr{function} = $scalar, and then tries to
do $attr{function}{before_colon} = "after_colon".

Normally the latter would autovivify $attr{function} as a hash 
reference: $attr{function} == HASH(xyz) and then set before_colon => 
after_colon as a key value pair of HASH(xyz). But in this case, 
$attr{function} already exists: $attr{function} == 
"adaptation%20to%20stress". But you try and set before_colon => 
after_colon as a key value pair of that string. Which you can't do.

Basically, your data structure isn't so great, mixing scalars and hash 
references as values of %attr.

The solution may be to parse using Bioperl instead ;).
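
If you do want to keep a hand-rolled parser, one way to avoid the collision is to give every key the same shape. The following is only a sketch of that idea, not how Bioperl parses GFF: each key maps to an array ref of raw values, so a repeated key is simply pushed onto the list, and values are left URL-escaped and unsplit on ':'.

```perl
use strict;
use warnings;

# Every key maps to an array ref of values, so a repeated key such as
# function=... never collides with an earlier scalar.
sub process_attributes {
    my ($attr_string) = @_;
    my %attr;
    foreach my $pair (split /;/, $attr_string) {
        # Limit of 2 keeps any later '=' inside the value.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

my $attr = process_attributes(
    'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
  . 'function=protection%20%28MultiFun:5.5%29;db_xref=GI:50083879'
);
print scalar @{ $attr->{function} }, " function values\n";
```

Code that reads the result then always dereferences an array, whether the attribute appeared once or several times.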

From selvik at ufl.edu  Tue Jun 27 08:54:48 2006
From: selvik at ufl.edu (Kadirvel, Selvi)
Date: Tue, 27 Jun 2006 08:54:48 -0400 (EDT)
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
Message-ID: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>

All,

(I am new to Bioinformatics and Bioperl, so please forgive me if I 
get my terminology wrong)

I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
report consists of three sections namely;

1. A ranked list of the best scoring HMMs
2. A list of the best scoring domains in order of their occurrence 
in the sequence
3. Alignments for all the best scoring domains.

Section 3 can be truncated to a specific number using the -A 
option when building the report.

Though the Bio::SearchIO::hmmer module parses through the entire 
HMMER report (Sections 1, 2 and 3), the set of values made 
available through Bio::Search::Result::ResultI seems to use 
Section 3 alone. So when we use the -A option to truncate, we lose 
otherwise useful information in Section 1. This information is 
lost (only) for those models that do not have any of their domains 
in the top -A number of best scoring domains. The fields that are 
not available are:

1.	Description of a model
2.	Score of a model
3.	Evalue of a model

If I use the older Bio::Tools::HMMER::Results module, neither 
Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allows me to 
retrieve the values listed above. Scores and Evalues are available 
for each domain but not for the model it belongs to.

I was wondering if there is any other method to access these 
values or do I have to write my own module to do this?

Any ideas/suggestions would be greatly appreciated.

Thank you!


Selvi Kadirvel

Graduate Research Assistant
High Performance Computing Center
University of Florida


From hlapp at gmx.net  Tue Jun 27 20:18:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 20:18:36 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A14DD2.7000402@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
	<44A14DD2.7000402@sendu.me.uk>
Message-ID: <E4565670-479B-4247-A3CB-3DA998AF8456@gmx.net>


On Jun 27, 2006, at 11:25 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> I would have suggested initiating a quantitation type ontology,  
>> not one
>> individual per module.
>
> Where would such a thing 'live'? Would it be some static file  
> somewhere
> that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology
> that can be added to by a module when it needs extra terms to describe  
> its
> particular kind of scores?

For instance, yes. Once you read in an ontology (through  
Bio::OntologyIO indeed) it sits essentially in memory.

> [...]
> Not having any experience with ontologies, I can't think how this would
> all be done in practice though. Don't we need some central module
> (Bio::Score) to create the ontology (or read it in) and then present
> some suitable interface to it?

Possibly - the problem is how to get the ontology-typed term given an  
analysis program and attribute name (e.g. 'score' of a feature  
object). There is no method for doing this on a feature object and  
bolting one on would be a bad idea I think.

So, the Bio::Score would be a little hybrid between an objectified  
score value that now doesn't just have a numeric value but also a  
type term, and a factory for creating the ontology (e.g., by reading  
it in from a specified or default location). I.e., you'd have

	my $value = $score->value();
	my $type = $score->type();
	# $type is-a Bio::Ontology::TermI
	my $quant_ont = $type->ontology();
	
	# see what type of score we have
	my @ancestors = $quant_ont->get_ancestor_terms($type);
	if (grep {$_->name eq 'expectation_value'} @ancestors) {
		# it's an e-value
	} elsif ( ...test for some other type...) {
		# etc
	}


> For example, modules that wanted to store
> some scores might just ask Bio::Score for the ontology and type their
> scores by associating with an available ontology term, creating new
> terms if necessary (or is that something you would never do; the
> ontology needed to have been set up to cover all possible terms?).

Yes. You'd extend it as you encounter types that aren't in the  
ontology yet, until the ontology fully captures the knowledge domain.

> Then
> when the user has a bunch of these typed scores, surely he doesn't  
> want
> to deal with going through the ontology himself to work out what it  
> all
> means? Well, he could if he needs that level of control, but also he
> just wants to say Bio::Score->sort(x y z) or something.

See above for a quick example of the logic. I'd separate that into  
its own module, like Bio::Score::Utils.
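
A rough sketch of what such a utility could look like, using plain hashes as a stand-in for the ontology so that it runs without any Bioperl modules. %PARENT, %LOWER_IS_BETTER, lineage() and sort_scores() are hypothetical names, and the 'bit_score' and HMMER terms are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the ontology: is-a parents, and which
# direction counts as more significant for each general term.
my %PARENT = (
    'BLAST_e-value'  => 'expectation_value',
    'HMMER_e-value'  => 'expectation_value',
    'BLAST_bitscore' => 'bit_score',
);
my %LOWER_IS_BETTER = ( expectation_value => 1, bit_score => 0 );

# Return the term itself followed by all of its is-a ancestors.
sub lineage {
    my ($term) = @_;
    my @chain;
    while (defined $term) {
        push @chain, $term;
        $term = $PARENT{$term};
    }
    return @chain;
}

# Sort plain numbers of one score type from most to least significant.
sub sort_scores {
    my ($type, @values) = @_;
    my ($known) = grep { exists $LOWER_IS_BETTER{$_} } lineage($type);
    die "no ordering known for '$type'\n" unless defined $known;
    my @sorted = sort { $a <=> $b } @values;
    @sorted = reverse @sorted unless $LOWER_IS_BETTER{$known};
    return @sorted;
}

my @evalues = sort_scores('HMMER_e-value', 1e-5, 1e-30, 0.1);
my @bits    = sort_scores('BLAST_bitscore', 40, 200, 95);
```

The caller never states the direction; it follows from the general term the score type descends from, and an unknown type fails loudly instead of being sorted arbitrarily.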

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 28 10:29:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:29:17 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>

Selvi, 

Can you send me the report you are trying to parse as an attachment?  I'll
give it a look.

Judging by the pdoc this is mapped for the event handler so it should be
there.  From the %MAPPING hash:

                 'HMMER_program'   => 'RESULT-algorithm_name',
                 'HMMER_version'   => 'RESULT-algorithm_version',
                 'HMMER_query-def' => 'RESULT-query_name',
                 'HMMER_query-len' => 'RESULT-query_length',
                 'HMMER_query-acc' => 'RESULT-query_accession',
                 'HMMER_querydesc' => 'RESULT-query_description',
                 'HMMER_hmm'       => 'RESULT-hmm_name',
                 'HMMER_seqfile'   => 'RESULT-sequence_file',
                 'HMMER_db'        => 'RESULT-database_name',

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Kadirvel, Selvi
> Sent: Tuesday, June 27, 2006 7:55 AM
> To: bioperl-l at lists.open-bio.org
> Cc: selvik at ufl.edu
> Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
> evalue, description)
> 
> All,
> 
> (I am new to Bioinformatics and Bioperl, so please forgive me if I
> get my terminology wrong)
> 
> I am currently using Bio::SearchIO to parse HMMPFAM reports. This
> report consists of three sections, namely:
> 
> 1. A ranked list of the best scoring HMMs
> 2. A list of the best scoring domains in order of their occurrence
> in the sequence
> 3. Alignments for all the best scoring domains.
> 
> Section 3 can be truncated to a specific number using the "-A"
> option when building the report.
> 
> Though the Bio::SearchIO::hmmer module parses through the entire
> HMMER report (Section 1, 2 and 3), the set of values made
> available through Bio::Search::Result::ResultI seem to be using
> Section 3 alone. So when we use the -A option to truncate, we lose
> otherwise useful information in Section 1. This information is
> lost (only) for those models that do not have any of their domains
> in the top "-A number of" best scoring domains. The fields that are
> not available are:
> 
> 1.	Description of a model
> 2.	Score of a model
> 3.	Evalue of a model
> 
> If I use the older Bio::Tools::HMMER::Results module, NEITHER
> Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allows me to
> retrieve the above listed values. Scores and Evalues are available
> for each domain but not for the model it belongs to.
> 
> I was wondering if there is any other method to access these
> values or do I have to write my own module to do this?
> 
> Any ideas/suggestions would be greatly appreciated.
> 
> Thank you!
> 
> 
> 
> 
> Selvi Kadirvel
> 
> Graduate Research Assistant
> High Performance Computing Center
> University of Florida
> 


From cjfields at uiuc.edu  Wed Jun 28 10:55:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:55:31 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>
Message-ID: <003501c69ac2$e70623b0$15327e82@pyrimidine>

I hate responding to myself!!  Forgot to add that there is also
Bio::Tools::Hmmpfam :

http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam

I'll check if Bio::SearchIO catches this data and let you know what I find
out.  It should catch at least some of it, according to the mapping.

Chris



From bix at sendu.me.uk  Wed Jun 28 11:04:29 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 16:04:29 +0100
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <44A29A7D.7020602@sendu.me.uk>

Kadirvel, Selvi wrote:
> All,
> 
> (I am new to Bioinformatics and Bioperl, so please forgive me if I 
> get my terminology wrong)
> 
> I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
> report consists of three sections, namely:
> 
> 1. A ranked list of the best scoring HMMs
> 2. A list of the best scoring domains in order of their occurrence 
> in the sequence
> 3. Alignments for all the best scoring domains.
> 
> Section 3 can be truncated to a specific number using the ??A? 
> option when building the report.

What do you mean by this? What is ??A? ?
Is this an option you're supplying to hmmpfam or a bioperl module?


> Though the Bio::SearchIO::hmmer module parses through the entire 
> HMMER report (Section 1, 2 and 3), the set of values made 
> available through Bio::Search::Result::ResultI seem to be using 
> Section 3 alone. So when we use the ?A option to truncate, we lose 
> otherwise useful information in Section 1. This information is 
> lost (only) for those models that do not have any of their domains 
> in the top ?A number of? best scoring domains. The fields that are 
> not available are:
> 
> 1.	Description of a model
> 2.	Score of a model
> 3.	Evalue of a model

Each hit you get back from each result of the SearchIO is a 
Bio::Search::Hit::HMMERHit and represents the results of a particular 
model (you can also say $result->next_model).

So you can say:
print $hit->name, " ", $hit->description, " ", $hit->significance, " ", 
$hit->score, "\n";

to get the information you want.
General information about the result can be had like so:
print $result->query_name, " ", $result->algorithm, " ", 
$result->hmm_name, "\n";

I have another problem (or the same one as you? I can't tell...) in 
that I can only get a single result, hit and hsp from my hmmpfam file!
It is doing my head in, but I might be doing something wrong so will 
look into it further before posting a bug report.

From bix at sendu.me.uk  Wed Jun 28 12:46:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 17:46:57 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A29A7D.7020602@sendu.me.uk>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
	<44A29A7D.7020602@sendu.me.uk>
Message-ID: <44A2B281.7030806@sendu.me.uk>

Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue, 
description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I can't tell...) in 
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will 
> look into it further before posting a bug report.

I was just doing something wrong, but...

Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report 
a single HSP per Hit so domains with multiple alignments get separate 
Hits (more FASTA like) since they aren't really HSPs'

Strangely 1.25 (Bioperl 1.4) seems to behave like that already.

In any case, this is extremely counter-intuitive, especially given that 
next_domain is a synonym of next_hsp. I think either the synonym 
relationship remains and hits have multiple hsps (and there is only one 
hit per model), or next_domain goes off and finds the hsp that is the 
next domain of the current model. But that would be incredibly broken in 
the current model since it would be found in a different hit object...

What hmmpfam does is take a database of models which can be thought of 
as database sequences. Then it aligns each one against your query 
sequences. A model could align in multiple locations along a query 
sequence. Each one of these locations is called a domain of the model. A 
user of hmmpfam is model-centric (wants to know which models are on his 
query), and so you want to know all about how well the model did in one 
go. So you should be able to get the results for a model ($hit = 
$result->next_model), get overall info about it ($hit->score etc.), then 
get more detailed information about each domain of it (while ($hsp = 
$hit->next_domain) {...}). But right now you only get one domain and you 
have to go searching through all your other hits to find a hit with the 
same ->name() as your model of interest to get the next domain of your 
model.
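
To make the workaround concrete, here is a minimal sketch of the
regrouping a user currently has to do by hand. It is plain Perl, not the
SearchIO API: each hash stands in for one Hit carrying a single domain,
and the model names, scores, coordinates and the helper group_by_model
are invented for illustration.

```perl
use strict;
use warnings;

# Each record stands in for one Hit as currently returned: one model
# name plus a single domain (start/end on the query). Values are made up.
my @hits = (
    { name => 'pkinase', score => 120.5, domain => { start => 10,  end => 250 } },
    { name => 'fn3',     score => 45.2,  domain => { start => 300, end => 380 } },
    { name => 'fn3',     score => 45.2,  domain => { start => 400, end => 480 } },
);

# Collect all domains under their model, preserving first-seen order.
sub group_by_model {
    my @hits = @_;
    my (%by_model, @order);
    for my $hit (@hits) {
        my $name = $hit->{name};
        unless ($by_model{$name}) {
            $by_model{$name} = { name => $name, score => $hit->{score}, domains => [] };
            push @order, $name;
        }
        push @{ $by_model{$name}{domains} }, $hit->{domain};
    }
    return map { $by_model{$_} } @order;
}

for my $model (group_by_model(@hits)) {
    printf "%s (score %s): %d domain(s)\n",
        $model->{name}, $model->{score}, scalar @{ $model->{domains} };
}
```

With real parser output the same grouping would presumably key on
$hit->name and collect whatever $hit->next_domain returns.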

In my view this is less than ideal. What do people think? Should it be 
changed?

From selvik at ufl.edu  Wed Jun 28 11:21:37 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 11:21:37 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <003501c69ac2$e70623b0$15327e82@pyrimidine>
References: <003501c69ac2$e70623b0$15327e82@pyrimidine>
Message-ID: <2679E8D1-E225-4414-8925-1EB73B83523B@ufl.edu>

Thanks for your reply Chris.

I am attaching a part of the report I am trying to parse.

Also, I see that Bio::SearchIO::hmmer is parsing all three
sections. I am not sure how (or whether) fields from Section 1 are
actually being made available through Bio::SearchIO or
Bio::Search::[Hit | HSP | Result].

I'll look into Bio::Tools::Hmmpfam and let you know if that works for  
me.

-Selvi


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ManyQueries.hmmer
Type: application/octet-stream
Size: 3684451 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060628/53dcc875/attachment-0001.obj 
-------------- next part --------------




From akarger at CGR.Harvard.edu  Wed Jun 28 15:49:54 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 28 Jun 2006 15:49:54 -0400
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
Message-ID: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>

>perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";'
-e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: acc (P09651) does not exist
---------------------------------------------------
>perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN",
$format="fasta";' -e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: id (ROA1_HUMAN) does not exist
---------------------------------------------------

But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN
Same error for a couple other proteins.
Works for a GenBank protein.

perl 5.8.6
Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp

This worked a few months ago.
What's going on?

-Amir Karger


From cjfields at uiuc.edu  Wed Jun 28 16:27:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 15:27:15 -0500
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>
Message-ID: <006901c69af1$412c3590$15327e82@pyrimidine>

This is a recent bug caused by changes to EBI's remote database; they
changed the name of the database from 'swall' to 'uniprot'.  Update to
bioperl-live from CVS (or just Bio::DB::SwissProt) and that should fix it.

Chris



From Kevin.M.Brown at asu.edu  Wed Jun 28 16:39:43 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 28 Jun 2006 13:39:43 -0700
Subject: [Bioperl-l] FW:  How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <1A4207F8295607498283FE9E93B775B4019719A4@EX02.asurite.ad.asu.edu>

This was supposed to go to the list...  Still not used to Outlook...

> The points made here, as I see them:
> 
> 1)  Commits should be made to stable releases (as well as to the main
> branch in CVS) to fix bugs as long as that release is supported.  I
> agree with this, but someone has to volunteer, and the length of time a
> release is supported also worked out.  Almost would be better going to a
> regular release schedule (once every 3-6 months or so) where the code is
> given as is to CPAN, whether it passes tests or not.

What I've seen in other projects is that stable is supported and bug
patched up till the next stable release.  After that support is dropped.
Once a branch was tagged stable the ONLY thing that went into it was
fixes for bugs based on the code already present.  No new features, no
refactoring of any code or modules.  I'm not certain how often things
like a stable patch release happened since most of the bugs were worked
on long before while it was still tagged as dev.  I could see, worst
case a .x release to stable every 6 months to a year until the next
stable came out if there were patches to it.  It looks like the wiki has
most of this kind of stuff documented in the previously posted link:
http://www.bioperl.org/wiki/Making_a_BioPerl_release.  I guess it would
just need a pumpkin/monkey/whatever to step up to keep things rolling...

> 2)  More communication about the direction Bioperl is heading; personally I
> haven't seen a problem with this as much as there is no information about a
> roadmap.  That is being alleviated soon I believe, though people out there
> need to be patient.
> 
> 3)  Volunteer.  If you have something you believe needs to be done and you
> believe so fervently, then put up or shut up.  Make (nice polite)
> suggestions otherwise.  Don't judge code or "the way things are done" and
> don't presume what kind of experience people have that you don't know and
> haven't met.  End of story.
> 
> Chris
> 


From cjfields at uiuc.edu  Wed Jun 28 18:14:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:14:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A2B281.7030806@sendu.me.uk>
Message-ID: <007e01c69b00$2e091410$15327e82@pyrimidine>


The model (hit-like) table scores are retained and can be retrieved via
$model->significance and the individual domain (hsp-like) evalues via
$model->evalue.  The reason you don't get all the individual domain evalues
is that only five alignments are returned by default.  You might try
changing the 'A' parameter to see if you can get more alignments; that may
work around the problem of missing domains for now.  You'll note that the
Model/Domain results returned are not based on top score but what looks like
the position of the domain in the sequence (seq-t in the last table); that's
what is stated in the hmmpfam docs.  Anyway, I tried this loop with the
reports Selvi sent and it works, but only for the ones that return
alignments:

my $result_count = 1;
while ( my $result = $searchio->next_result() ) {
  print "Result $result_count : ",$result->query_name,"\n";
  print "Result models: ",$result->num_hits,"\n";
  while (my $model = $result->next_hit) {
	print "\tModel : ",$model->name,"\n";
	print "\tSignif: ",$model->significance,"\n";
	while (my $domain = $model->next_hsp) {
	  print "\t\tDomain : ",$domain->name,"\n";
	  print "\t\tEvalue : ",$domain->evalue,"\n";
	}
  }
  $result_count++;
}

From the HMMER docs: "Say you have a new sequence that, according to a BLAST
analysis, shows a slew of hits to receptor tyrosine kinases. Before you
decide to call your sequence an RTK homologue, you suspiciously recall that
RTK's are, like many proteins, composed of multiple functional domains, and
these domains are often found promiscuously in proteins with a wide variety
of functions. Is your sequence really an RTK? Or is it a novel sequence that
just happens to have a protein kinase catalytic domain or fibronectin type
III domain?"

Model/domain pairs really aren't Hits/HSPs by definition, as the CVS
commit from Jason states.  The way Pfam is set up, you actually have your
query(ies) scanned using a database of Pfam domains (HMMs, built from
protein alignments for various protein families), hence the alignment in the
report is not an HSP, since HSPs come from pairwise sequence alignments.  An
HSP is a pair of sequence segments whose alignment score meets or exceeds a
cutoff.  The hmmpfam report has alignments of the sequence and the consensus
for the alignment the HMM is based on (not another sequence, so not an HSP).
This is also the same reason you can't get alignments from
Bio::Search::HSP::HMMERHSP objects since the model 'sequence' isn't a true
sequence but a consensus of sequences, so it's 'inappropriate' to use that
as an actual alignment.  Bad Bioperl user!  Bad!

I think the reasoning for keeping single model-domain pairs is that you
should consider each domain's location in the sequence as well as the number
of times they appear, regardless of whether they belong to the same model or
not.  One protein could have three ATP-binding domains and another two, and
they could be located in different positions on the sequence.  But where
they are on the sequence in relation to other domains and to each other
(i.e. positional information) is just as important, maybe more so, than how
many times that domain appears.  

Well, that and SearchIO is set up as a SAX-like parser, so I believe it
processes the model-domain alignments as the file is parsed.

My 2c: there should be a way to get all model-domain pairs in the "parsed
for domains" table (which is like a list of HSPs).  Seems the last few w/o
alignments are not retained; this may be the way the parser is set up.  I
would try getting the handler to return just evalues and similar stuff for
those and leave out sequence/alignment info, if that's possible.  Not sure
how this is handled with BLAST reports where there are more hits reported
than alignments...

Chris


From cjfields at uiuc.edu  Wed Jun 28 18:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:16:38 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
Message-ID: <000001c69b00$86adcc00$15327e82@pyrimidine>

Arghhhh!  Made a mistake:

> my $result_count = 1;
> while ( my $result = $searchio->next_result() ) {
>   print "Result $result_count : ",$result->query_name,"\n";
>   print "Result models: ",$result->num_hits,"\n";
>   while (my $model = $result->next_hit) {
> 	print "\tModel : ",$model->name,"\n";
> 	print "\tSignif: ",$model->significance,"\n";
> 	while (my $domain = $model->next_hsp) {
> 	  print "\t\tDomain : ",$domain->name,"\n";
                              ^^^^^^^
Should be:                    $model

> 	  print "\t\tEvalue : ",$domain->evalue,"\n";
> 	}
>   }
>   $result_count++;
> }

My bad!

Chris


From bix at sendu.me.uk  Wed Jun 28 19:00:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 00:00:11 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <007e01c69b00$2e091410$15327e82@pyrimidine>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
Message-ID: <44A309FB.2050009@sendu.me.uk>

Chris Fields wrote:
>> Sendu Bala wrote:
[snip]
>> In any case, this is extremely counter-intuitive, especially given
>> that next_domain is a synonym of next_hsp. I think either the
>> synonym relationship remains and hits have multiple hsps (and there
>> is only one hit per model)
[snip]

> The model (hit-like) table scores are retained and can be retrieved
> via $model->significance and the individual domain (hsp-like) evalues
> via $model->evalue.

I know, see my earlier post.

> The reason you don't get all the individual domain evalues is that
> only five alignments are returned by default.  You might try changing
> the 'A' parameter to see if you can get more alignments; that may 
> work around the problem of missing domains for now.

[I'm using my own data, not the OP's]
No, I have all the alignments: 'A' isn't a problem. And I can get all
the domains. The problem is I have to check multiple different hits to
find them all.


> You'll note that the Model/Domain results returned are not based on 
> top score but what looks like the position of the domain in the
> sequence (seq-t in the last table); that's what is stated in the
> hmmpfam docs.
[...]
> Well, that and SearchIO is set up as a SAX-like parser, so I believe 
> it processes the model-domain alignments as the file is parsed.

Yes, this is the problem. The parser does the obvious thing, but in my 
view it does not do the correct thing.


> Model/domain pairs really aren't Hits/HSPs by definition, like the
> CVS commit from Jason states.  The way Pfam is set up, you actually
> have your query(ies) scanned using a database of Pfam domains (HMM's,
> built from protein alignments for various protein families), hence
> the alignment in the report is not a HSP since HSPs come from
> pairwise sequence alignments.  An HSP is a pair of sequences which,
> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> has alignments of the sequence and the consensus for the alignment
> the HMM is based on (not another sequence, so not an HSP).

But this is just semantics. It doesn't /matter/ that it's not really 
truly a sequence that's being aligned. The parser needs to present to 
the user the information in the file. As we see in the OP's example, it 
simply fails to do this because the parser isn't model-centric while the 
file it is parsing /is/.

And in any case, your argument doesn't hold because even the current 
parser /does/ store domains in hsp objects! It just only stores one hsp 
per hit, repeatedly, which is nonsensical.

[to avoid confusion, in the following the use of 'model' is in the 
programming sense, whilst 'Model' refers to the things generated by hmmer]

The correct model to describe the file being parsed is one that is able to
provide to the user all the available results for all Models that hit a 
query sequence, even when there are no alignments in the file. To make 
this fit the SearchIO scheme, we must have one hit per Model. The hit 
has hsps which are the domains. This perfectly matches the information 
in the file. It matches something like a Blast, where you have one hit 
per database sequence/query sequence combo.

A hit could end up with no hsps (no domains), but we may not even care. 
Sometimes you really do just want to know if a particular model hit at 
all, and with what evalue/score. The current parsing model isn't 
guaranteed to tell you this even when you can read it yourself in the 
file being parsed.

You can guess at the intent of the original authors, I think, just by 
looking at those method synonyms. next_hit == next_model. next_hsp == 
next_domain. This makes perfect sense. This is the way to correctly 
model the information in the file. The problem is that next_model 
doesn't give you the next Model (because each Model has multiple hits), 
and next_domain doesn't give you the next domain (because each hit only 
has one domain).
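
A minimal sketch of the iteration behaviour argued for above, written in
plain Perl rather than with real SearchIO objects: each Model comes back
once and yields all of its domains. The make_iterator helper and the hash
layout are hypothetical; the E-values are borrowed from the hmmpfam
example discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in data, NOT real BioPerl objects: one entry per
# Model, each carrying every one of its domains.
my @models = (
    { name => 'IBB', significance => 2.6e-43,
      domains => [ { evalue => 2.6e-43 } ] },
    { name => 'HEAT', significance => 1.2e-11,
      domains => [ { evalue => 40 }, { evalue => 0.0096 }, { evalue => 0.0032 } ] },
);

# make_iterator is a made-up helper: it returns a closure that hands back
# the next item on each call, or undef once the list is exhausted.
sub make_iterator {
    my @items = @_;
    my $i = 0;
    return sub { return $i < @items ? $items[$i++] : undef };
}

my $next_model = make_iterator(@models);
my @report;
while ( my $model = $next_model->() ) {
    # one iterator per Model, walking all of that Model's domains
    my $next_domain = make_iterator( @{ $model->{domains} } );
    my $count = 0;
    $count++ while $next_domain->();
    push @report, "$model->{name}: $count domain(s)";
}
print "$_\n" for @report;
```

With this layout next_model advances to a genuinely different Model each
time, and next_domain walks every domain of that Model.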


> I think the reasoning for keeping single model-domain pairs is that
> you should consider each domain's location in the sequence as well as
> the number of times they appear, regardless of whether they belong to
> the same model or not.  One protein could have three ATP-binding
> domains and another two, and they could be located in different
> positions on the sequence.  But where they are on the sequence in
> relation to other domains and to each other (i.e. positional
> information) is just as important, maybe more so, than how many times
> that domain appears.

Well, that's for the user to decide. But the way the results are 
presented needs to make sense. If blast results came back with all hsps 
listed out in sequence position order, would you have multiple hits per 
database sequence each with one hsp? No, because the meaning is 
completely wrong. The 'hit' is the collection of alignments of a 
particular database sequence hitting a query sequence. The alignments 
are stored in a bunch of hsps. It is absurd to have more than one hit 
object for a database+query sequence combo, because then we have 
multiple hit objects duplicating the exact same information, and 'hit' 
no longer has any meaning - it is a collection of /some/ of the 
alignments? Yet this is exactly what we have with hmmpfam result parsing.

From selvik at ufl.edu  Wed Jun 28 16:11:56 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 16:11:56 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
References: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
Message-ID: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>

Sendu,

>
> What do you mean by this? What is 'A'?
> Is this an option you're supplying to hmmpfam or a bioperl module?

I was referring to the '-A' option when running hmmpfam. So if I were  
to use  '-A 5', Section 3 will have only the top scoring (first) five  
HSPs.

>
> So you can say:
> $hit->name, " ", $hit->description, " ", $hit->significance, " ",
> $hit->score;
>
> To get the information you want.
> General information about the result can be had like so:
> print $result->query_name, " ", $result->algorithm, " ",
> $result->hmm_name, "\n";

I do use the same methods that you have suggested. Let me try to  
explain my problem in detail. Let's say I have a report that was  
generated using this "-A 5" option. I want to get the description,  
score, and evalue of a model that *does not* have a domain in the top 5  
high scoring HSPs. This information *exists* in the report in Section  
1 but neither $result->next_hit nor $hit->next_hsp can see it.

Details of ALL domains  are available through:

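     foreach my $domain ($result->each_Domain)
     {
            # print the per-domain fields, tab-separated
            print join("\t", $domain->hmmname, $domain->hmmacc,
                             $domain->start, $domain->end,
                             $domain->hstart, $domain->hend,
                             $domain->evalue), "\n";
     }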

where $result is a Bio::Tools::HMMER::Results object. But this again  
represents information in Section 2. It gives us domain scores and  
evalues (and not model scores and evalues.)

I am working around this by finding the sum of scores (evalues) of  
all domains in a model. But there seems to be no work-around to  
retrieve the description. $domain->hmmacc contains only the first  
string of the description.
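
A rough sketch of the summing workaround described above, using plain
hashes in place of the domain objects. The records and bit scores here
are invented for illustration, and whether the summed value agrees with
the per-model score in Section 1 would need checking against a real
report.

```perl
use strict;
use warnings;

# Hypothetical domain records standing in for the objects returned by
# $result->each_Domain; names and bit scores are made up.
my @domains = (
    { hmmname => 'HEAT', bits => 20.5 },
    { hmmname => 'Arm',  bits => 45.0 },
    { hmmname => 'HEAT', bits => 12.0 },
);

# Accumulate a per-model total of the domain bit scores.
my %model_score;
$model_score{ $_->{hmmname} } += $_->{bits} for @domains;

printf "%s\t%.1f\n", $_, $model_score{$_} for sort keys %model_score;
```

This recovers a per-model number but, as noted, not the description.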

-Selvi


From jason at bioperl.org  Wed Jun 28 22:53:25 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 28 Jun 2006 22:53:25 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
	<44A309FB.2050009@sendu.me.uk>
Message-ID: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>

I don't have any time to really debate this sadly - I definitely went  
back and forth on how to solve this and not many people ever spoke up  
about what they WANTED.  So glad to hear there are opinions out there  
now.

I think the bug fix you refer to had to do with not returning things  
ordered by E-value -- the creation machinery only builds Hit  
objects when there are HSP objects being built.  Basically the  
parsing is linear in terms of the file, we read "Model" (Hit) data  
first and store them in a hash keyed by the name of the domain, but  
we only >>build<< the "Hits" when HSPs are seen, hence the problem when  
the -A option limits alignments but reports Hits that don't have  
individual alignments.  This has to do with the order of things not  
syncing up and/or dealing with the -A option when there is leftover  
Hit data but no HSPs to populate them.  We also had this problem in  
BLAST reports and had to work around that, but I never bothered  
solving it in HMMER I guess.  Glad there are other people who are  
going to fix the problems!
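
To make the ordering problem concrete, here is a small simulation in
plain Perl (not the actual SearchIO::hmmer code). The model table is read
first and stored by name, Hits are only built as alignments arrive, and a
final flush step turns leftover table entries into Hits with no HSPs.
The data layout and the choice of which models have alignments are
assumptions; the scores come from the hmmpfam example in this thread.

```perl
use strict;
use warnings;

# Model table, read first and keyed by name.
my %model_table = (
    IBB  => { score => 157.3 },
    HEAT => { score => 52.1 },
    Arm  => { score => 139.5 },
);

# Alignment section as limited by -A: only some models appear here
# (which ones is an assumption made for this sketch).
my @alignments = ( { model => 'IBB' }, { model => 'HEAT' } );

my ( @hits, %built );

# Current behaviour: a Hit is only built when an alignment (HSP) is seen.
for my $aln (@alignments) {
    my $name = $aln->{model};
    push @hits, { name => $name, %{ $model_table{$name} }, hsps => [$aln] };
    $built{$name} = 1;
}

# Possible fix: flush table entries that never saw an alignment as Hits
# with an empty HSP list, so their scores are not lost.
my @leftover = grep { !$built{$_} } sort keys %model_table;
for my $name (@leftover) {
    push @hits, { name => $name, %{ $model_table{$name} }, hsps => [] };
}

for my $hit (@hits) {
    printf "%s\tscore %s\t%d hsp(s)\n",
        $hit->{name}, $hit->{score}, scalar @{ $hit->{hsps} };
}
```

Without the flush loop, Arm would never be reported even though its score
is in the table.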

The one "alignment" (HSP) per hit was a workaround to the problem  
that Hits were being returned in the order the HSPs came in (Sequence  
order) -- because that is the order they were being built in -- not  
in the sorted order of the Hits as seen in the report.

Feel free to propose an alternative implementation for the parser as you  
see fit, as long as the API is preserved.  You can contribute a new  
SearchIO plugin and HMMERSearchResultListener to deal with it - or I  
guess do what I also do and just run hmmer2table and deal with things  
in a tab-delimited format.

Personally my interests lie in the actual domains, so the Hit objects  
are superfluous in my own work; it never bothered me to have one HSP  
per Hit, and it flows more naturally to things like GFF, etc.  You can  
aggregate them however you like after the fact pretty simply so I  
don't find this too hard to deal with, but if this is a major deterrent  
for people I guess have at it (I think the speed of object creation  
is a larger problem that I hope someone will work on soon).
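
For anyone who wants the after-the-fact aggregation, a minimal sketch in
plain Perl follows. It assumes the one-HSP-per-Hit results have already
been reduced to [model name, domain E-value] pairs in sequence order; the
pairs are the ones from the nbd27e02.y1 example in this thread, and the
variable names are made up.

```perl
use strict;
use warnings;

# Flat model/domain pairs in sequence order: [ model name, domain E-value ].
my @pairs = (
    [ IBB  => 2.6e-43 ], [ HEAT => 40 ],      [ IBN_N => 2.1 ],
    [ Arm  => 3.5e-12 ], [ HEAT => 0.0096 ],  [ Arm   => 2.2e-13 ],
    [ HEAT => 0.0032 ],  [ Arm  => 0.00019 ],
);

my ( %domains_for, @order );
for my $pair (@pairs) {
    my ( $name, $evalue ) = @$pair;
    # remember the order in which each model was first seen
    push @order, $name unless exists $domains_for{$name};
    # collect every domain E-value under its model
    push @{ $domains_for{$name} }, $evalue;
}

for my $name (@order) {
    print "$name: ", scalar @{ $domains_for{$name} }, " domain(s)\n";
}
```

The same grouping works on real Hit objects by keying on the hit name
instead of the first array element.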

I'd appreciate you including the salient points of how the report is  
interpreted on the wiki at some point (with 8X10 glossy pictures and  
circles and arrows on the back...http://en.wikipedia.org/wiki/Alice%27s_Restaurant)
so the debate can be archived too.

-jason


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Wed Jun 28 23:40:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 22:40:28 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>
Message-ID: <000301c69b2d$c3fdc6a0$15327e82@pyrimidine>

According to CVS, using -A0 (no alignments) is supposed to work since v.
1.5.1 and (I'm guessing here) should return HMMERHit/HMMERHSP objects with
no sequences, just the values from the table.  By this reasoning using -A5
should work but the first five Hit/HSP pairs will give you sequences and any
remaining should give nothing, just the Sequence Model combined evalue
(which you can get by $model->significance) and individual Domain (HSP-like)
evalues ($domain->evalue).  I don't get these either (I only get a max of 5
model/domain pairs). 

So, I tried a little experiment using the first single result output for
this query from your combined file (nbd27e02.y1  716 69 831 ; translated),
which was the first one I came across with more than five model/domain
pairs, and this scripted loop:

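use Bio::SearchIO;

# 'hmmpfam.out' is a placeholder; substitute the report being tested
my $searchio = Bio::SearchIO->new(-format => 'hmmer',
                                  -file   => 'hmmpfam.out');

while ( my $result = $searchio->next_result() ) {
  print "Query: ",$result->query_name,"\n";
  # each model (hit) carries the per-model significance
  while (my $model = $result->next_model) {
    print "\tModel : ",$model->name,"\n";
    print "\tSignif: ",$model->significance,"\n";
    # each domain (hsp) carries its own evalue
    while (my $domain = $model->next_domain) {
      print "\t\tEvalue : ",$domain->evalue,"\n";
    }
  }
}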

I get this with the file containing the alignments.  For anyone following,
I'm using bioperl-live, perl 5.8, WinXP:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 2.6e-43
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 40
        Model : IBN_N
        Signif: 2.1
                Evalue : 2.1
        Model : Arm
        Signif: 6e-38
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 0.0096

If I manually delete the alignments (make it like -A0 output) I get this:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 157.3
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 52.1
                Evalue : 40
        Model : IBN_N
        Signif: -3.6
                Evalue : 2.1
        Model : Arm
        Signif: 139.5
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0096
        Model : Arm
        Signif: 139.5
                Evalue : 2.2e-13
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0032
        Model : Arm
        Signif: 139.5
                Evalue : 0.00019

i.e. all the model/domain pairs!  So I think it's safe to say that this is a
bug; the last few don't get processed but should.  I'll drop a bug report
into Bugzilla along with the test files and script so it can be confirmed.
This shouldn't be too hard to fix but it may take a few days; I'm pretty
busy here until Saturday.
 
Chris



From cjfields at uiuc.edu  Thu Jun 29 01:20:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:20:10 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
Message-ID: <000d01c69b3b$b17776d0$15327e82@pyrimidine>

> I know, see my earlier post.
...
> [I'm using my own data, not the OP's]
...

Sorry, I was typing that one up over a three-hour period in between
experiments, so I didn't go back and check everything before I sent it.
Pretty much the entire file Selvi sent me (and the entire group, grrr) shows
that the domains in the domain table are not completely parsed, and the
number of reported hits correlates with the number of alignments present.
In other words, only five or fewer hits are reported based on the alignments
and the default max alignments reported per result is five.  I figured out
that it is a bug and plan on submitting it to Bugzilla.

What you are talking about and what Selvi describes are two separate issues.
I dealt with Selvi's for the moment; let's deal with yours.

> > Well, that and SearchIO is set up as a SAX-like parser, so I believe
...
> Yes, this is the problem. The parser does the obvious thing, but in my
> view it does not do the correct thing.

Yes, and that's your opinion.  To tell the truth I'm quite neutral on this;
I'm trying to reason along the lines the contributors for the module
intended.  The fact of the matter is the parser is set up to do it this way,
and it was set up this way by others (not you or I); modifying it to suit
one's personal wants and needs is not our job here.  I don't have issues
while I'm running it so I really don't see what the problem is, well,
besides the reported bug I found along with Selvi's help.

My view on all this before I quit for the night:

I really don't want to get into what I consider nit-picky issues (the
'semantics' you mention; it's a simple difference in opinion and a small one
at that).  We can agree to disagree, whatever.  The issue immediately at
hand, what I consider the most important, is that Selvi has uncovered a bug
with the code, as is.  But I'm going to vent here a bit.  It's late, I'm
tired, and this whole thing irks me.  It irks me a great deal. 

Personally, I don't think right now is the time to think about refactoring
this particular module, esp. since I find it essentially works.  I believe
that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
instance, or refactoring SearchIO::blast etc to use hashes instead of
objects to speed things up.  Or creating something yourself.  Or doing what
you currently are doing (Bio::Map).  In other words, areas where use is
high, code is aging, and refactoring is more productive.

I'll add that I'm not trying to dissuade you from trying to build your own
variation of a SearchIO HMMER parser; by all means go ahead.  The above is
how I feel.  You can build your own parser to do what you want; you can even
base it off the current SearchIO HMMER parser and see if you can set it up
to give you the results you want, using a different handler and so on.  Just
don't break the API or modify the current code based strictly on what your
opinion of how it should work is.  It was probably set up this way for a
particular reason.

According to the SearchIO HOWTO the intent for SearchIO was to 'genericize'
parsing reports with 'similar' styles, like BLAST, FASTA, HMMER, and so on.
The most prevalently parsed reports, by a long stretch, are BLAST reports,
which is what the system is based on: 

http://www.bioperl.org/wiki/HOWTO:SearchIO#Design

So the SearchIO system is based on the >assumption< that these reports can
be divi'd up with the data mapped into categories (Results, Hits, HSPs), so
similar objects should be able to handle them.  Domain data are currently
stored in HSP objects (HMMERHSP), but that's nothing more than a convenient
way to store HMMER report data in my opinion; the alignment matches,
strictly speaking, are not HSP's.  You could rename HMMERHit HMMERModel and
HMMERHsp HMMERDomain, but they would still, if they fit into SearchIO and
used the current event handlers, implement HitI/HSPI by inheriting from
GenericHit/GenericHSP.  Ergo, any easy way you go about it here, HMMERHit
is-a HitI and HMMERHsp is-a HSPI.  You could probably work around it by
building the 'correct' object hierarchy by setting up your own handler and
SearchIO plugin, but that risks changing API.  And, really, if you decide to
go down that path, consider what Jason is talking about when he mentions
using "under-the-hood" hashes.

> A hit could end up with no hsps (no domains), but we may not even care.
> Sometimes you really do just want to know if a particular model hit at
> all, and with what evalue/score. The current parsing model isn't
> guaranteed to tell you this even when you can read it yourself in the
> file being parsed.

For every model (hit) you should have a corresponding domain (HSP) or more
depending on your view of how the parser works, even if the domain (HSP) is
only present in the table and not in an alignment.  You shouldn't have
models w/o domains from your query (hits w/o hsps); that doesn't make any
sense.  If hmmpfam output has this then it's a serious issue, but, again,
that doesn't make sense.  All that information is in the tables in the
hmmpfam output; you can even build objects w/o alignments present (-A0)
straight from the tables.

If you wanted to know whether a particular model hit at all, grab all the
model objects ($result->models) and run through them to see if your expected
model (Annexin, Phosphoribosyl, or whatever) is there using a map/grep
block, regex, or whatever; you could autovivify a hash or similar data
structure indicating that a particular sequence has x domains of y type.  Or
iterate through them like you would for a BLAST report.  I don't see what's
difficult about this; I do it for BLAST sequences, SeqFeatures, and many
other BioPerl objects all the time!  Yes, it can be slow; that's an issue
with object instantiation and Perl and there is no easy way around it
besides refactoring the SearchIO parsers/eventhandlers to send back hashes,
as Jason has suggested.
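
As a rough illustration of the grep and autovivified-hash approach just
described, here it is in plain Perl. The list of names stands in for what
you would collect by calling name on each model object; the names are the
ones from the nbd27e02.y1 example, and the variable names are made up.

```perl
use strict;
use warnings;

# Stand-in for the names gathered from the model objects of one result.
my $query       = 'nbd27e02.y1';
my @model_names = qw(IBB HEAT IBN_N Arm HEAT Arm HEAT Arm);

# Did a particular model hit at all?  grep in scalar context counts matches.
my $wanted = 'Arm';
my $found  = grep { $_ eq $wanted } @model_names;

# "Sequence has x domains of y type": the inner hash is autovivified
# the first time $domains_in{$query} is used.
my %domains_in;
$domains_in{$query}{$_}++ for @model_names;

print "$wanted hit $found time(s)\n";
for my $model ( sort keys %{ $domains_in{$query} } ) {
    print "$query has $domains_in{$query}{$model} domain(s) of type $model\n";
}
```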

> You can guess at the intent of the original authors, I think, just by
> looking at those method synonyms. next_hit == next_model. next_hsp ==
> next_domain. This makes perfect sense. This is the way to correctly
> model the information in the file. The problem is that next_model
> doesn't give you the next Model (because each Model has multiple hits),
> and next_domain doesn't give you the next domain (because each hit only
> has one domain).

....

> Well, that's for the user to decide. But the way the results are
> presented needs to make sense. If blast results came back with all hsps
> listed out in sequence position order, would you have multiple hits per
> database sequence each with one hsp? No, because the meaning is
> completely wrong. The 'hit' is the collection of alignments of a
> particular database sequence hitting a query sequence. The alignments
> are stored in a bunch of hsps. It is absurd to have more than one hit
> object for a database+query sequence combo, because then we have
> multiple hit objects duplicating the exact same information, and 'hit'
> no longer has any meaning - it is a collection of /some/ of the
> alignments? Yet this is exactly what we have with hmmpfam result parsing.

The problem is that the module is geared to parse the output as simply as
possible, so it does it by sequence order, just like the output.  And, as
is, it makes sense to me why Eddy and Co. set it that way, not that I
completely agree with it.  Hmmpfam output is designed for annotating
sequences using Pfam HMM's, so the results are hard-coded to appear in
sequence order, not based on score or evalue.  That's the way it is; not
necessarily the best way IMHO (I would have a way to sort by evalue or model
myself as an option), but it's the only way that's currently available.
Yes, each Model can match more than one domain on a query sequence.  Again,
that this is the 'correct way' to set up this parser is your opinion; if you
want, design your own SearchIO parser.  Like I said, I don't have a problem
with using this module myself.  And I'm a bit reluctant to spend the energy
overhaulin' this module when I could spend my time working on something else
I consider more constructive (or destructive, depending on your view).  
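
The sort-by-evalue-or-model option mentioned above is easy to do on the
user side once the pairs are in hand. A minimal sketch in plain Perl,
assuming the model/domain pairs have been pulled out as
[model name, domain E-value] (values from the nbd27e02.y1 example; the
variable names are made up):

```perl
use strict;
use warnings;

# Model/domain pairs in the sequence order hmmpfam reports them.
my @pairs = (
    [ IBB  => 2.6e-43 ], [ HEAT => 40 ],      [ IBN_N => 2.1 ],
    [ Arm  => 3.5e-12 ], [ HEAT => 0.0096 ],  [ Arm   => 2.2e-13 ],
    [ HEAT => 0.0032 ],  [ Arm  => 0.00019 ],
);

# Sort by domain E-value, smallest (most significant) first.
my @by_evalue = sort { $a->[1] <=> $b->[1] } @pairs;

# Sort by model name, then by E-value within each model.
my @by_model = sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @pairs;

printf "%-6s %g\n", @$_ for @by_evalue;
```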

And, frankly, it's not up to the user when using code they didn't create.
You have to deal with it.  Or code something yourself to do things the way
you want.  You have the power to do that; most bioperl users don't simply
b/c they probably don't understand the class structure and OO nature of
Bioperl.  It's just a matter of where you want to spend your energy: dealing
with something that interests you or fixing other people's broken code.


Chris


From cjfields at uiuc.edu  Thu Jun 29 01:23:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:23:03 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <000e01c69b3c$18d58fb0$15327e82@pyrimidine>

...
 
> I think the bug fix you refer to had to do with not returning things
> ordered by E-value -- the creation machinery only builds Hit
> objects when there are HSP objects being built.  Basically the
> parsing is linear in terms of the file, we read "Model" (Hit) data
> first and store them in a hash keyed by the name of the domain, but
> we only >>build<< the "Hits" when HSPs are seen, hence the problem when
> the -A option limits alignments but reports Hits that don't have
> individual alignments.  This has to do with the order of things not
> syncing up and/or dealing with the -A option when there is leftover
> Hit data but no HSPs to populate them.  We also had this problem in
> BLAST reports and had to work around that, but I never bothered
> solving it in HMMER I guess.  Glad there are other people who are
> going to fix the problems!

Yeah, just figured that one out.  I see the two tables are parsed into two
arrays, so it is feasible to have the leftover (Hit/HSP|Model/Domain)
entries converted into the proper objects, as happens when there are no
alignments (-A0 output).  I plan on reporting this in Bugzilla and will
work on it, but can't get to it immediately (probably not 'til
Friday-Saturday at the earliest).  If Sendu wants to tackle it I don't
have a problem.

> The one "alignment" (HSP) per hit was a workaround to the problem
> that Hits were being returned in the order the HSPs came in (Sequence
> order) -- because that is the order they were being built in -- not
> in the sorted order of the Hits as seen in the report.

The SAX method, I gather, getting in the way.  

> Feel free to propose an alternative implementation for the parser as you
> see fit, as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or I
> guess do what I also do and just run hmmer2table and deal with things
> in a tab-delimited format.

Or set it up as hashes, which you have mentioned before for BLAST.

> Personally my interests lie in the actual domains so the Hit objects
> are superfluous in my own work so it never bothered me to have one
> per Hit and it flows more naturally to things like GFF, etc.  You can
> aggregate them however you like after the fact pretty simply so I
> don't find this too hard to deal with, but if this is a major deterrent
> for people I guess have at it ( I think the speed of object creation
> is a larger problem that I hope that someone will work on soon).

Agreed, though now it's finding the time....


Chris 

> I'd appreciate you including the salient points of how the report is
> interpreted on the wiki at some point (with 8X10 glossy pictures and
> circles and arrows on the back...http://en.wikipedia.org/wiki/Alice%27s_Restaurant)
> so the debate can be archived too.
> 
> -jason
> 
> On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote:
> 
> > Chris Fields wrote:
> >>> Sendu Bala wrote:
> > [snip]
> >>> In any case, this is extremely counter-intuitive, especially given
> >>> that next_domain is a synonym of next_hsp. I think either the
> >>> synonym relationship remains and hits have multiple hsps (and there
> >>> is only one hit per model)
> > [snip]
> >
> >> The model (hit-like) table scores are retained and can be retrieved
> >> via $model->significance and the individual domain (hsp-like) evalues
> >> via $model->evalue.
> >
> > I know, see my earlier post.
> >
> >> The reason you don't get all the individual domain evalues is that
> >> only five alignments are returned by default.  You might try changing
> >> the 'A' parameter to see if you can get more alignments; that may
> >> work around the problem of missing domains for now.
> >
> > [I'm using my own data, not the OP's]
> > No, I have all the alignments: 'A' isn't a problem. And I can get all
> > the domains. The problem is I have to check multiple different hits to
> > find them all.
> >
> >
> >> You'll note that the Model/Domain results returned are not based on
> >> top score but what looks like the position of the domain in the
> >> sequence (seq-t in the last table); that's what is stated in the
> >> hmmpfam docs.
> > [...]
> >> Well, that and SearchIO is set up as a SAX-like parser, so I believe
> >> it processes the model-domain alignments as the file is parsed.
> >
> > Yes, this is the problem. The parser does the obvious thing, but in my
> > view it does not do the correct thing.
> >
> >
> >> Model/domain pairs really aren't Hits/HSPs by definition, like the
> >> CVS commit from Jason states.  The way Pfam is set up, you actually
> >> have your query(ies) scanned using a database of Pfam domains (HMM's,
> >> built from protein alignments for various protein families), hence
> >> the alignment in the report is not a HSP since HSPs come from
> >> pairwise sequence alignments.  An HSP is a pair of sequences which,
> >> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> >> has alignments of the sequence and the consensus for the alignment
> >> the HMM is based on (not another sequence, so not an HSP).
> >
> > But this is just semantics. It doesn't /matter/ that it's not really
> > truly a sequence that's being aligned. The parser needs to present to
> > the user the information in the file. As we see in the OP's example, it
> > simply fails to do this because the parser isn't model-centric while the
> > file it is parsing /is/.
> >
> > And in any case, your argument doesn't hold because even the current
> > parser /does/ store domains in hsp objects! It just only stores one hsp
> > per hit, repeatedly, which is nonsensical.
> >
> > [to avoid confusion, in the following the use of 'model' is in the
> > programming sense, whilst 'Model' refers to the things generated by
> > hmmer]
> >
> > The correct model to describe the file being parsed is one that is able
> > to provide to the user all the available results for all Models that hit
> > a query sequence, even when there are no alignments in the file. To make
> > this fit the SearchIO scheme, we must have one hit per Model. The hit
> > has hsps which are the domains. This perfectly matches the information
> > in the file. It matches something like a Blast, where you have one hit
> > per database sequence/query sequence combo.
> >
> > A hit could end up with no hsps (no domains), but we may not even care.
> > Sometimes you really do just want to know if a particular model hit at
> > all, and with what evalue/score. The current parsing model isn't
> > guaranteed to tell you this even when you can read it yourself in the
> > file being parsed.
> >
> > You can guess at the intent of the original authors, I think, just by
> > looking at those method synonyms. next_hit == next_model. next_hsp ==
> > next_domain. This makes perfect sense. This is the way to correctly
> > model the information in the file. The problem is that next_model
> > doesn't give you the next Model (because each Model has multiple hits),
> > and next_domain doesn't give you the next domain (because each hit only
> > has one domain).
> >
> >
> >> I think the reasoning for keeping single model-domain pairs is that
> >> you should consider each domain's location in the sequence as well as
> >> the number of times they appear, regardless of whether they belong to
> >> the same model or not.  One protein could have three ATP-binding
> >> domains and another two, and they could be located in different
> >> positions on the sequence.  But where they are on the sequence in
> >> relation to other domains and to each other (i.e. positional
> >> information) is just as important, maybe more so, than how many times
> >> that domain appears.
> >
> > Well, that's for the user to decide. But the way the results are
> > presented needs to make sense. If blast results came back with all hsps
> > listed out in sequence position order, would you have multiple hits per
> > database sequence each with one hsp? No, because the meaning is
> > completely wrong. The 'hit' is the collection of alignments of a
> > particular database sequence hitting a query sequence. The alignments
> > are stored in a bunch of hsps. It is absurd to have more than one hit
> > object for a database+query sequence combo, because then we have
> > multiple hit objects duplicating the exact same information, and 'hit'
> > no longer has any meaning - it is a collection of /some/ of the
> > alignments? Yet this is exactly what we have with hmmpfam result parsing.
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


From bix at sendu.me.uk  Thu Jun 29 03:02:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:02:49 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
Message-ID: <44A37B19.7030908@sendu.me.uk>

Chris Fields wrote:
>
> Personally, I don't think right now is the time to think about refactoring
> this particular module, esp. since I find it essentially works.  I believe
> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
> instance, or refactoring SearchIO::blast etc to use hashes instead of
> objects to speed things up.  Or creating something yourself.  Or doing what
> you currently are doing (Bio::Map).  In other words, areas where use is
> high, code is aging, and refactoring is more productive.

Hmmer parsing happens to be important to me, in fact vital for my work. 
I've been using my own parser up till now, so didn't know what the 
Bioperl one was like. I'd like to use Bioperl for more things, 
preferably everything.


> I'll add that I'm not trying to dissuade you from trying to build your own
> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
> how I feel.  You can build your own parser to do what you want; you can even
> base it off the current SearchIO HMMER parser and see if you can set it up
> to give you the results you want, using a different handler and so on.  Just
> don't break the API or modify the current code based strictly on what your
> opinion of how it should work is.  It was probably set up this way for a
> particular reason.

Well, I don't like the idea of there being multiple SearchIO parsers for 
the same thing.

[...]
> And, frankly, it's not up to the user when using code they didn't create.
> You have to deal with it.  Or code something yourself to do things the way
> you want.  You have the power to do that; most bioperl users don't simply
> b/c they probably don't understand the class structure and OO nature of
> Bioperl.  It's just a matter of where you want to spend your energy: dealing
> with something that interests you or fixing other people's broken code.

My original question was essentially: does doing it my way make sense? 
And implicitly: would doing it my way be of any harm? Ie. can I go ahead 
and change how the parser reports results and groups them together? I 
don't think it will involve an API change, but the results it generates 
will obviously be very different.


From bix at sendu.me.uk  Thu Jun 29 03:54:50 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:54:50 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <44A3874A.9040803@sendu.me.uk>

Jason Stajich wrote:
>
> Feel free to propose an alternative implementation for the parser as you see
> fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or [snip]

What's the thinking behind the way SearchIOs work? Is it necessary or 
desirable to always do it with events and listeners? Or is it enough to 
simply return a ResultI regardless of how you made it?

From cjfields at uiuc.edu  Thu Jun 29 09:27:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 08:27:00 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <A1A48284-6FD6-4898-9438-DEEB105496EC@uiuc.edu>


On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about refactoring
>> this particular module, esp. since I find it essentially works.  I believe
>> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
>> instance, or refactoring SearchIO::blast etc to use hashes instead of
>> objects to speed things up.  Or creating something yourself.  Or doing what
>> you currently are doing (Bio::Map).  In other words, areas where use is
>> high, code is aging, and refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.

We're not deterring you from setting up your own parser, something  
both Jason and I suggested.  I just don't see what the major issue  
is; hmmpfam results never really contain the same number of hits
per query that BLAST does (I get at the very most 30-40 and that is  
usually based on repeats).  I believe the best place to spend this  
energy first and foremost is fixing the bug.

>> I'll add that I'm not trying to dissuade you from trying to build your own
>> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
>> how I feel.  You can build your own parser to do what you want; you can even
>> base it off the current SearchIO HMMER parser and see if you can set it up
>> to give you the results you want, using a different handler and so on.  Just
>> don't break the API or modify the current code based strictly on what your
>> opinion of how it should work is.  It was probably set up this way for a
>> particular reason.
>
> Well, I don't like the idea of there being multiple SearchIO parsers for
> the same thing.

See, here's the thing: if the community-at-large decides to use your  
version of the parser then, by default it will become the only HMMER  
SearchIO parser and we'll deprecate the old one.  I just don't think  
this is the way I would go about it.  Jason has mentioned that object  
instantiation is a bigger issue with parsing (speed) than anything  
else; why not, if you plan on doing this, set up a Handler to return  
hashes, or do it completely under-the-hood?  Have it be the 'new,  
faster way to run SearchIO.'  Don't rehash (pardon the bad pun) the  
way things were esp. when proposals are out there to improve the  
toolkit.

> [...]
>> And, frankly, it's not up to the user when using code they didn't create.
>> You have to deal with it.  Or code something yourself to do things the way
>> you want.  You have the power to do that; most bioperl users don't simply
>> b/c they probably don't understand the class structure and OO nature of
>> Bioperl.  It's just a matter of where you want to spend your energy: dealing
>> with something that interests you or fixing other people's broken code.
>
> My original question was essentially: does doing it my way make sense?
> And implicitly: would doing it my way be of any harm? Ie. can I go ahead
> and change how the parser reports results and groups them together? I
> don't think it will involve an API change, but the results it generates
> will obviously be very different.

And my point is that both ways make sense, at least to me (and it  
sounds like to Jason though I could be wrong).  Again, create a new  
version of the parser based on what you want to do and accomplish.   
Don't just modify something the community at-large uses based on your  
whims. Make the changes to a new module and let the community  
decide.  As an example, BioPerl, for the longest time, had several  
BLAST parsers; we directed everybody over to SearchIO and most people  
seem to like it; hence the others are deprecated.

And changing the results returned could be considered by some to be
changing the API or a bug.  If someone using this module has an
automated pipeline set up for annotation using Pfam, hmmpfam,  
Bioperl, and a database, and their setup expects single model/domain  
pairs, yeah, your changes will break that.  Maybe small,  
inconsequential even, but it's possible (and even true; many genome  
annotation pipelines are set up exactly how I describe).

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ClarkeW at AGR.GC.CA  Thu Jun 29 10:31:14 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 10:31:14 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>

Hi all, 

 
Recently I was working on a project which required some manipulation of
Quality files. I may be wrong in this, but I don't believe that there is
a Quality format for Bio::SeqIO. If there is, could someone point me in
the right direction, as I could write a much nicer script than what I
currently have? If not, I was wondering if anyone here has any use for
such a thing. I am pretty new to developing but would be willing to give
it a shot, as I feel that for all the use I get out of BioPerl with no
thanks to anyone who spent time on writing something I used, I could try
and contribute my limited amount. Any comments would be appreciated, and
don't be afraid to tell me this is a lost cause. I realize that quality
files tend to be less important than FASTA sequence files. I will give
you a little information on me so that you know what to expect/what I am
working with.

I am a fourth year bioinformatics student, and am currently working as a
summer student. I have some limited experience with writing perl modules
and test scripts. Mostly I write perl to do specific jobs, that I or
someone else has come up with to fill some immediate need of the
company. I am interested in most things bioinformatics/computer
sci/biology and am hoping to do Graduate studies when I finish my
degree.

Well that's enough for now, if you have any comments/suggestions I would
appreciate it.

 
Cheers, Wayne


From cjfields at uiuc.edu  Thu Jun 29 10:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 09:55:16 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <001601c69b8c$08cdce70$15327e82@pyrimidine>

> Recently I was working on a project which required some manipulation of
> Quality files. I may be wrong in this, but I don't believe that there is
> a Quality format for Bio::SeqIO. If there is, could someone point me in
> the right direction, as I could write a much nicer script than what I
> currently have? If not, I was wondering if anyone here has any use for
> such a thing. I am pretty new to developing but would be willing to give
> it a shot, as I feel that for all the use I get out of BioPerl with no
> thanks to anyone who spent time on writing something I used, I could try
> and contribute my limited amount. Any comments would be appreciated, and
> don't be afraid to tell me this is a lost cause. I realize that quality
> files tend to be less important than FASTA sequence files. I will give
> you a little information on me so that you know what to expect/what I am
> working with.

Here's a list I dredged up when looking for Bio::Seq::Quality in BioPerl,
which is the sequence implementation for sequences with quality data and/or
trace values:

Instances: 2    Module : Bio::Assembly::Contig
Instances: 2    Module : Bio::Assembly::IO::ace
Instances: 1    Module : Bio::Assembly::Singlet
Instances: 1    Module : Bio::Index::Fastq
Instances: 2    Module : Bio::Seq::Meta::Array
Instances: 1    Module : Bio::Seq::MetaI
Instances: 8    Module : Bio::Seq::Quality
Instances: 1    Module : Bio::Seq::SeqWithQuality
Instances: 6    Module : Bio::Seq::SequenceTrace
Instances: 1    Module : Bio::Seq::TraceI
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 5    Module : Bio::SeqIO::qual
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

Does that help?

> I am a fourth year bioinformatics student, and am currently working as a
> summer student. I have some limited experience with writing perl modules
> and test scripts. Mostly I write perl to do specific jobs, that I or
> someone else has come up with to fill some immediate need of the
> company. I am interested in most things bioinformatics/computer
> sci/biology and am hoping to do Graduate studies when I finish my
> degree.
> 
> Well that's enough for now, if you have any comments/suggestions I would
> appreciate it.

Always can use an extra hand!

Chris

> 
> 
> Cheers, Wayne
> 
> 


From ClarkeW at AGR.GC.CA  Thu Jun 29 11:01:52 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 11:01:52 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>


Thanks Chris, 

I don't know how I didn't come up with this before. Can I use
Bio::SeqIO::qual as follows?

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

Cheers, Wayne



From cjfields at uiuc.edu  Thu Jun 29 11:21:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:21:21 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>
Message-ID: <002001c69b8f$ad754450$15327e82@pyrimidine>

It should work that way, yes:  

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

# the below should return a Bio::Seq::Quality object
my $seq = $in->next_seq; 

You might want to check the other SeqIO modules as well depending on your
format:

...

Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

...

Chris

> Thanks Chris,
> 
> I don't know how I didn't come up with this before. Can I use
> Bio::SeqIO::qual as follows?
> 
> my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');
> 
> Cheers, Wayne
...


From cjfields at uiuc.edu  Thu Jun 29 11:23:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:23:20 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
Message-ID: <002101c69b8f$f48bd070$15327e82@pyrimidine>

Sendu, 

The HOWTO explains everything:

http://www.bioperl.org/wiki/HOWTO:SearchIO

under "Implementation."  I learned this the hard way when I started working
on SearchIO::blast and wondered why it had so many *_element methods.  

Yes, you will need an EventHandler if you implement SearchIO; the
EventHandler should implement Bio::SearchIO::EventHandlerI interface.  You
might not need one that returns objects though (i.e. it could return
hashes).  And you could possibly get around the event handler somehow,
though if you plan on doing that, why not just work on Bio::Tools::Hmmpfam
as an alternative parser?  We've had other BLAST parsers before
(Bio::Tools::BPLite comes to mind); if they aren't maintained and there is a
viable alternative they can be deprecated.  Hence the reason I mentioned
working on your own version of SearchIO::hmmer; if that module becomes most
prevalently used we can deprecate the older version.

The idea that a SearchIO plugin should act like a SAX parser is based on the
fact that many files being parsed are quite large, so it would be nice to
have everything parsed as a stream (on-the-go) as opposed to preprocessing
everything into an object hierarchy (which can be very memory intensive for
large files).  Whether this is done in practice in all SearchIO modules is
another thing; it may be based upon what particular fixes were made over
time or the contributor's intentions.  
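
To make the streaming idea concrete, here is a minimal sketch in plain Perl. It is not the actual SearchIO code: the package name SketchHandler, the sub parse_stream, the event names, and the tab-separated toy input are all made up for illustration. It only shows the general pattern, where the parser fires one event per line as it reads and the handler decides what to keep, so nothing unwanted is ever stored.

```perl
use strict;
use warnings;

# Hypothetical handler (not a BioPerl class): keeps only the event
# types it was asked for and drops everything else immediately.
package SketchHandler;

sub new {
    my ($class, %args) = @_;
    my %wanted = map { $_ => 1 } @{ $args{wanted} || [] };
    return bless { wanted => \%wanted, kept => [] }, $class;
}

# Called by the parser once per event; unwanted events are discarded here.
sub handle_event {
    my ($self, $type, $data) = @_;
    return unless $self->{wanted}{$type};
    push @{ $self->{kept} }, [ $type, $data ];
    return;
}

sub kept { return @{ $_[0]{kept} } }

package main;

# Hypothetical parser: reads "type<TAB>value" lines one at a time and
# fires an event per line, never holding the whole report in memory.
sub parse_stream {
    my ($fh, $handler) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($type, $data) = split /\t/, $line, 2;
        $handler->handle_event($type, $data);
    }
    return;
}

# Toy input standing in for a report file.
my $report = join "\n",
    "hit\tModelA", "hsp\tdomain 1 of 2", "stat\tnot wanted",
    "hsp\tdomain 2 of 2", '';
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $handler = SketchHandler->new(wanted => [qw(hit hsp)]);
parse_stream($fh, $handler);
close $fh;

printf "%s: %s\n", @$_ for $handler->kept;
```

A handler that returned plain hashes instead of objects would slot into the same place; only handle_event would change.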

Chris 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 29, 2006 2:55 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
> 
> Jason Stajich wrote:
> >
> > Feel free to propose an alternative implementation for the parser as you see
> > fit as long as the API is preserved.  You can contribute a new
> > SearchIO plugin and HMMERSearchResultListener to deal with it - or
> [snip]
> 
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it enough to
> simply return a ResultI regardless of how you made it?


From roy at colibase.bham.ac.uk  Thu Jun 29 11:05:54 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 29 Jun 2006 16:05:54 +0100
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <44A3EC52.7030502@colibase.bham.ac.uk>

Hi Wayne.

I think Bio::SeqIO::qual is what you are looking for.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From jason at bioperl.org  Thu Jun 29 14:04:12 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 29 Jun 2006 14:04:12 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
Message-ID: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>

however you want - the idea of listeners at the time was to make it  
more SAX like so we could throw away events we didn't want and speed  
up the whole system when there was some idea of how you wanted the  
data filtered.  That may have been too much wishful thinking and I  
just couldn't do it alone.


On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> Feel free to propose an alternative implementation for the parser as you see
>> fit as long as the API is preserved.  You can contribute a new
>> SearchIO plugin and HMMERSearchResultListener to deal with it - or  
>> [snip]
>
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it enough to
> simply return a ResultI regardless of how you made it?

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From prettyblondegirl222 at yahoo.com  Thu Jun 29 14:23:56 2006
From: prettyblondegirl222 at yahoo.com (S S)
Date: Thu, 29 Jun 2006 11:23:56 -0700 (PDT)
Subject: [Bioperl-l] TAKE ME OFF
Message-ID: <20060629182356.93810.qmail@web51305.mail.yahoo.com>

  

From cjfields at uiuc.edu  Thu Jun 29 23:53:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 22:53:22 -0500
Subject: [Bioperl-l] SearchIO::blast, was Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
	<166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
Message-ID: <7511BE75-3A87-4E78-BFEA-2B38210BAD85@uiuc.edu>

If we can work around the listener/handler that'll definitely speed  
things up.  I was thinking about tackling the SearchIO::blast parser  
next, refactoring it to use hashes as a separate plugin module; if I  
don't need the handler for that then it'll speed things up a bit.

Chris

On Jun 29, 2006, at 1:04 PM, Jason Stajich wrote:

> however you want - the idea of listeners at the time was to make it
> more SAX like so we could throw away events we didn't want and speed
> up the whole system when there was some idea of how you wanted the
> data filtered.  That may have been too much wishful thinking and I
> just couldn't do it alone.
>
>
> On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:
>
>> Jason Stajich wrote:
>>>
>>> Feel free to propose an alternative implementation for the parser as you see
>>> fit as long as the API is preserved.  You can contribute a new
>>> SearchIO plugin and HMMERSearchResultListener to deal with it - or
>>> [snip]
>>
>> What's the thinking behind the way SearchIOs work? Is it necessary or
>> desirable to always do it with events and listeners? Or is it
>> enough to
>> simply return a ResultI regardless of how you made it?
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Fri Jun 30 08:45:15 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 30 Jun 2006 14:45:15 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>

Hi,

>My original question was essentially: does doing it my way make sense?
With respect to Sendu's points, I can only say that a colleague
(developer) and I were surprised that the HMMer parser did not group
the hits as the blast parser does, in "Hit" and "Hsp".
When we realized how hmmer parsing worked we continued to use it
but used a check for multiple hits of one domain on 1 query sequence
(e.g. in hmmpfam).

Regards,
Bernd
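
For anyone wanting the same kind of check, a rough sketch of regrouping one-domain-per-hit results by model might look like the following. It is plain Perl with no BioPerl calls; the record fields (model, start, end, evalue), the sample values, and the sub name group_by_model are assumptions made up for the example, standing in for whatever you pull out of each hit/hsp pair.

```perl
use strict;
use warnings;

# Hypothetical flattened records, one per model/domain pair, in the
# order a parser might hand them back (by position, not by model).
my @pairs = (
    { model => 'ModelA', start => 10,  end => 80,  evalue => 1e-20 },
    { model => 'ModelB', start => 95,  end => 150, evalue => 3e-08 },
    { model => 'ModelA', start => 160, end => 230, evalue => 2e-15 },
);

# Collect every domain record under the name of the model it belongs to.
sub group_by_model {
    my (@records) = @_;
    my %domains_for;
    push @{ $domains_for{ $_->{model} } }, $_ for @records;
    return \%domains_for;
}

my $domains_for = group_by_model(@pairs);
for my $model (sort keys %$domains_for) {
    my @domains = @{ $domains_for->{$model} };
    printf "%s: %d domain(s)\n", $model, scalar @domains;
    printf "  %d-%d (evalue %g)\n", @{$_}{qw(start end evalue)} for @domains;
}
```

Within each model the domains stay in the order they were seen, so positional order is not lost by grouping.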

From jason at bioperl.org  Fri Jun 30 10:05:01 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Jun 2006 10:05:01 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
Message-ID: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>

I understand the confusion, and the intention initially was to have HSPs
grouped together under the same Hit just like BLAST reports
- but somehow in the bug-fix-cycle the way to deal with the fact that
"HSPs" aren't ordered by the overall Hit table led to this design
decision - the problem before was something with the ordering, but I
must admit to not being able to remember what specifically the
problem was, so I can't really remember why I changed things to do this.
Does 1.4 actually do it the way you expect?

Again, more user feedback is definitely critical to make these tools  
useful to everyone so please don't be bashful about reporting your
preferences.

-j

On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:

> Hi,
>
>> My original question was essentially: does doing it my way make  
>> sense?
> With respect to Sendu's points, I can only say that a colleague
> (developer) and I were surprised that the HMMer parser did not group
> the hits as the blast parser does, in "Hit" and "Hsp".
> When we realized how hmmer parsing worked we continued to use it
> but used a check for multiple hits of one domain on 1 query sequence
> (e.g. in hmmpfam).
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Fri Jun 30 11:56:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 10:56:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
Message-ID: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>

It may have been just simpler to have it be one HSP (domain) per Hit  
(model) as that's how the reports are generated.  My reasoning was  
that using the one domain per model made sense based on what you are  
actually trying to do, which is annotate the sequence based on the  
order the domain appears.  Most others may not view it that way,  
which is fine.  One can always gather the relevant HSP's, convert to  
seqfeatures, then sort them if order is important, I suppose.
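
As a rough sketch of the gather/convert/sort idea above: this assumes BioPerl is installed and that the report is hmmpfam output, so domain positions are taken on the query (for hmmsearch output one would probably ask for 'hit' coordinates instead). The helper names domain_features and by_position, the 'domain' primary tag, and the file name in the usage comment are hypothetical, not part of BioPerl.

```perl
use strict;
use warnings;

# Collect every domain (HSP) in an HMMER report as a generic SeqFeature.
# BioPerl is loaded lazily so the sorting helper below works without it.
sub domain_features {
    my ($file) = @_;
    require Bio::SearchIO;
    require Bio::SeqFeature::Generic;

    my $in = Bio::SearchIO->new(-format => 'hmmer', -file => $file);
    my @features;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @features, Bio::SeqFeature::Generic->new(
                    -start        => $hsp->start('query'),
                    -end          => $hsp->end('query'),
                    -primary_tag  => 'domain',      # assumed tag name
                    -display_name => $hit->name,    # the model name
                    -score        => $hsp->score,
                );
            }
        }
    }
    return @features;
}

# Order features by position, regardless of which Hit they came from.
sub by_position {
    my @features = @_;
    return sort { $a->start <=> $b->start || $a->end <=> $b->end } @features;
}

# Usage (hypothetical file name):
#   for my $f (by_position(domain_features('query.hmmpfam'))) {
#       print join("\t", $f->display_name, $f->start, $f->end), "\n";
#   }
```

Sorting after collection keeps the code independent of whether the parser reports one domain per Hit or several.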

I would say, if the overall consensus is to modify it to have  
multiple domain hits per model (similar to BLAST) then Sendu should  
go ahead and make those changes then announce it on the list so no  
one can gripe about it later.  My main concern was not changing  
things so dramatically that it'll break for someone, but seeing as  
we've had a lengthy discussion about it already they should have  
piped up by now!   Well, that and trying to return everything as  
hashes as Jason suggested.  From looking at SearchIO::hmmer we need  
to make sure that both hmmsearch and hmmpfam work the same way (looks  
like they have different sections) and that the reported bug about  
missing hits (Bug 2036) is fixed as well.

Chris

On Jun 30, 2006, at 9:05 AM, Jason Stajich wrote:

> I understand the confusion and it was the intention of having HSPs
> grouped together under the same Hit initialy just like BLAST reports
> - but somehow in the bug-fix-cycle the way to deal with the fact that
> "HSPs" aren't ordered by the overall Hit table led to this design
> decision - the problem before was something with the ordering, but I
> must admit to not being able to remember what specifically was the
> problem t I can't really remember why I changed things to do this.
> Does 1.4 actually do it the way you expect?
>
> Again, more user feedback is definitely critical to make these tools
> useful to everyone so please don't bashful about reporting your
> preferences.
>
> -j
>
> On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:
>
>> Hi,
>>
>>> My original question was essentially: does doing it my way make
>>> sense?
>> With respect to Sendu's points, I can only say that a colleague
>> (developer) and I were surprised that the HMMer parser did not group
>> the hits as the blast parser does, in "Hit" and "Hsp".
>> When we realized how hmmer parsing worked we continued with to use it
>> but used a check for multiple hits of one domain on 1 query sequence
>> (e.g. in hmmpfam).
>>
>> Regards,
>> Bernd
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Jun 30 12:14:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Jun 2006 17:14:05 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
Message-ID: <44A54DCD.3050708@sendu.me.uk>

Chris Fields wrote:
> It may have been just simpler to have it be one HSP (domain) per Hit 
> (model) as that's how the reports are generated.  My reasoning was that 
> using the one domain per model made sense based on what you are actually 
> trying to do, which is annotate the sequence based on the order the 
> domain appears.  Most others may not view it that way, which is fine.  
> One can always gather the relevant HSP's, convert to seqfeatures, then 
> sort them if order is important, I suppose.
> 
> I would say, if the overall consensus is to modify it to have multiple 
> domain hits per model (similar to BLAST) then Sendu should go ahead and 
> make those changes then announce it on the list so no one can gripe 
> about it later.  My main concern was not changing things so dramatically 
> that it'll break for someone

Going on your earlier suggestion, I was thinking about making 
SearchIO::hmmpfam instead, which would get used if you set the format to 
'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I 
suppose I would make a SearchIO::hmmsearch as well, if necessary.


[...]
> that the reported bug about missing hits (Bug 2036) is fixed as well.

However, having never made a SearchIO plugin before, it will be some 
time before I get my head around it. I'll want to make one the current 
HOWTO:SearchIO way before I can think about doing it a better way 
(hashes) as well. So I can say I'll make a move on this at some point in 
the future, but if someone wants to fix Bug 2036 in the mean time, they 
are welcome to. Again as suggested, my priority is Bio::Map right now.

From rmb32 at cornell.edu  Fri Jun 30 13:01:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 10:01:38 -0700
Subject: [Bioperl-l] parser for GeneSeqer
Message-ID: <44A558F2.2050304@cornell.edu>

Hi all,

I find myself needing a parser for GeneSeqer output, so I'm writing one 
(which I will submit for your consideration when it's working).  In a 
nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of 
ESTs to genomic sequence, then using those alignments to predict where 
in the genomic sequence the genes are.  So really what you get from this 
is a bunch of hierarchical features.

I don't really know where I should put it in the bioperl hierarchy 
though.  Probably FeatureIO?

And what's the current fashion for objects it should emit?  
Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 30 13:43:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:43:56 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A54DCD.3050708@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
	<44A54DCD.3050708@sendu.me.uk>
Message-ID: <E2C6F66F-9B85-42D3-B2A0-BD7C8B222572@uiuc.edu>

I'll try looking at it this weekend.  A suggested workaround is to  
either try setting -A for no alignments or setting it to a high  
number to retrieve all of them.  The bug is pretty serious, as the error
silently drops those domains, so those using automated annotation
pipelines would miss it unless they are also checking the raw output.

You could design a SearchIO::hmmpfam parser then expand it to take in  
hmmsearch output at a later point, or keep them separate.  I like the  
idea of having modules that are more specific about what they parse;  
seems at some point you reach serious code bloat and maintenance  
becomes an issue.  Look at SearchIO::blast; it parses various text  
BLAST output very well but with some serious obfuscation.  Just don't  
know how productive it would be to separate out the PSI-BLAST and  
bl2seq stuff since they are pretty close to a standard BLAST  
report... oh well.

To Jason : good luck on your move.  Drop  us a line here to let us  
know everything went well.

Chris

On Jun 30, 2006, at 11:14 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> It may have been just simpler to have it be one HSP (domain) per Hit
>> (model) as that's how the reports are generated.  My reasoning was  
>> that
>> using the one domain per model made sense based on what you are  
>> actually
>> trying to do, which is annotate the sequence based on the order the
>> domain appears.  Most others may not view it that way, which is fine.
>> One can always gather the relevant HSP's, convert to seqfeatures,  
>> then
>> sort them if order is important, I suppose.
>>
>> I would say, if the overall consensus is to modify it to have  
>> multiple
>> domain hits per model (similar to BLAST) then Sendu should go  
>> ahead and
>> make those changes then announce it on the list so no one can gripe
>> about it later.  My main concern was not changing things so  
>> dramatically
>> that it'll break for someone
>
> Going on your earlier suggestion, I was thinking about making
> SearchIO::hmmpfam instead, which would get used if you set the  
> format to
> 'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I
> suppose I would make a SearchIO::hmmsearch as well, if necessary.
>
>
> [...]
>> that the reported bug about missing hits (Bug 2036) is fixed as well.
>
> However, having never made a SearchIO plugin before, it will be some
> time before I get my head around it. I'll want to make one the current
> HOWTO:SearchIO way before I can think about doing it a better way
> (hashes) as well. So I can say I'll make a move on this at some  
> point in
> the future, but if someone wants to fix Bug 2036 in the mean time,  
> they
> are welcome to. Again as suggested, my priority is Bio::Map right now.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Jun 30 13:54:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:54:23 -0500
Subject: [Bioperl-l] parser for GeneSeqer
In-Reply-To: <44A558F2.2050304@cornell.edu>
References: <44A558F2.2050304@cornell.edu>
Message-ID: <2FB066C7-12E6-46D8-8F4A-BD096BE2A0CA@uiuc.edu>

If you plan on generating seqfeatures from this output you could  
check out the Bio::Tools core modules for examples.  There are a few  
there that take program output and convert them to  
Bio::SeqFeature::Generic objects, including Bio::Tools::RNAMotif and
Bio::Tools::tRNAscanSE.  If alignments are involved you might want
something like Bio::SeqFeature::FeaturePair.  Not sure about using
SeqFeature::Annotated or others; I thought that some of the
Annotation/Annotatable stuff might be changing soon but I may be wrong.

Chris

On Jun 30, 2006, at 12:01 PM, Robert Buels wrote:

> Hi all,
>
> I find myself needing a parser for GeneSeqer output, so I'm writing  
> one
> (which I will submit for your consideration when it's working).  In a
> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> ESTs to genomic sequence, then using those alignments to predict where
> in the genomic sequence the genes are.  So really what you get from  
> this
> is a bunch of hierarchical features.
>
> I don't really know where I should put it in the bioperl hierarchy
> though.  Probably FeatureIO?
>
> And what's the current fashion for objects it should emit?
> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>
> Rob
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From rmb32 at cornell.edu  Fri Jun 30 15:32:11 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 12:32:11 -0700
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
In-Reply-To: <29201430510651801@webmail.iastate.edu>
References: <29201430510651801@webmail.iastate.edu>
Message-ID: <44A57C3B.8040808@cornell.edu>

Aha!  Isn't it amazing what gets revealed when you just get off your 
butt and ask on the mailing list.

I'll look at that code straightaway.  The concept is quite attractive to 
me, since GenomeThreader is the next program that I'm going to be 
integrating into my analysis stuff.  Unfortunately, (I am under the 
impression that) my GeneSeqer parser is almost finished.

This brings us to the next question, what about parsing the 
GenomeThreader XML?  Would be lovely to have a Bioperl interface for 
that.  Is there some code floating about for that too?

Rob

Michael E Sparks wrote:
> Hi Rob,
>
> For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
>  You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/
>
> There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
> an XML format also used by the GenomeThreader spliced alignment program, whose
> schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
> 0README in the above directory will give you an overview of what tools I've made
> available.  Hope you find it useful!
>
> Regards,
> Michael
>
> --
> Thanks,
> Michael E Sparks
> Graduate Assistant, Brendel Lab
> 2128 Molecular Biology Building
> Iowa State University
> Ames, IA 50011-3260
> 1-515-294-4063
> http://www.public.iastate.edu/~mespar1/
>
>
> Forwarded Message:
>   
>> To: <plantgdb at iastate.edu>
>> From: "Shannon D Schlueter" <sds at iastate.edu>
>> Subject: FW: [Bioperl-l] parser for GeneSeqer
>> Date: Fri, 30 Jun 2006 13:01:46 -0500
>> -----
>>     
>>> Date: Fri, 30 Jun 2006 10:01:38 -0700
>>> From: Robert Buels <rmb32 at cornell.edu>
>>> User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] parser for GeneSeqer
>>> Sender: bioperl-l-bounces at lists.open-bio.org
>>>
>>> Hi all,
>>>
>>> I find myself needing a parser for GeneSeqer output, so I'm writing one
>>> (which I will submit for your consideration when it's working).  In a
>>> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
>>> ESTs to genomic sequence, then using those alignments to predict where
>>> in the genomic sequence the genes are.  So really what you get from this
>>> is a bunch of hierarchical features.
>>>
>>> I don't really know where I should put it in the bioperl hierarchy
>>> though.  Probably FeatureIO?
>>>
>>> And what's the current fashion for objects it should emit? 
>>> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> SGN Bioinformatics Analyst
>>> 252A Emerson Hall, Cornell University
>>> Ithaca, NY  14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>
>
>
>   

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From mespar1 at iastate.edu  Fri Jun 30 15:20:29 2006
From: mespar1 at iastate.edu (Michael E Sparks)
Date: Fri, 30 Jun 2006 14:20:29 -0500 (CDT)
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
Message-ID: <29201430510651801@webmail.iastate.edu>

Hi Rob,

For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
 You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/

There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
an XML format also used by the GenomeThreader spliced alignment program, whose
schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
0README in the above directory will give you an overview of what tools I've made
available.  Hope you find it useful!

Regards,
Michael

--
Thanks,
Michael E Sparks
Graduate Assistant, Brendel Lab
2128 Molecular Biology Building
Iowa State University
Ames, IA 50011-3260
1-515-294-4063
http://www.public.iastate.edu/~mespar1/


Forwarded Message:
> To: <plantgdb at iastate.edu>
> From: "Shannon D Schlueter" <sds at iastate.edu>
> Subject: FW: [Bioperl-l] parser for GeneSeqer
> Date: Fri, 30 Jun 2006 13:01:46 -0500
> -----
> >Date: Fri, 30 Jun 2006 10:01:38 -0700
> >From: Robert Buels <rmb32 at cornell.edu>
> >User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
> >To: bioperl-l at bioperl.org
> >Subject: [Bioperl-l] parser for GeneSeqer
> >Sender: bioperl-l-bounces at lists.open-bio.org
> >
> >Hi all,
> >
> >I find myself needing a parser for GeneSeqer output, so I'm writing one
> >(which I will submit for your consideration when it's working).  In a
> >nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> >ESTs to genomic sequence, then using those alignments to predict where
> >in the genomic sequence the genes are.  So really what you get from this
> >is a bunch of hierarchical features.
> >
> >I don't really know where I should put it in the bioperl hierarchy
> >though.  Probably FeatureIO?
> >
> >And what's the current fashion for objects it should emit? 
> >Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
> >
> >Rob
> >
> >--
> >Robert Buels
> >SGN Bioinformatics Analyst
> >252A Emerson Hall, Cornell University
> >Ithaca, NY  14853
> >Tel: 503-889-8539
> >rmb32 at cornell.edu
> >http://www.sgn.cornell.edu
> >
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jay at jays.net  Thu Jun  1 00:58:29 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 23:58:29 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000001c68528$d1b6ec10$15327e82@pyrimidine>
References: <000001c68528$d1b6ec10$15327e82@pyrimidine>
Message-ID: <447E73F5.40403@jays.net>

Chris Fields wrote:
>> Is the doc/ tree being abandoned?
> 
> Most docs have been moved over to the wiki, which generates nicely formatted
> docs for printing.

Oh. Well, if we've already jumped off that cliff I say we just go for it. Move everything to the wiki, nuke the empty CVS dirs, and call it good.

I hereby volunteer to strip the code out of bptutorial.pl and put it wherever. Where should I put it when I'm done? (examples/tutorial.pl?)

>> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> I believe the reasoning is along these lines: HOWTO's are focused in on
> specific areas (graphics, trees, BLAST report parsing, etc) and thus usually
> has greater detail. The tutorials are more broadly based (sort of a general
> bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
> has additional information over the tutorial (at least it did the last time
> I looked at the tutorial, which has been a while).

Huh. Sounds like a subtle line. I might suggest picking one name or the other and shuffling everything into one list on the wiki. 

>> It's hard for me to dive into a wiki lifestyle for the huge documentation
>> pillars since it can't ever get back into the distro... (can it?)  Small,
>> throw away stuff is great for the wiki, but huge, established, thoughtful,
>> long documents should be left in the distro? Present (and searchable) on
>> the wiki but static?
> 
> Hence the problem we face now.  It is something we need to really look into
> before adding too much more to the wiki.  IMHO, I think we should have very
> little information directly in the distribution itself since it's already
> quite large.  It's almost as easy to have a bare-bones INSTALL file, which
> would point to the wiki for additional information.  But I may be very much
> alone in that train of thought ; >

If the doc/ tree has already moved then I guess I just joined the all-wiki camp. I assume it stores full revision history and we have backups in case somebody blows something up. Any system is better than multiple systems breeding inconsistencies. Keep the spammers/clueless out and/or quickly remove their nonsense and I'm pro-wiki. Revisions email reviewers?

>> Sick of my endless questions yet? -grin-
> 
> Not really.

Give it a few more posts. It'll come. :)

j
Current toy: http://openlab.jays.net/


From ULNJUJERYDIX at spammotel.com  Thu Jun  1 02:53:46 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 1 Jun 2006 14:53:46 +0800
Subject: [Bioperl-l] **Fwd: Re: SOLVED ver2 Bio::Graphics::Panel make
	ruler have neg values
Message-ID: <5b6410e0605312353l1fbf8256hc8a2b85d0f0ac199@mail.gmail.com>

Thanks Lincoln! Your code worked in ver 1.4 as well.
I think the problem I had was due to me just adapting from the BLAST output
tutorial, so I had something like

my $feature =
Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end,
-source=>$source);

and maybe also because I didn't have the + sign for the numbers.

On a side note, I think that the ability to offset the ruler might prove
useful for some applications. I will spend more time trying to understand the
$relative_coords_offset option in arrow.pm when I can afford to, and
perhaps help contribute an offset option to arrow.pm.

cheers
kevin

>
> Hi Kevin,
>
> Since you are modifying the Panel.pm source code, why don't you just go
> ahead
> and use the current Bio::Graphics development tree? Since 1.5.1 it
> supports
> negative coordinates. Here's an illustration:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
>
> my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
> my $feature =
> Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
> my $panel   = Bio::Graphics::Panel->new(-start=> -200,
>                                          -end  => +200,
>                                          -width=>800,
>                                          -pad_left=>10,
>                                          -pad_right=>10);
> $panel->add_track($whole,
>                    -glyph=>'arrow',
>                    -double=>1,
>                    -tick=>2);
> $panel->add_track($feature,
>                   -glyph=>'box',
>                    -stranded=>1);
> print $panel->png;
>
> exit 0;
>
> The resulting image is attached.
>
> Lincoln
>
> On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> > I am so sorry for the truncated email accidentally hit reply.
> > if anyone is interested i have opted to change
> >
> > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> > in linux its
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
> >
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
> >
> > to
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
> >
> > just  for this one-off use.
> >
> >
> >
> > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> > option for coords offset?
> >     my $relative_coords_offset =
> $self->option('relative_coords_offset');
> >     $relative_coords_offset    = 1 unless defined
> $relative_coords_offset;
> > but entering the option -relative_coords_offset=>1000 in the arrow
> glyphs
> > didn't do anything...
> >
> >
> >
> > Hi!
> >
> > > oh it was in a slightly different header asking about the create image
> > > map feature.
> > > I am using the stable version 1.4 of bioperl now. In any case I have
> not
> > > added the sequence as a feature annotated seq. as I already have the
> bp
> > > where the TF binds (in 1-1050 numberings) so what I did was to just
> add
> > > graded segments based on the position.
> > > I saw that there is a scale function for the arrow glyp however, it is
> a
> > > multiply function, can it be hacked to take in a offset value (ie
> minus
> > > the
> > > scale by 1000?)
> > >
> > > cheers
> > > kevin
> > >
> > >
> > > Hi,
> > >
> > > > For some reason I didn't see the first posting on this. In current
> > >
> > > bioperl
> > >
> > > > live, the ruler can have negative numberings - I use this routinely.
> > > > You need
> > > > to create a feature that starts in negative coordinates. What is
> > >
> > > happening
> > >
> > > > to
> > > > you when you try this?
> > > >
> > > > Lincoln
> > > >
> > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > > Hi
> > > > > thanks for the help offered thus far!
> > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer
> seq
> > > >
> > > > using
> > > >
> > > > > bioperl. therefore i was asked to make the numberings as such
> (-1000)
> > >
> > > is
> > >
> > > > > there any way at all to do this in bioperl without changing the
> .pm
> > > >
> > > > file?
> > > >
> > > > > thanks guys..
> > > > > kevin
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > Lincoln D. Stein
> > > > Cold Spring Harbor Laboratory
> > > > 1 Bungtown Road
> > > > Cold Spring Harbor, NY 11724
> > > > (516) 367-8380 (voice)
> > > > (516) 367-8389 (fax)
> > > > FOR URGENT MESSAGES & SCHEDULING,
> > > > PLEASE CONTACT MY ASSISTANT,
> > > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>
>


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 03:59:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 08:59:21 +0100
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <447E9E59.6090709@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
>> Just looking for all return undef;s isn't enough. It's entirely possible
>> to do something like:
>>
>> my $return_value;
>> {
>>    # do something that assigns to return_value on success
>>    # on failure, just do nothing
>> }
>> return $return_value;
> 
> Agreed, though looking for these is obviously much harder.  
> 
> The way to get around those is:
> 
> return $return_value if $return_value;
> return;
> 
> which I've seen used in a number of get/set methods. 

Though if anyone is using that cookie-cutter/macro style, that's much 
worse because now you can't return 0.

return $return_value if defined($return_value);
return;

In any case, it burns the eyes. I share Lincoln's POV. I also fully 
understand your point about not being able to trust the docs 
(Bio::Map::Marker...). But the solution is to change the code so it 
matches the docs when the docs make sense, not change the code so that it 
no longer matches the docs[*]. In a massive OO project like bioperl the 
users need to be able to rely on the docs. You can't turn around and say 
"you've used this method for years, but now I'm changing how it works 
because you might have used the method incorrectly". Ideally any code 
changes add functionality or improve its working without affecting code 
that uses the method correctly according to its old docs.


* though if there isn't time/interest in changing the code, and the 
method never worked as per the docs, then by all means change the docs 
to avoid confusion - just don't change the docs on a method that worked 
according to the docs, because then you can assume people use the method 
and will be affected by the change
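
For reference, the difference between the return styles discussed in this message can be seen in a small, self-contained sketch. The sub names are made up for illustration and are not taken from any BioPerl module.

```perl
use strict;
use warnings;

# 'return undef' yields a one-element list (undef) in list context,
# whereas a bare 'return' yields an empty list.
sub explicit_undef { return undef }
sub bare_return    { return }

# The truthiness test loses a legitimate 0; the defined() test keeps it.
sub truthy_only  { my $v = shift; return $v if $v;         return }
sub defined_only { my $v = shift; return $v if defined $v; return }

my @a = explicit_undef();
my @b = bare_return();
print "explicit undef gives ", scalar(@a), " element(s)\n";   # 1
print "bare return gives ",    scalar(@b), " element(s)\n";   # 0

my @zero_truthy  = truthy_only(0);     # empty list: the 0 is lost
my @zero_defined = defined_only(0);    # (0): the 0 survives
print "truthy_only(0) gives ",  scalar(@zero_truthy),  " element(s)\n";   # 0
print "defined_only(0) gives ", scalar(@zero_defined), " element(s)\n";   # 1
```

The list-context case matters because an array assigned from 'return undef' is non-empty, so a test such as `if (@a)` succeeds even though the method failed.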


From lstein at cshl.edu  Thu Jun  1 11:40:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Jun 2006 11:40:38 -0400
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
Message-ID: <200606011140.38726.lstein@cshl.edu>

Hi,

The border is coming from the HTML <img> tag. To get rid of it, set -border=>0 in
the img() call.
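
Applied to the img() line in the code quoted below, the change might look like this minimal sketch. It assumes CGI.pm is installed (it is no longer bundled with recent Perl releases), and the URL and map name are placeholder values standing in for what $panel->image_and_map returns.

```perl
use strict;
use warnings;
use CGI;

# An empty query string keeps CGI.pm from looking for input on the command line.
my $cgi = CGI->new('');

# Placeholder values; in the real script these come from $panel->image_and_map.
my ($url, $mapname) = ('/tmpimages/panel.png', 'panelmap');

# -border => 0 adds border="0" to the generated <img> tag, which suppresses
# the link-coloured border browsers draw around a clickable image.
my $tag = $cgi->img({ -src => $url, -usemap => "#$mapname", -border => 0 });
print $tag, "\n";
```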

Lincoln


On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> Hello everybody,
>
> does anybody know how to remove the background color of the Panel.
> Currently, I am not adding anything to it, so I can troubleshot the
> problem, and I have tried setting up
> all color attributes I could find to the panel, but no luck. Whatever I do,
> I get the BLUE border of the panel.
>
> Has anybody faced the same problem?
>
> Thanks in advance,
>
> Jelena
>
> And here is the code I am currently using:
>
> ---------------------------------------------------------------------------
>-------------------------------- my $panel =
>     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
>                               -width => 800,
>                               -pad_left => 10,
>                               -pad_right => 10,
>                               -key_color => 'white',
>                               -bgcolor => 'white',
>                               -gridcolor=>'black',
>                               -fgcolor => 'black',
>                               -grid => 0,
>                               );
>    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
>      -url  => '/tmpimages');
>    #make clickable image
>    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
>    print $map;
>
> ---------------------------------------------------------------------------
>--------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:13:05 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:13:05 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A3BD0B.8A2C%osborne1@optonline.net>
References: <C0A3BD0B.8A2C%osborne1@optonline.net>
Message-ID: <447F1211.2010705@campus.iztacala.unam.mx>

You're right Brian. I also think that the text/POD part is more 
important than the script. Since we're more into moving everything to 
the Wiki, I believe this would be the right approach.

Moving the script part of the tutorial into the examples/ directory is 
also a nice idea.

Mauricio.

Brian Osborne wrote:
> Mauricio,
> 
> Bernd didn't say he wanted the _script_ in the package, he said he wanted
> bptutorial.pl in the package, not indicating whether it was the
> documentation or the script that was important. It's my suspicion that the
> documentation is more important than the script, and this is what my last
> letter was asking, in part: is the script important? Or can we focus on the
> text/POD part?
> 
> Brian O.
> 
> 
> On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> <arareko at campus.iztacala.unam.mx> wrote:
> 
>> I agree with what Bernd Web said in another reply. For some people it will
>> be nice to still be able to run the script from the codebase and
>> interact with it.
> 
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 12:20:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:20:34 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1211.2010705@campus.iztacala.unam.mx>
Message-ID: <000b01c68597$5026bdf0$15327e82@pyrimidine>

Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
/scripts or /examples then?

Also, what do we do about the similar situation with other docs moved to the
wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
distribution pointing out the wiki docs instead?

Chris

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Thursday, June 01, 2006 11:13 AM
> To: Brian Osborne
> Cc: Chris Fields; bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> You're right Brian. I also think that the text/POD part is more
> important than the script. Since we're more into moving everything to
> the Wiki, I believe this would be the right approach.
> 
> Moving the script part of the tutorial into the examples/ directory is
> also a nice idea.
> 
> Mauricio.
> 
> Brian Osborne wrote:
> > Mauricio,
> >
> > Bernd didn't say he wanted the _script_ in the package, he said he wanted
> > bptutorial.pl in the package, not indicating whether it was the
> > documentation or the script that was important. It's my suspicion that the
> > documentation is more important than the script, and this is what my last
> > letter was asking, in part: is the script important? Or can we focus on the
> > text/POD part?
> >
> > Brian O.
> >
> >
> > On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> > <arareko at campus.iztacala.unam.mx> wrote:
> >
> >> I agree with what Bernd Web said in another reply. For some people it will
> >> be nice to still be able to run the script from the codebase and
> >> interact with it.
> >
> >
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 12:28:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:28:38 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447E9E59.6090709@mrc-dunn.cam.ac.uk>
Message-ID: <000c01c68598$704b15d0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 2:59 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
> 
> Chris Fields wrote:
> >
> > Sendu Bala wrote:
> >> Just looking for all return undef;s isn't enough. It's entirely possible
> >> to do something like:
> >>
> >> my $return_value;
> >> {
> >>    # do something that assigns to return_value on success
> >>    # on failure, just do nothing
> >> }
> >> return $return_value;
> >
> > Agreed, though looking for these is obviously much harder.
> >
> > The way to get around those is:
> >
> > return $return_value if $return_value;
> > return;
> >
> > which I've seen used in a number of get/set methods.
> 
> Though if anyone is using that cookie-cutter/macro style, that's much
> worse because now you can't return 0.
> 
> return $return_value if defined($return_value);
> return;

Makes sense.  Really, this all comes down to semantics and the context of
how the method is called and what is expected as a return value.  I suppose
it also depends on what one considers 'best practice,' which can be
subjective.  I don't want us getting into a situation in which we come
across as critiquing someone else's code w/o some valid points, i.e.
Lincoln's point about complaining.  I think that's why this thread is pretty
important, in that we're getting a broad range of opinions on the issue.
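
To make the pitfall concrete, here is a minimal sketch of how the forms quoted above behave in list and scalar context. The sub names are hypothetical and exist only for this illustration; none of them come from the BioPerl code base.

```perl
use strict;
use warnings;

# Hypothetical subs, named only for this illustration.
sub explicit_undef { return undef }   # list context: a one-element list (undef)
sub bare_return    { return }         # list context: an empty list

# The two cookie-cutter endings quoted above, applied to a passed-in value.
sub if_true    { my ($v) = @_; return $v if $v;         return }
sub if_defined { my ($v) = @_; return $v if defined $v; return }

my @from_undef = explicit_undef();
my @from_bare  = bare_return();
print "explicit undef: ", scalar(@from_undef), " element(s)\n";
print "bare return:    ", scalar(@from_bare),  " element(s)\n";

# A legitimate value of 0 survives only the defined() test.
print "if_true(0):    ", (defined(if_true(0))    ? "kept" : "lost"), "\n";
print "if_defined(0): ", (defined(if_defined(0)) ? "kept" : "lost"), "\n";
```

The array filled from "return undef" holds one element and is therefore true in a boolean test, which is the surprise the thread is about; the truthiness check silently drops a valid 0, while the defined() check keeps it.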

> In any case, it burns the eyes. 

Yep, I agree. 

> I share Lincoln's POV. I also fully
> understand your point about not being able to trust the docs
> (Bio::Map::Marker...). But the solution is to change the code so they
> match the docs when the docs make sense, not change the code so that it
> no longer matches the docs[*]. In a massive OO project like bioperl the

So you know, Lincoln and I both support the idea of an audit.  He also notes
(and I agree) that people will likely complain.  

Anyway, changing the code to match the docs makes sense theoretically, but in
practice that doesn't always work.  Any situation where code does not behave
as expected (i.e. as described in the docs) is a bug and can be reported as
such.  The problem arises when the docs are completely wrong, as
Bio::Restriction::IO was before I made changes to it.  In many cases simple
small code changes won't work, such as when methods inherit from an
interface but don't implement all methods (so essentially are incomplete).

Hilmar made the point that we should change the docs to reflect
inconsistencies in particular plugin modules for IO classes (AlignIO has a
few modules with unimplemented write methods, and so on).  When the code
radically varies, such as in the Restriction::IO case (where none of the
write methods worked), the docs should be changed in the IO class to reflect
this.  Of course, you should also add a bit to the TO DO section of POD and
add a bit to the Project Priority List on the wiki to point this out, both
of which I did.  It comes down to 'truth in advertising': does it do what's
expected?

> users need to be able to rely on the docs. You can't turn around and say
> "you've used this method for years, but now I'm changing how it works
> because you might have used the method incorrectly". Ideally any code

Not what I did, BTW.  The API is intact; you can still use the write methods
if you want (they throw errors just fine).  In fact, I didn't change any
methods except in one module (Restriction::IO::bairoch), where I added a
warning to the read method b/c it didn't work as expected, and I filed a bug
report.  Essentially, the only thing I changed was the docs to reflect what
the code currently can accomplish (at least until you read the TO DO).  We
already had one person email the group asking why code in the synopsis
didn't work.

Adding read and write methods to most of these modules (making the code do
what the docs reflect, in your words) is a lot of work, esp. for someone
like me unfamiliar with the class architecture and methods for those
modules.  IMHO, contributions to bioperl should accomplish what is reflected
in their docs once added to the core; if a write method hasn't been written,
then add it to the docs in a TO DO section or add a warning to the synopsis.
Don't put in the docs what you intend the code to accomplish down the road
but what it does currently.  Is that unreasonable?
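
As an illustration of that policy, here is a minimal sketch of a plugin whose unimplemented method fails loudly and whose POD says so. The package name My::IO::example and its methods are hypothetical, and plain die() stands in for whatever exception mechanism the real modules use.

```perl
use strict;
use warnings;

package My::IO::example;   # hypothetical plugin, not a real BioPerl class

sub new  { my ($class) = @_; return bless({}, $class) }
sub read { return 'parsed record' }   # placeholder for a working parser

# Not written yet: fail with a clear message instead of silently doing nothing.
sub write {
    my ($self) = @_;
    die ref($self) . "::write is not implemented yet (see TO DO)\n";
}

=head1 TO DO

write() is not implemented yet; calling it throws an exception.

=cut

package main;

my $io = My::IO::example->new;
print $io->read, "\n";

# Callers can trap the failure and report it.
my $ok = eval { $io->write; 1 };
print "caught: $@" unless $ok;
```

The point of the sketch is only that the docs describe what the code does today, with the missing piece listed under TO DO rather than in the synopsis.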

Anyway, when something doesn't perform as expected (produces invalid output
or contains errors), it's considered a bug.  That includes misrepresenting
what a module does in the docs.  When we try to fix bugs we have to decipher
what the intent of the original author was from the docs and code, then try
to get it to work by modifying the code.  In extreme cases (such as
unimplemented methods) that may mean writing up entire methods from scratch.
The read and write methods for IO modules are normally the longest methods
in a class.  That's a heck of a lot of effort for something that a large
majority of us aren't interested in taking up, esp. when the submitting
author should have had everything up to spec (i.e. what's in the docs) when
adding it to the core.

> changes add functionality or improve its working without affecting code
>   that uses the method correctly according to its old docs.
> 
> 
> * though if there isn't time/interest in changing the code, and the
> method never worked as per the docs, then by all means change the docs
> to avoid confusion - just don't change the docs on a method that worked
> according to the docs, because then you can assume people use the method
> and will be affected by the change

Again, didn't do that.  The methods in the docs either didn't exist (not
implemented) or didn't work (contained bugs).  The docs were changed b/c
they were misleading.

-chris


From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:36:07 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:36:07 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
References: <C0A31929.89F9%osborne1@optonline.net> <447E48B9.4080503@jays.net>
Message-ID: <447F1777.3070906@campus.iztacala.unam.mx>

Jay Hannah wrote:
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should 
>> point to the Tutorials page instead of the Bptutorial.pl page please let 
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Nice idea, I'll check with Jason if it's possible (in mediawiki) to 
create a new Documentation sidebar to hold these 4 sections.

> (What's the conceptual difference between a HOWTO and a tutorial?)

My concept is that Tutorials cover a wider aspect of BioPerl, contrary 
to the HOWTO's which focus on a certain topic.

> Why isn't the short "Current events" just listed on the top of the "News" page?

I don't know, maybe because it was important when Jason started the Wiki 
a couple of months ago. Do you think it should be erased from the sidebar?

> Sick of my endless questions yet? -grin-
> 
> j
> 

Of course not! :)

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:46:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:46:03 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000c01c68598$704b15d0$15327e82@pyrimidine>
References: <000c01c68598$704b15d0$15327e82@pyrimidine>
Message-ID: <447F19CB.4090607@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
[snip]
>> users need to be able to rely on the docs. You can't turn around and say
>> "you've used this method for years, but now I'm changing how it works
>> because you might have used the method incorrectly". Ideally any code
> 
> Not what I did, BTW.
[snip]
>> * though if there isn't time/interest in changing the code, and the
>> method never worked as per the docs, then by all means change the docs
>> to avoid confusion - just don't change the docs on a method that worked
>> according to the docs, because then you can assume people use the method
>> and will be affected by the change
> 
> Again, didn't do that.

I'm very sorry that I allowed the ambiguity, but my comments were 
certainly not directed at your recent changes to Bio::Restriction::IO. 
In fact, I put in the above * comment to exclude your changes from my 
discussion; you changed the docs because the code never did what they 
said they did (the docs were bad). That's fine (good!). My comments were 
a general point, slightly directed at the idea of changing all the 
return undef;s - changing the code so that it no longer matches the docs 
of a previously working method. That's what I think is bad. Though in 
this particular case it shouldn't make any difference at all.


From osborne1 at optonline.net  Thu Jun  1 12:46:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 12:46:02 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <C0A4920A.8A5B%osborne1@optonline.net>

Chris,

I think the INSTALL* files should be in the package, this is the de facto
convention for 99% of the packages I've ever seen. Then any Wiki page just
links to the file in CVS.

Personally I don't like the idea of maintaining a Wiki page and a file that
both say essentially the same thing (this is what has happened with the
INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
text and removing files that contained these redundancies so it's
unfortunate to see them appear anew, sooner or later they'll get out of sync
despite best intentions. The most likely cause will be someone other than
the person who created the initial duplication (and promised to maintain
both) making a change in one of the two files.

Brian O.


On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Also, what do we do about the similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:57:27 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:57:27 +0100
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
References: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <447F1C77.5040403@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
> /scripts or /examples then?
> 
> Also, what do we do about the similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?

Imho, something like an installation document should be there in full so 
once you've downloaded you can install without reference to anything 
else. Also, an installation document could be considered specific to the 
release version. Which is to say, it never goes out of date even if new 
versions of bioperl are released with new installation instructions - it 
applies to the installation directory it is found in.

The wiki can have the latest installation instructions, and you don't 
have to worry about keeping things synced.


From cjfields at uiuc.edu  Thu Jun  1 13:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:13:30 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1C77.5040403@mrc-dunn.cam.ac.uk>
Message-ID: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>

So basically have a minimal set of installation instructions in CVS and a
more detailed set of installation instructions on the wiki.  Sounds reasonable
enough but bioperl is a pretty complex distribution (lots of additional
modules required, platform-specific issues, so on).  Maybe we can come up
with a pared-down INSTALL file which combines the basic elements for
installing on UNIX/Windows/Mac/FreeBSD and points out dependencies.  

I still like the idea of just having a simple conversion from wiki->txt
direct from the web page (i.e. best of both worlds).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 11:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris Fields wrote:
> > Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
> > /scripts or /examples then?
> >
> > Also, what do we do about the similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> > distribution pointing out the wiki docs instead?
> 
> Imho, something like an installation document should be there in full so
> once you've downloaded you can install without reference to anything
> else. Also, an installation document could be considered specific to the
> release version. Which is to say, it never goes out of date even if new
> versions of bioperl are released with new installation instructions - it
> applies to the installation directory it is found in.
> 
> The wiki can have the latest installation instructions, and you don't
> have to worry about keeping things synced.


From s-merchant at northwestern.edu  Thu Jun  1 13:17:32 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Thu, 1 Jun 2006 12:17:32 -0500
Subject: [Bioperl-l] Bio::OntologyIO
Message-ID: <000001c6859f$446f7fd0$c2987ca5@pc13>

Hi Everyone,

    I would like to announce the availability of an obo format parser
which can parse GO, PO, PATO and other ontology files in obo format. The
parser can be used through the Bio::OntologyIO module. Thanks to Hilmar
Lapp and Chris Mungall for their invaluable contributions.
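
For readers who want to try it, here is a minimal sketch of walking the root terms of each ontology a parser yields. The helper root_term_lines is hypothetical; it only assumes the parser offers next_ontology(), each ontology offers get_root_terms(), and each term offers identifier() and name(). The file name in the usage comment is made up.

```perl
use strict;
use warnings;

# Collect "identifier<TAB>name" for the root terms of every ontology.
# $parser is any object offering next_ontology(), e.g. a Bio::OntologyIO instance.
sub root_term_lines {
    my ($parser) = @_;
    my @lines;
    while (my $ontology = $parser->next_ontology) {
        for my $term ($ontology->get_root_terms) {
            push @lines, join("\t", $term->identifier, $term->name);
        }
    }
    return @lines;
}

# Assumed usage with BioPerl installed (file name is hypothetical):
#   use Bio::OntologyIO;
#   my $parser = Bio::OntologyIO->new(-format => 'obo',
#                                     -file   => 'gene_ontology.obo');
#   print "$_\n" for root_term_lines($parser);
```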

 
Thanks,

Sohel Merchant.

 
Sohel Merchant

dictyBase

Bioinformatics Software Engineer

Center for Genetic Medicine

Northwestern University

676 St. Clair Street, Suite 1206

Chicago IL 60611

 
From cjfields at uiuc.edu  Thu Jun  1 13:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:35 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4920A.8A5B%osborne1@optonline.net>
Message-ID: <001101c685a3$53f4bf70$15327e82@pyrimidine>

I understand your point, though I think the wiki gives us an opportunity to add
helpful links and use markup to help clarify things a bit more.  I have seen
several distributions which don't have INSTALL files, just simple README
with very basic instructions (Bio::ASN1::EntrezGene is one).  

I've been reluctant to mess around with the wiki Install pages too much more
b/c of syncing problems, just as you mentioned.  I will look into things a
bit more to see if there's an easier way to go about converting wiki->text.
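
One possible shape for such a conversion, as a rough sketch only: the helper wiki_to_text below is hypothetical and handles just a handful of common MediaWiki constructs (headings, bold/italic, internal and external links, bullets). A real INSTALL page would need more rules, or an existing converter.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a few common MediaWiki constructs to plain text.
sub wiki_to_text {
    my ($wiki) = @_;
    my @out;
    for my $line (split /\n/, $wiki) {
        # "== Heading ==" becomes an underlined heading
        if ($line =~ /^(=+)\s*(.*?)\s*\1\s*$/) {
            my $title = $2;
            push @out, $title, '-' x length($title);
            next;
        }
        $line =~ s/'''(.+?)'''/$1/g;                  # bold
        $line =~ s/''(.+?)''/$1/g;                    # italic
        $line =~ s/\[\[[^\]|]*\|([^\]]*)\]\]/$1/g;    # [[Page|label]] -> label
        $line =~ s/\[\[([^\]]*)\]\]/$1/g;             # [[Page]] -> Page
        $line =~ s/\[(\S+)\s+([^\]]+)\]/$2 <$1>/g;    # [url label] -> label <url>
        $line =~ s/^\*+\s*/  * /;                     # bullet list items
        push @out, $line;
    }
    return join("\n", @out);
}

print wiki_to_text(
    "== Install ==\n"
  . "See '''[[INSTALL|the install page]]'''.\n"
  . "* Run [http://www.cpan.org CPAN]"
), "\n";
```

Generating the text file from the wiki page this way would keep a single master copy, which is the syncing concern raised above.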

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 11:46 AM
> To: Chris Fields; 'Mauricio Herrera Cuadra'
> Cc: bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> I think the INSTALL* files should be in the package, this is the de facto
> convention for 99% of the packages I've ever seen. Then any Wiki page just
> links to the file in CVS.
> 
> Personally I don't like the idea of maintaining a Wiki page and a file that
> both say essentially the same thing (this is what has happened with the
> INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
> text and removing files that contained these redundancies so it's
> unfortunate to see them appear anew, sooner or later they'll get out of sync
> despite best intentions. The most likely cause will be someone other than
> the person who created the initial duplication (and promised to maintain
> both) making a change in one of the two files.
> 
> Brian O.
> 
> 
> On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > Also, what do we do about the similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> > distribution pointing out the wiki docs instead?


From cjfields at uiuc.edu  Thu Jun  1 13:46:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:45 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447F19CB.4090607@mrc-dunn.cam.ac.uk>
Message-ID: <001201c685a3$59d78da0$15327e82@pyrimidine>


....

> > Again, didn't do that.
> 
> I'm very sorry that I allowed the ambiguity, but my comments were
> certainly not directed at your recent changes to Bio::Restriction::IO.
> In fact, I put in the above * comment to exclude your changes from my
> discussion; you changed the docs because the code never did what they
> said they did (the docs were bad). That's fine (good!). My comments were
> a general point, slightly directed at the idea of changing all the
> return undef;s - changing the code so that it no longer matches the docs
> of a previously working method. That's what I think is bad. Though in
> this particular case it shouldn't make any difference at all.

Agreed.  In any case, if tests have been properly set up then they should
catch problems.  This is, of course, if they are properly set up.  

Chris




From gad14 at cornell.edu  Thu Jun  1 15:10:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Thu, 01 Jun 2006 15:10:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447D5668.7070500@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu>
	<447BFB20.40501@mrc-dunn.cam.ac.uk>	<447C7985.9000404@cornell.edu>
	<447D5668.7070500@mrc-dunn.cam.ac.uk>
Message-ID: <447F3BA7.9030500@cornell.edu>

Problem solved, albeit, in a slightly hacky way.

I tried to make seek() work for a good long while with the SearchIO 
blast results object, but I just couldn't get it to work. (Probably b/c 
seek wants to see a genuine file handle-- not a SearchIO filehandle.) I 
used SearchIO's fh() to get the handle and could while(<$fh>) through 
the data but when I used seek($fh,0,0) to reset the cursor position in 
the handle in prep for another loop, i got an error complaining about my 
use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
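
The error is what one would expect if the handle returned by fh() is a tied handle that implements reading but not seeking; that is an assumption based on the message, not something checked against the SearchIO source. A minimal sketch with a hypothetical tied class, OnePassHandle, reproduces the same kind of failure:

```perl
use strict;
use warnings;

package OnePassHandle;   # hypothetical stand-in for a tied iterator handle

sub TIEHANDLE { my ($class, @items) = @_; return bless({ items => [@items] }, $class) }
sub READLINE  { my ($self) = @_; return shift @{ $self->{items} } }
# Deliberately no SEEK method.

package main;

tie *FH, 'OnePassHandle', 'result 1', 'result 2';

# The first pass consumes the items.
while (defined(my $item = <FH>)) {
    print "first pass: $item\n";
}

# seek() on a tied handle is dispatched to a SEEK method, which does not exist here.
my $ok = eval { seek(FH, 0, 0); 1 };
print "seek failed: $@" unless $ok;
```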

I concluded that it was not going to be possible and instead made an 
array of SeqFeature objects which contain all the relevant blast output 
data (i.e. the m8/hit table stuff).

It still seems unfortunate that one can't reuse the SearchIO object for 
cases when the SearchIO blast report needs to be accessed multiple times.
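
For anyone hitting the same wall, here is a minimal sketch of the "store everything on the first pass" approach suggested in the quoted reply below. The helper cache_report is hypothetical; it only relies on the next_result, next_hit and next_hsp methods named in this thread.

```perl
use strict;
use warnings;

# Drain a one-pass report into nested Perl structures.
# $report is any object with next_result(); results need next_hit(),
# hits need next_hsp() (the method names used in this thread).
sub cache_report {
    my ($report) = @_;
    my @results;
    while (my $result = $report->next_result) {
        my @hits;
        while (my $hit = $result->next_hit) {
            my @hsps;
            while (my $hsp = $hit->next_hsp) {
                push @hsps, $hsp;
            }
            push @hits, { hit => $hit, hsps => \@hsps };
        }
        push @results, { result => $result, hits => \@hits };
    }
    return \@results;
}

# Assumed usage: $blast_report comes from $factory->blastall(...).
#   my $cached = cache_report($blast_report);
#   for my $pass (1, 2) {                    # loop as many times as needed
#       for my $r (@$cached) {
#           for my $h (@{ $r->{hits} }) {
#               print "pass $pass: ", $h->{hit}->name, "\n";
#           }
#       }
#   }
```

The cached structure can be sorted and printed any number of times, at the cost of holding the whole report in memory.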

Thanks for your help,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>>Thanks for your comment Sendu, it was very helpful. I think this must be 
>>what's going on.. I am using $blast_report->next_result in both 
>>subroutines. It appears that analyzing the blast results first w/ my 
>>sort subroutine empties (?) the $blast_result object so that when I try 
>>to print, there is nothing left to print. (and vice versa when I print 
>>first then try to sort).
>>So, from the looks of things, using next_result has the effect of 
>>popping the Bio::Search::Result::ResultI objects off of the SearchIO 
>>blast report object??
> 
> 
> Not quite. It's more or less exactly like opening a file and then trying 
> to read it all twice like this:
> open(FILE, "file");
> while (<FILE>) {
>      print # prints each line in the file
> }
> while (<FILE>) {
>      print # never happens, we never enter this while loop
> }
> 
> To get the second while loop to print anything we need to say seek(FILE, 
> 0, 0) before it. Or in the first while loop store each line in an array, 
> and then make the second loop a foreach through that array.
> 
> 
> 
>>It seems I could get around this by making a copy of the blast report by 
>>setting it to another new variable...(not the most elegant solution) but 
>>I'm having trouble with this...
>>
>>If I do:
>>
>>    my $blast_report_copy = $blast_report;
>>
>>I'm just copying the reference to the SearchIO blast result, so it 
>>doesn't help me. How can I make another physical copy of this blast 
>>result object? Seems like a simple thing but how to do it is escaping me.
> 
> 
> Not really a good idea, and it may not work anyway if the object 
> contains a filehandle. But for a simple object you might recursively 
> loop through the data structure and copy each element out into a similar 
> data structure.
> 
> 
> 
>>But better yet, the way to go is to 'reset the counter,' or to find a 
>>way to look at/print/sort the results without removing data from the 
>>blast result object. How is this done though??
> 
> 
> It would be rather nice if this worked:
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);
> 
> Finally, the most sensible way (assuming bioperl provides no methods of 
> its own for this) of solving the problem is, the first time you go 
> through each next_result, next_hit and next_hsp, just store the returned 
> objects in an array of arrays of arrays. Then the second time get the 
> objects from your array structure instead of with the method calls.


From jelenaob at gmail.com  Thu Jun  1 11:45:49 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Thu, 1 Jun 2006 08:45:49 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <200606011140.38726.lstein@cshl.edu>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
	<200606011140.38726.lstein@cshl.edu>
Message-ID: <5042a62b0606010845u79a5d5b3h131c4ed54f90fee3@mail.gmail.com>

Thanks Lincoln.

I figured out the solution just after I posted the question, Murphy's law ... but
my post was left hanging in my email ... :(

The problem is in CGI->img method.

Instead of  print $cgi->img({-src=>$url,-usemap=>"#$mapname"});

I should have used: print $cgi->img({-src=>$url,-usemap=>"#$mapname",
-border=>undef});
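
For the archive, a minimal standalone sketch of the fix, using the -border=>0 form Lincoln suggests below. The URL and map name are hypothetical placeholders for the values returned by image_and_map(), and CGI.pm has to be installed from CPAN on newer Perls where it is no longer bundled.

```perl
use strict;
use warnings;
use CGI;

# An empty parameter set lets the sketch run outside a web server.
my $cgi = CGI->new({});

# Hypothetical stand-ins for the values returned by $panel->image_and_map().
my ($url, $mapname) = ('/tmpimages/panel.png', 'panel_map');

# Turn the border off explicitly on the generated <img> tag.
my $tag = $cgi->img({
    -src    => $url,
    -usemap => "#$mapname",
    -border => 0,
});
print $tag, "\n";
```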

Thanks anyways for your help.

Cheers,

Jelena

On 6/1/06, Lincoln Stein <lstein at cshl.edu> wrote:
>
> Hi,
>
> The border is coming from the HTML <img> tag. To get rid of it, set -border=>0
> in the img() call.
>
> Lincoln
>
>
>
> On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> > Hello everybody,
> >
> > does anybody know how to remove the background color of the Panel.
> > Currently, I am not adding anything to it, so I can troubleshoot the
> > problem, and I have tried setting up
> > all color attributes I could find to the panel, but no luck. Whatever I do,
> > I get the BLUE border of the panel.
> >
> > Has anybody faced the same problem?
> >
> > Thanks in advance,
> >
> > Jelena
> >
> > And here is the code I am currently using:
> >
> >
> > ---------------------------------------------------------------------------
> > my $panel =
> >     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
> >                               -width => 800,
> >                               -pad_left => 10,
> >                               -pad_right => 10,
> >                               -key_color => 'white',
> >                               -bgcolor => 'white',
> >                               -gridcolor=>'black',
> >                               -fgcolor => 'black',
> >                               -grid => 0,
> >                               );
> >    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
> >      -url  => '/tmpimages');
> >    #make clickable image
> >    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
> >    print $map;
> >
> >
> > ---------------------------------------------------------------------------
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>


From osborne1 at optonline.net  Thu Jun  1 15:36:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:36:27 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>
Message-ID: <C0A4B9FB.8A71%osborne1@optonline.net>

Chris,

Right - how would this be done?

Brian O.


On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> I still like the idea of just having a simple conversion from wiki->txt
> direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 15:44:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:44:13 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <C0A4BBCD.8A74%osborne1@optonline.net>

Jay,

You asked about the doc/ directory. The only directory I see in my
bioperl-live/doc directory is examples/, the reason this remains is that it
contains scripts and images related to the Graphics HOWTO, in theory these
could be moved to the Wiki and the examples/ directory deleted. One
explanation for why you see doc/html and all those other dirs is that you
aren't using the 'cvs -d' option (there are other explanations) when you
update.

If examples/ is removed then presumably the README can be removed and
makedoc.pl moved elsewhere.

Brian O.


On 5/31/06 9:54 PM, "Jay Hannah" <jay at jays.net> wrote:

> Brian Osborne wrote:
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
> 
> We certainly wouldn't want to try to maintain two copies, one POD one in wiki.
> That would be the worst of all options. One option that hasn't been mentioned
> yet is to keep maintenance of that in POD in the distro (leaving the cool
> runability alone), and then flag that document as unchangeable in the wiki
> with a note on top "Maintenance of this document is done in POD in the distro.
> Submit POD patches to bioperl-l and we'll re-post an updated copy to this
> wiki."
> 
> Just a thought.
> 
>> - What do we do with the script part of bptutorial.pl? It certainly could be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?
> 
> doc/faq        (empty?)
> doc/howto      
> doc/howto/examples
> doc/howto/figs (empty?)
> doc/howto/html (empty?)
> doc/howto/pdf  (empty?)
> doc/howto/sgml (empty?)
> doc/howto/txt  (empty?)
> doc/howto/xml  (empty?)
> 
> Does all that stuff officially live in and is being changed in the wiki, never
> to return to the distro?
> 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
>> Jay, looks like there are still some weird formatting issues with the
>> bptutorial wiki page, something which I ran into before when getting the
>> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
>> spaces preceding a line denotes code for some reason).  Not much you can do
>> in these cases except remove the extra spaces in those spots.  Looking good
>> though!  
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know nothing
> about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran
> off to work. :)
> 
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should
>> point to the Tutorials page instead of the Bptutorial.pl page please let
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials
> 
> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on the
> wiki but static?
> 
> Why isn't the short "Current events" just listed on the top of the "News"
> page?
> 
> Sick of my endless questions yet? -grin-
> 
> j
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Jun  1 15:47:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 14:47:40 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4B9FB.8A71%osborne1@optonline.net>
Message-ID: <001301c685b4$3dbfb820$15327e82@pyrimidine>

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 2:36 PM
> To: Chris Fields; 'Sendu Bala'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> Right - how would this be done?

I'll look into a few of the wiki converters, there are a few things that
claim to convert wiki to other formats (and vice versa).  It may not be
direct, though.  I'll post anything if I figure something out.

Chris
 
> Brian O.
> 
> 
> On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > I still like the idea of just having a simple conversion from wiki->txt
> > direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 15:45:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:45:39 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E73F5.40403@jays.net>
Message-ID: <C0A4BC23.8A75%osborne1@optonline.net>

Jay,

Yes, good idea, thank you for volunteering.

Brian O.


On 6/1/06 12:58 AM, "Jay Hannah" <jay at jays.net> wrote:

> I hereby volunteer to strip the code out of bptutorial.pl and put it wherever.
> Where should I put it when I'm done? (examples/tutorial.pl?)


From hubert.prielinger at gmx.at  Thu Jun  1 16:33:45 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 01 Jun 2006 14:33:45 -0600
Subject: [Bioperl-l] remoteblast xml problem
Message-ID: <447F4F29.9070600@gmx.at>

hi,
I have the following program. It worked quite well for retrieving
remoteblast results as a text file, but since I altered it to request xml
it no longer works.
It takes all the parameters at the command line and submits the query, but
I don't get any results file back.

It seems to hang in an endless loop.
The only output I get is:  $rc is not a ref! over and over. It never
enters the else branch.

Any help is appreciated, thanks in advance


#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use IO::String;
use Bio::SearchIO;


#use lib qw(/usr/local/bioperl/bioperl-1.5.1);

print "Please insert database:\t";
my $db_STD = <STDIN>;
chomp $db_STD;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;

print "Please insert gapcosts:\t";
my $gapcosts_STD = <STDIN>;
chomp $gapcosts_STD;

my $prog   = 'blastp';
my $db     = $db_STD;           
my $e_val  = '20000';
my $matrix = $matrix_STD;               
my $wordSize = '2';


my @data;
my $line_dataArray;
my $rid;
my $count = $count_STD;           
my @params = (
  '-prog'   => $prog,
  '-data'   => $db,
  '-expect' => $e_val,
  '-MATRIX_NAME' => $matrix,
  '-readmethod' => 'xml',
  '-WORD_SIZE' => $wordSize,
);

my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

print "entering blast....";

my $xmlFactory = Bio::Tools::Run::RemoteBlast->new(@params);


$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = $gapcosts_STD;
$Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'ALIGNMENTS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';
   

print "Blast entered successfully \n";

while ( my $query = $seqio_obj->next_seq ) {
  print "submit Sequence...just do it....\n";
 
  my $r = $xmlFactory->submit_blast($query);
  print $query->seq;
  print "\n";
 
 
#    sleep 30;

  # Wait for the reply and save the output file
  print "entering while loop for saving Output.... \n";
 
  while ( my @rids = $xmlFactory->each_rid ) {
      foreach my $rid (@rids) {
           
          my $rc = $xmlFactory->retrieve_blast($rid);
          if ( !ref($rc) ) {
              print '$rc is not a ref!', "\n";
              if ( $rc < 0 ) {
                  print "Remove rid ...\n";
                  $xmlFactory->remove_rid($rid);
              }
              # sleep 5;
          }
          else {

              print "retrieved Results successfully \n";
              print $rid;
              print "\n";
              my $filename = "comp80swiss$count.xml";
              $xmlFactory->save_output($filename);
              print "File saved successfully \n";
              my $checkinput = $xmlFactory->file;
              open(my $fh, '<', $checkinput) or die $!;
              while(<$fh>){
                print;
              }
              close $fh;
              $count++;
              $xmlFactory->remove_rid($rid);
          }
      }
      print "\n";
      print "\n";

  }
}


From emmanuel.quevillon at versailles.inra.fr  Thu Jun  1 17:15:42 2006
From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon)
Date: Thu, 01 Jun 2006 23:15:42 +0200
Subject: [Bioperl-l] How to submit new module?
Message-ID: <447F58FE.7020603@versailles.inra.fr>

Hi,

I just created some new parsers for TargetP, TandemRepeatFinder and
RepeatMasker. These parsers inherit from Bio::Tools. I'd just like to
know the steps for submitting them to BioPerl so that they can be
integrated in the next release (I hope).
Is there any documentation about it?

Thanks

-- 
Emmanuel

---------------------------------------------------------------------
Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>

INRA-URGI / Bayer CropScience
523 Place des Terrasses             http://www.infobiogen.fr
91000 EVRY                          http://urgi.infobiogen.fr
Tel : 01 60 87 37 42                http://www.bayercropscience.com

PGP public key server : http://pgp.mit.edu/
Key ID : 0x0B84357F
---------------------------------------------------------------------


From cjfields at uiuc.edu  Thu Jun  1 17:36:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:36:05 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447F3BA7.9030500@cornell.edu>
Message-ID: <001b01c685c3$63840070$15327e82@pyrimidine>

Genevieve, 

seek() won't work here; all the file IO is handled through Bio::Root::IO
methods.  The SearchIO system is set up like an XML SAX parser so if you
want to save objects as they come you'll have to store the object refs in an
array, like so:

my @hsps;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         push @hsps, $hsp;
      }
   }
}

Or similarly with hits: 

my @hits;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      push @hits, $hit;
   }
}

Or you could use more complex data structures (array of arrays) as Sendu
suggested.  You should be able to sort like anything else by calling methods
within the sort:

# total number of hsps
my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;

# if you really like your accessions in alphabetical order
my @by_accession = sort {$a->accession cmp $b->accession} @hits;

Then if you wanted to print later you could sort based on something else,
like the score:

my @sort_score = sort {$a->score <=> $b->score} @hits;

So you would end up with something like the following subroutines:

# both subs share the file-scoped @hits array declared earlier

sub sort_results{
   my $report = shift;
   while(my $result = $report->next_result()){
      while(my $hit = $result->next_hit()){
         push @hits, $hit;
      }
   }
   my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;
   print $_->accession,"\t",$_->num_hsps,"\n" for @sorted;
}

sub print_blast_results{
   my $report = shift;
   # works from the hits cached by sort_results(), so call that one first
   my @sort_score = sort {$a->score <=> $b->score} @hits;
   for my $h (@sort_score) {
      while (my $hsp = $h->next_hsp) {
         # might use something else here like hit->name or accession,
         # not sure what you want
         my $q_name = $hsp->seq_id;
         print join(", ",$q_name,$h->name,$hsp->bits)."\n";
      }
   }
}


Just so you know, I couldn't get display_id or display_name to work when
using the Bio::Search::HSP::GenericHSP object.  Your results may vary.
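
For the nested variant, a rough sketch along these lines might do. The sub
name cache_report and the hash keys are made up for illustration; the only
thing it assumes about the objects is the next_result/next_hit/next_hsp
iterator methods used above, and it uses hashes instead of bare arrays
purely for readability:

```perl
use strict;
use warnings;

# Drain a parser once into a nested structure:
#   [ { result => $result,
#       hits   => [ { hit => $hit, hsps => [ $hsp, ... ] }, ... ] }, ... ]
# cache_report is a hypothetical helper, not part of BioPerl.
sub cache_report {
    my $parser = shift;
    my @cache;
    while (my $result = $parser->next_result) {
        my @hits;
        while (my $hit = $result->next_hit) {
            my @hsps;
            while (my $hsp = $hit->next_hsp) {
                push @hsps, $hsp;
            }
            # keep each hit together with the HSPs pulled from it
            push @hits, { hit => $hit, hsps => \@hsps };
        }
        push @cache, { result => $result, hits => \@hits };
    }
    return \@cache;
}
```

You would call it once, e.g. my $cache = cache_report($parser);, and then
loop over @$cache in both the sorting and the printing code rather than
calling next_result a second time.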

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Genevieve DeClerck
> Sent: Thursday, June 01, 2006 2:11 PM
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Problem solved, albeit, in a slightly hacky way.
> 
> I tried to make seek() work for a good long while with the SearchIO
> blast results object, but I just couldn't get it to work. (Probably b/c
> seek wants to see a genuine file handle-- not a SearchIO filehandle.) I
> used SearchIO's fh() to get the handle and could while(<$fh>) through
> the data but when I used seek($fh,0,0) to reset the cursor position in
> the handle in prep for another loop, i got an error complaining about my
> use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
> 
> I concluded that it was not going to be possible and instead made an
> array of SeqFeature objects which contain all the relevant blast output
> data (i.e. the m8/hit table stuff).
> 
> It still seems unfortunate that one can't reuse the SearchIO object for
> cases when the SearchIO blast report needs to be accessed multiple times.
> 
> Thanks for your help,
> Genevieve
> 
> 
> 
> Sendu Bala wrote:
> 
> > Genevieve DeClerck wrote:
> >
> >>Thanks for your comment Sendu, it was very helpful. I think this must be
> >>what's going on.. I am using $blast_report->next_result in both
> >>subroutines. It appears that analyzing the blast results first w/ my
> >>sort subroutine empties (?) the $blast_result object so that when I try
> >>to print, there is nothing left to print. (and vice versa when I print
> >>first then try to sort).
> >>So, from the looks of things, using next_result has the effect of
> >>popping the Bio::Search::Result::ResultI objects off of the SearchIO
> >>blast report object??
> >
> >
> > Not quite. It's more or less exactly like opening a file and then trying
> > to read it all twice like this:
> > open(FILE, "file");
> > while (<FILE>) {
> >      print # prints each line in the file
> > }
> > while (<FILE>) {
> >      print # never happens, we never enter this while loop
> > }
> >
> > To get the second while loop to print anything we need to say seek(FILE,
> > 0, 0) before it. Or in the first while loop store each line in an array,
> > and then make the second loop a foreach through that array.
> >
> >
> >
> >>It seems I could get around this by making a copy of the blast report by
> >>setting it to another new variable...(not the most elegant solution) but
> >>I'm having trouble with this...
> >>
> >>If I do:
> >>
> >>    my $blast_report_copy = $blast_report;
> >>
> >>I'm just copying the reference to the SearchIO blast result, so it
> >>doesn't help me. How can I make another physical copy of this blast
> >>result object? Seems like a simple thing but how to do it is escaping me.
> >
> >
> > Not really a good idea, and it may not work anyway if the object
> > contains a filehandle. But for a simple object you might recursively
> > loop through the data structure and copy each element out into a similar
> > data structure.
> >
> >
> >
> >>But better yet, the way to go is to 'reset the counter,' or to find a
> >>way to look at/print/sort the results without removing data from the
> >>blast result object. How is this done though??
> >
> >
> > It would be rather nice if this worked:
> > my $blast_report = $factory->blastall($ref_seq_objs);
> > my $blast_fh = $blast_report->fh();
> > while (<$blast_fh>) {
> >      # $_ is a ResultI object, use as normal
> > }
> > seek($blast_fh, 0, 0); # this would be great, but does it work?
> > while (<$blast_fh>) {
> >      # go through the results again in your second subroutine
> > }
> >
> > An alternative hacky way of doing it, which may also not work, would be
> > to go through your $blast_report as normal, but then before going
> > through it a second time, say
> > my $fh = $blast_report->_fh;
> > seek($fh, 0, 0);
> >
> > Finally, the most sensible way (assuming bioperl provides no methods of
> > its own for this) of solving the problem is, the first time you go
> > through each next_result, next_hit and next_hsp, just store the returned
> > objects in an array of arrays of arrays. Then the second time get the
> > objects from your array structure instead of with the method calls.
> >
> 


From arareko at campus.iztacala.unam.mx  Thu Jun  1 17:49:30 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 16:49:30 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
References: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <447F60EA.1050608@campus.iztacala.unam.mx>

Hi Emmanuel,

Take a look into the BioPerl FAQ:

http://bioperl.org/wiki/FAQ

It contains some info that will guide you through the appropriate steps 
depending on your situation.

Regards,
Mauricio.

Emmanuel Quevillon wrote:
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
> like to know the differents steps procedure to submit them to BioPerl
> and to be integrated in the next release (I hope)?
> Is there any documentation about it?
> 
> Thanks
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 17:47:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:47:11 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <001c01c685c4$f01e7550$15327e82@pyrimidine>

The Bioperl FAQ on the wiki answers this:

http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F

Basically, you've already done the first step, but you might want to
resubmit the email in a different form, with something about "New parsers
for TargetP, TandemRepeatFinder and RepeatMasker" in the Subject line to get
more input about those from the users-at-large.  

BTW, there is already a Bio::Tools::RepeatMasker, so you should check it out
to make sure there isn't any redundancy between your version and the
bioperl-live version.  The developers may be reluctant to replace the
bioperl-live version with yours to prevent API problems with end users,
unless you provide some serious justification (like the current one is
broken, not complete, etc).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Emmanuel Quevillon
> Sent: Thursday, June 01, 2006 4:16 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] How to submit new module?
> 
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
> like to know the differents steps procedure to submit them to BioPerl
> and to be integrated in the next release (I hope)?
> Is there any documentation about it?
> 
> Thanks
> 
> --
> Emmanuel
> 
> ---------------------------------------------------------------------
> Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>
> 
> INRA-URGI / Bayer CropScience
> 523 Place des Terrasses             http://www.infobiogen.fr
> 91000 EVRY                          http://urgi.infobiogen.fr
> Tel : 01 60 87 37 42                http://www.bayercropscience.com
> 
> PGP public key server : http://pgp.mit.edu/
> Key ID : 0x0B84357F
> ---------------------------------------------------------------------


From heikki at sanbi.ac.za  Fri Jun  2 03:52:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 2 Jun 2006 09:52:07 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001201c685a3$59d78da0$15327e82@pyrimidine>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
Message-ID: <200606020952.08034.heikki@sanbi.ac.za>

I've started going through the files that have 'return undef' lines.
I'll report back later.

Initial impression is that there are a few cases where the context indicates 
list to be returned but failure returns an explicit undef. I'll fix those.

Most of the cases are much more ambiguous. Even when documentation says the 
failure returns undef, it is clearly meant to mean false. In most cases 
documentation does not comment on return value at all. Luckily the context is 
almost always scalar and therefore it does not matter too much.

I seem to be changing 'return undef' to plain 'return' a bit overzealously, so 
do not take it personally.
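
To make the pitfall concrete, here is a minimal sketch in plain Perl. The
sub names find_undef and find_bare are made up for illustration and are not
taken from the BioPerl tree:

```perl
use strict;
use warnings;

# Hypothetical lookups that always fail; neither sub is BioPerl code.
sub find_undef { return undef }   # explicit undef
sub find_bare  { return }         # bare return

# List context: 'return undef' yields a one-element list (undef),
# so the receiving array is non-empty and tests true.
my @from_undef = find_undef();
my @from_bare  = find_bare();

print "return undef -> ", scalar(@from_undef), " element(s)\n";
print "bare return  -> ", scalar(@from_bare),  " element(s)\n";

# Scalar context: both give undef, which is why most callers never notice.
my $s_undef = find_undef();
my $s_bare  = find_bare();
print "scalar context: both undefined\n"
    if !defined $s_undef && !defined $s_bare;
```

A caller that writes if (my @found = find_undef()) { ... } would enter the
block even though the lookup failed, which is the case the bare return
avoids.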

	-Heikki

On Thursday 01 June 2006 19:46, Chris Fields wrote:
> ....
>
> > > Again, didn't do that.
> >
> > I'm very sorry that I allowed the ambiguity, but my comments were
> > certainly not directed at your recent changes to Bio::Restriction::IO.
> > In fact, I put in the above * comment to exclude your changes from my
> > discussion; you changed the docs because the code never did what they
> > said they did (the docs were bad). That's fine (good!). My comments were
> > a general point, slightly directed at the idea of changing all the
> > return undef;s - changing the code so that it no longer matches the docs
> > of a previously working method. That's what I think is bad. Though in
> > this particular case it shouldn't make any difference at all.
>
> Agreed.  In any case, if tests have been properly set up then they should
> catch problems.  This is, of course, if they are properly set up.
>
> Chris
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Fri Jun  2 05:04:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 02 Jun 2006 10:04:18 +0100
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447F4F29.9070600@gmx.at>
References: <447F4F29.9070600@gmx.at>
Message-ID: <447FFF12.506@mrc-dunn.cam.ac.uk>

Hubert Prielinger wrote:
> hi,
> I have the following program and it worked quite well, for retrieving 
> remoteblast results in a textfile,
> now I have altered it to to xml, and it didn't work anymore.....
> it takes all the parameter at the commandline, submits the query, but I 
> don't retrieve any results file anymore.....
> 
> it seems that it hangs in a endless loop......
> the only output I get is:  $rc is not a ref! over and over..... it 
> doesn't enter the else term anymore....

There is no problem with your code. The problem is with the NCBI server 
and should be reported to them. You can visit the site and do a blast, 
requesting xml format, and you will typically get one normal 'waiting' 
message and the promise that it will be updated in x seconds, but 
subsequent attempts to get progress information result in an xml error 
page because the NCBI server doesn't actually send any data.

Unfortunately the way that the bioperl code is written, it treats no 
data as 'waiting' instead of an error. I've offered a patch to fix this 
at this bug page:
http://bugzilla.bioperl.org/show_bug.cgi?id=2015
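
Independently of the patch, a caller can protect itself against a server
that never leaves the 'waiting' state by capping the number of polls. The
following is only a sketch in plain Perl: poll_with_limit is a hypothetical
helper, and it assumes the fetch callback follows the convention your loop
already relies on (a reference when the result is ready, a negative number
on error, anything else meaning "still waiting"):

```perl
use strict;
use warnings;

# Poll $fetch until it returns a reference (the result), a negative
# status (error), or until $max_tries polls have been used up.
# Returns the result reference, or undef on error or timeout.
# poll_with_limit is a hypothetical helper, not a BioPerl method.
sub poll_with_limit {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return $rc if ref $rc;       # finished: hand back the result
        return if $rc < 0;           # the server reported an error
        sleep $delay if $delay;      # still waiting: pause before retrying
    }
    return;                          # gave up: never left the waiting state
}

# Stand-in for a server that never delivers anything.
my $polls  = 0;
my $result = poll_with_limit(sub { $polls++; return 0 }, 5, 0);
print defined $result ? "got a result\n" : "gave up after $polls polls\n";
```

In your script the callback would wrap $xmlFactory->retrieve_blast($rid),
with a real delay between polls, and you would call
$xmlFactory->remove_rid($rid) whenever undef comes back.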


From cjfields at uiuc.edu  Fri Jun  2 10:30:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 09:30:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447FFF12.506@mrc-dunn.cam.ac.uk>
Message-ID: <001a01c68651$12925250$15327e82@pyrimidine>

Sendu, Hubert,


Hubert, your code looks fine so Sendu's patch should fix the problem (break
out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
works.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, June 02, 2006 4:04 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> Hubert Prielinger wrote:
> > hi,
> > I have the following program and it worked quite well, for retrieving
> > remoteblast results in a textfile,
> > now I have altered it to to xml, and it didn't work anymore.....
> > it takes all the parameter at the commandline, submits the query, but I
> > don't retrieve any results file anymore.....
> >
> > it seems that it hangs in a endless loop......
> > the only output I get is:  $rc is not a ref! over and over..... it
> > doesn't enter the else term anymore....
> 
> There is no problem with your code. The problem is with the NCBI server
> and should be reported to them. You can visit the site and do a blast,
> requesting xml format, and you will typically get one normal 'waiting'
> message and the promise that it will be updated in x seconds, but
> subsequent attempts to get progress information result in an xml error
> page because the NCBI server doesn't actually send any data.
> 
> Unfortunately the way that the bioperl code is written, it treats no
> data as 'waiting' instead of an error. I've offered a patch to fix this
> at this bug page:
> http://bugzilla.bioperl.org/show_bug.cgi?id=2015


From cjfields at uiuc.edu  Fri Jun  2 15:13:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 14:13:31 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
Message-ID: <000301c68678$a3cdaa40$15327e82@pyrimidine>

Heikki,

I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
when running AlignIO.t (I was fixing bug 2000):

http://bugzilla.open-bio.org/show_bug.cgi?id=2016

Not sure what's going on there but using read_aln and write_aln seem to work
normally.  It may have something to do with Bio::SimpleAlign but I'm not
absolutely sure.

Any ideas what may be going on here?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Fri Jun  2 17:11:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 15:11:41 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <001a01c68651$12925250$15327e82@pyrimidine>
References: <001a01c68651$12925250$15327e82@pyrimidine>
Message-ID: <4480A98D.6010501@gmx.at>

hi,
sorry, but I have updated the remoteblast module and I have run several 
attempts with the same results as before. It didn't work.
I didn't get any results.

regards
Hubert


Chris Fields wrote:
> Sendu, Hubert,
>
>
> Hubert, your code looks fine so Sendu's patch should fix the problem (break
> out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
> it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
> works.  
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>> Sent: Friday, June 02, 2006 4:04 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>
>> Hubert Prielinger wrote:
>>     
>>> hi,
>>> I have the following program and it worked quite well, for retrieving
>>> remoteblast results in a textfile,
>>> now I have altered it to to xml, and it didn't work anymore.....
>>> it takes all the parameter at the commandline, submits the query, but I
>>> don't retrieve any results file anymore.....
>>>
>>> it seems that it hangs in a endless loop......
>>> the only output I get is:  $rc is not a ref! over and over..... it
>>> doesn't enter the else term anymore....
>>>       
>> There is no problem with your code. The problem is with the NCBI server
>> and should be reported to them. You can visit the site and do a blast,
>> requesting xml format, and you will typically get one normal 'waiting'
>> message and the promise that it will be updated in x seconds, but
>> subsequent attempts to get progress information result in an xml error
>> page because the NCBI server doesn't actually send any data.
>>
>> Unfortunately the way that the bioperl code is written, it treats no
>> data as 'waiting' instead of an error. I've offered a patch to fix this
>> at this bug page:
>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>     
>
>
>
>   


From cjfields at uiuc.edu  Fri Jun  2 17:54:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 16:54:20 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480A98D.6010501@gmx.at>
Message-ID: <000001c6868f$1b68dbe0$15327e82@pyrimidine>

Hubert, 

Could you post this on bugzilla with your script and test data so I can try
to replicate your error?  I may not get to it until Monday.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, June 02, 2006 4:12 PM
> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu Bala'
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> sorry, but I have updated the remoteblast module and I have run several
> attempts with the same results as before. It didn't work.
> I didn't get any results.
> 
> regards
> Hubert
> 
> 
> Chris Fields wrote:
> > Sendu, Hubert,
> >
> >
> > Hubert, your code looks fine so Sendu's patch should fix the problem (break
> > out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
> > it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
> > works.
> >
> > Chris
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> >> Sent: Friday, June 02, 2006 4:04 AM
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] remoteblast xml problem
> >>
> >> Hubert Prielinger wrote:
> >>
> >>> hi,
> >>> I have the following program and it worked quite well, for retrieving
> >>> remoteblast results in a textfile,
> >>> now I have altered it to to xml, and it didn't work anymore.....
> >>> it takes all the parameter at the commandline, submits the query, but I
> >>> don't retrieve any results file anymore.....
> >>>
> >>> it seems that it hangs in a endless loop......
> >>> the only output I get is:  $rc is not a ref! over and over..... it
> >>> doesn't enter the else term anymore....
> >>>
> >> There is no problem with your code. The problem is with the NCBI server
> >> and should be reported to them. You can visit the site and do a blast,
> >> requesting xml format, and you will typically get one normal 'waiting'
> >> message and the promise that it will be updated in x seconds, but
> >> subsequent attempts to get progress information result in an xml error
> >> page because the NCBI server doesn't actually send any data.
> >>
> >> Unfortunately the way that the bioperl code is written, it treats no
> >> data as 'waiting' instead of an error. I've offered a patch to fix this
> >> at this bug page:
> >> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hubert.prielinger at gmx.at  Fri Jun  2 19:19:40 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 17:19:40 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <000001c68691$8c4eeb40$15327e82@pyrimidine>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
Message-ID: <4480C78C.1000701@gmx.at>

hi,
I have submitted the bug -> Bug 2017
with the script and input file, just start it from command line

thank you very much
greetings

Hubert

Chris Fields wrote:
> Hubert,
>
> I have a script that's using blastxml and XML output which seems to work.
> I'll try looking at it to get a better idea this weekend.
>
> Chris
>
>   


From cjfields at uiuc.edu  Fri Jun  2 20:33:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:33:48 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480C78C.1000701@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
Message-ID: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>

You need to add the input conditions as well (you have several  
<STDIN> lines which may play a role; I would like to know what you  
normally enter for those).

How long did you let the script run?  I ran a quick check on your  
sequences; you have almost 1600, so you have to expect that you'll  
run into some problems here!  Most here (including me) would suggest  
you try installing a local blast setup for something like this.

Chris

On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:

> hi,
> I have submitted the bug -> Bug 2017
> with the script and input file, just start it from command line
>
> thank you very much
> greetings
>
> Hubert
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Fri Jun  2 20:49:15 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 18:49:15 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
Message-ID: <4480DC8B.7070005@gmx.at>

hi,
input database: swissprot
         matrix: pam30
         count: 1
         gapcosts: 9 1

I know that there are a lot of sequences, but that doesn't matter; you
can delete all of them except one. The number of sequences is not the
problem: the script reads one line and submits it, then the second line,
and so on. I have also tried it with only one sequence and got the same
result. That time the script ran for more than 20 minutes, and that
should be enough time to retrieve the results for ONE sequence, I guess.

regards
Hubert


Chris Fields wrote:
> You need to add the input conditions as well (you have several <STDIN> 
> lines which may play a role; I would like to know what you normally 
> enter for those).
>
> How long did you let the script run?  I ran a quick check on your 
> sequences; you have almost 1600, so you have to expect that you'll run 
> into some problems here!  Most here (including me) would suggest you 
> try installing a local blast setup for something like this.
>
> Chris
>


From cjfields at uiuc.edu  Fri Jun  2 20:57:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:57:37 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480DC8B.7070005@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
Message-ID: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>

Yes, I see the same error you do.  But I have a similar script  
(blastp, XML blast report, XML parsing, similar loop structure) that  
works fine.  I'm trying to dissect the problem but I think it may be  
something logically wrong here (something not so obvious) and not a  
bug...

What I'm trying to say is, when you send sequences using remoteblast
like this, you are essentially spamming the NCBI BLAST server with
~1600 requests.  This script wasn't set up with that intent in mind;  
you should really try to set up your own local blast database if  
possible.  If you can't, try running this script in off-hours  
(10pm-6am EST or something like that).
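
If a local database is not an option, the script itself can refuse to
start outside the quiet window. A rough sketch, assuming the 10pm-6am
window mentioned above and a machine whose clock is set to US Eastern
time; is_off_hours is a hypothetical helper, not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: true when the given hour (0..23) falls in the
# 22:00-05:59 window, which wraps around midnight.
sub is_off_hours {
    my ($hour) = @_;
    return ($hour >= 22 || $hour < 6) ? 1 : 0;
}

# Assumption: the local clock of this machine is in US Eastern time.
my $hour = (localtime)[2];
my $msg  = is_off_hours($hour)
    ? "OK to submit (hour $hour)\n"
    : "Busy hours (hour $hour); wait until 22:00\n";
print $msg;
```

The hour is passed in as an argument so the check can be exercised
without touching the system clock.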


Chris

On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:

> hi,
> input database: swissprot
>         matrix: pam30
>         count: 1
>         gapcosts: 9 1
>
> I know that there are a lot of sequences, but that doesn't matter;
> you can delete all of them except one. The number of sequences is
> not the problem: the script reads one line and submits it, then the
> second line, and so on. I have also tried it with only one sequence
> and got the same result. That time the script ran for more than 20
> minutes, and that should be enough time to retrieve the results for
> ONE sequence, I guess.
>
> regards
> Hubert
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Fri Jun  2 21:36:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 19:36:42 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
Message-ID: <4480E7AA.3020603@gmx.at>

hi chris,
thanks, but I never intended to run remoteblast with so many sequences,
only a few of them. Actually, my goal is to run phiblast with a regular
expression, so I won't need that file anymore.

Another question, about parsing the XML output: is there an XML parser
available for BLAST XML output, and how do I get started?
I have looked at the wiki and the CPAN docs for Bio::SearchIO::blastxml,
but I'm not sure how to start....sorry, I guess I'm too stupid....
Is there maybe another introduction or an example?

thanks
Hubert


Chris Fields wrote:
> Yes, I see the same error you do.  But I have a similar script  
> (blastp, XML blast report, XML parsing, similar loop structure) that  
> works fine.  I'm trying to dissect the problem but I think it may be  
> something logically wrong here (something not so obvious) and not a  
> bug...
>
> What I'm trying to say is, when you send sequences using remoteblast
> like this, you are essentially spamming the NCBI BLAST server with
> ~1600 requests.  This script wasn't set up with that intent in mind;  
> you should really try to set up your own local blast database if  
> possible.  If you can't, try running this script in off-hours  
> (10pm-6am EST or something like that).
>
>
> Chris
>


From cjfields at uiuc.edu  Sat Jun  3 00:35:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 23:35:21 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>


On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:

> hi chris,
> thanks, but I never intended to run remoteblast with so many
> sequences, only a few of them. Actually, my goal is to run phiblast
> with a regular expression, so I won't need that file anymore.

Not a problem.  Just to let you know, I did manage to get the script  
working, so I'm marking the bug INVALID.  I think the problem isn't  
that there is an infinite loop so much as setting composition-based  
statistics causes the search to take much much longer; try removing  
that line to see what I mean.
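
For searches that can legitimately take this long, it may help to cap
the number of polling attempts so the script stops by itself instead of
printing the same message forever. A minimal sketch in plain Perl, under
these assumptions: poll_with_limit is a hypothetical helper (not part of
BioPerl), and the callback returns a reference when the result is ready,
a negative number on error, and anything else while the job is still
running. That is how retrieve_blast is usually described, but verify it
against the RemoteBlast docs for your version.

```perl
use strict;
use warnings;

# Hypothetical helper: call $check->() up to $max_tries times, sleeping
# $delay seconds between attempts. Returns the result reference as soon
# as one is seen, dies on a negative (error) code, and returns undef if
# the job is still pending after the last attempt.
sub poll_with_limit {
    my ($check, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $check->();
        return $rc if ref $rc;                          # finished
        die "remote job failed on try $try\n"
            if defined $rc && $rc < 0;                  # error code
        sleep $delay if $delay && $try < $max_tries;    # still waiting
    }
    return;
}

# Stand-in for a remote job that is ready on the third check.
my $calls  = 0;
my $result = poll_with_limit(
    sub { ++$calls < 3 ? 0 : +{ status => 'done' } }, 10, 0);
print "finished after $calls calls\n" if ref $result;
```

With RemoteBlast the callback would be something along the lines of
sub { $factory->retrieve_blast($rid) }, where $factory and $rid come
from your own script.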

Just so you know, using $result->query_name doesn't get you what you  
would expect (it gives you a part of the RID, which you don't want;  
this is something in the XML output that is beyond our control).  You  
might want to change it to something else or you'll get filenames  
with numerical names.
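
One way to avoid those numerical filenames is to build the name from
something you control, such as the position of the sequence in the input
file plus a cleaned-up label. A small sketch; report_filename is a
made-up helper, and the '.xml' suffix, the four-digit counter and the
40-character limit are arbitrary choices:

```perl
use strict;
use warnings;

# Hypothetical helper: turn an input index and a free-text label into a
# filesystem-friendly report name.
sub report_filename {
    my ($index, $label) = @_;
    $label = '' unless defined $label;
    (my $clean = $label) =~ s/[^A-Za-z0-9_.-]+/_/g;   # replace unsafe characters
    $clean =~ s/^_+|_+$//g;                           # trim leading/trailing underscores
    $clean = 'query' if $clean eq '';                 # fall back to a generic label
    $clean = substr($clean, 0, 40);                   # keep names reasonably short
    return sprintf('%04d_%s.xml', $index, $clean);
}

print report_filename(1, 'sp|P12345|TEST_HUMAN'), "\n";
# prints 0001_sp_P12345_TEST_HUMAN.xml
```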

> Another question, about parsing the XML output: is there an XML
> parser available for BLAST XML output, and how do I get started?
> I have looked at the wiki and the CPAN docs for
> Bio::SearchIO::blastxml, but I'm not sure how to start....sorry, I
> guess I'm too stupid....
> Is there maybe another introduction or an example?

Bio::SearchIO objects are used to parse BLAST XML output if you have  
it saved to a file.  For instance:

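use strict;
use warnings;
use Bio::SearchIO;

my $file = shift @ARGV;   # path to the saved BLAST XML report

my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');

# one result per query, one hit per database match, one HSP per alignment
while (my $result = $factory->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         # do stuff here
      }
   }
}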

The only thing that changes between parsing a text BLAST report and an
XML BLAST report is the -format value (similar to the -readmethod
parameter in RemoteBlast).  You shouldn't need to look up any
documentation other than these pages on the wiki:

http://www.bioperl.org/wiki/HOWTO:SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml

Note that you'll need to install XML::SAX (from CPAN), and that
XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
up parsing.
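
A quick way to check whether those modules are present before running the parser might look like the sketch below. The helper name module_available is made up for illustration; it only reports whether Perl can load a module and says nothing about versions.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(XML::SAX XML::SAX::ExpatXS)) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

If XML::SAX::ExpatXS shows up as missing, parsing should still work through a slower pure-Perl SAX backend, going by the recommendation above.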

Chris

> thanks
> Hubert
>
>
> Chris Fields wrote:
>> Yes, I see the same error you do.  But I have a similar script
>> (blastp, XML blast report, XML parsing, similar loop structure)
>> that works fine.  I'm trying to dissect the problem but I think
>> it may be something logically wrong here (something not so
>> obvious) and not a bug...
>>
>> What I'm trying to say is, when you send sequences using remoteblast
>> like this, you are essentially spamming the NCBI BLAST server with
>> ~1600 requests.  This script wasn't set up with that intent in mind;
>> you should really try to set up your own local blast database if
>> possible.  If you can't, try running this script in off-hours
>> (10pm-6am EST or something like that).
>>
>>
>> Chris
>>
>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>
>>
>>> hi,
>>> input database: swissprot
>>>         matrix: pam30
>>>         count: 1
>>>         gapcosts: 9 1
>>>
>>> I know that there are a lot of sequences, but that doesn't matter;
>>> you can delete all of them except one. The number of sequences is
>>> not the problem: the script reads one line and submits it.....then
>>> the second line and so on.....I have tried it with only one sequence
>>> as well and I got the same result.... the script ran at that time
>>> for more than 20 minutes!!!!!! .....and that should be enough time
>>> to retrieve the results for ONE sequence, I guess
>>>
>>> regards
>>> Hubert
>>>
>>>
>>>
>>> Chris Fields wrote:
>>>
>>>> You need to add the input conditions as well (you have several
>>>> <STDIN> lines which may play a role; I would like to know what you
>>>> normally enter for those).
>>>>
>>>> How long did you let the script run?  I ran a quick check on your
>>>> sequences; you have almost 1600, so you have to expect that you'll
>>>> run into some problems here!  Most here (including me) would
>>>> suggest you try installing a local blast setup for something like
>>>> this.
>>>>
>>>> Chris
>>>>
>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>
>>>>
>>>>> hi,
>>>>> I have submitted the bug -> Bug 2017
>>>>> with the script and input file, just start it from command line
>>>>>
>>>>> thank you very much
>>>>> greetings
>>>>>
>>>>> Hubert
>>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>> Hubert,
>>>>>>
>>>>>> I have a script that's using blastxml and XML output which  
>>>>>> seems  to work.
>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;  
>>>>>>> 'Sendu  Bala'
>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>
>>>>>>> hi,
>>>>>>> sorry, but I have updated the remoteblast module and I have run
>>>>>>> several attempts with the same results as before. It didn't work.
>>>>>>> I didn't get any results.
>>>>>>>
>>>>>>> regards
>>>>>>> Hubert
>>>>>>>
>>>>>>>
>>>>>>> Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Sendu, Hubert,
>>>>>>>>
>>>>>>>>
>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the problem
>>>>>>>> (break out of that infinite loop).  I applied Sendu's patch to
>>>>>>>> RemoteBlast in CVS; it passed all tests in RemoteBlast.t.  Try
>>>>>>>> updating from CVS to see if it works.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>
>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> I have the following program and it worked quite well for
>>>>>>>>>> retrieving remoteblast results in a text file.
>>>>>>>>>> Now I have altered it to XML, and it doesn't work anymore.....
>>>>>>>>>> It takes all the parameters at the command line and submits the
>>>>>>>>>> query, but I don't retrieve any results file anymore.....
>>>>>>>>>>
>>>>>>>>>> It seems that it hangs in an endless loop......
>>>>>>>>>> The only output I get is:  $rc is not a ref! over and over..... it
>>>>>>>>>> doesn't enter the else branch anymore....
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> There is no problem with your code. The problem is with the NCBI
>>>>>>>>> server and should be reported to them. You can visit the site and do
>>>>>>>>> a blast, requesting xml format, and you will typically get one normal
>>>>>>>>> 'waiting' message and the promise that it will be updated in x
>>>>>>>>> seconds, but subsequent attempts to get progress information result
>>>>>>>>> in an xml error page because the NCBI server doesn't actually send
>>>>>>>>> any data.
>>>>>>>>>
>>>>>>>>> Unfortunately the way that the bioperl code is written, it treats no
>>>>>>>>> data as 'waiting' instead of an error. I've offered a patch to fix
>>>>>>>>> this at this bug page:
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 11:10:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:10:51 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149084373.447da2d5c5339@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
	<1149084373.447da2d5c5339@128.91.55.38>
Message-ID: <9206E0B2-15DC-4AB2-B71B-5EA9D1D11AEC@duke.edu>

The bootstrap is stored as the node ID because of a limitation of the
newick format: there isn't a formal way to distinguish internal IDs
from bootstraps.  There are several different ways that programs
encode the internal ID and a bootstrap value in that one slot; we try
to parse it out if the bootstrap is stored in brackets, like
INTERNALID[BOOTSTRAP].

Formats like nhx explicitly solve this problem, but most programs
only use the simple newick.  If you know your data, it is a simple
procedure to move the internal ID data into the bootstrap slot.

In terms of ignoreoverwrite, you just need to send in a second
parameter which is true:
$node->add_Descendent($childnode, 1);
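
Moving the internal ID into the bootstrap slot could be sketched roughly as follows. The subroutine name ids_to_bootstrap is made up for illustration, and the sketch assumes that every purely numeric internal-node label really is a support value, which only you can judge for your data. It relies on the node methods is_Leaf, id and bootstrap.

```perl
use strict;
use warnings;

# Copy purely numeric internal-node labels into the bootstrap slot.
# $tree is expected to behave like a tree object read from a newick
# file, with get_nodes returning node objects.
sub ids_to_bootstrap {
    my ($tree) = @_;
    my $moved = 0;
    for my $node ( grep { !$_->is_Leaf } $tree->get_nodes ) {
        my $label = $node->id;
        # Skip unlabeled nodes and labels that are real names
        next unless defined $label && $label =~ /^\d+(?:\.\d+)?$/;
        $node->bootstrap($label);   # treat the label as a support value
        $moved++;
    }
    return $moved;                  # number of nodes that were updated
}

# Usage (needs BioPerl; 'Test2.tre' is the file name used in this thread):
#   use Bio::TreeIO;
#   my $in = Bio::TreeIO->new(-file => 'Test2.tre', -format => 'newick');
#   while ( my $tree = $in->next_tree ) { ids_to_bootstrap($tree) }
```

After that, a cutoff test such as $node->bootstrap < 70 can read the bootstrap slot directly.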

-jason


On May 31, 2006, at 10:06 AM, Lucia Peixoto wrote:

> Hi
> Thanks
> a couple more questions
> why is the bootstrap value stored as the node id? Is that right?
>
> also, in the add_descendant method, how do you set the  
> $ignoreoverwrite
> parameter to true?
>
> Lucia
>
> Quoting Jason Stajich <jason.stajich at duke.edu>:
>
>> you need to special case the root - it won't have an ancestor.  just
>> protect the my $parent = $node->ancestor with an if statement as I
>> did below
>>
>> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>>
>>> Hi
>>> OK that was silly, but what I have in my code is what you just wrote
>>> But the problem is that if I write
>>>
>>> $parent->add_Descendent($child)
>>>
>>> it tells me that I am calling the method "add_Descendent" on an
>>> undefined value
>>> (but I did define $parent before??)
>>>
>>> So here it goes the code so far:
>>>
>>> use Bio::TreeIO;
>>>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>>>                           -format => 'newick');
>>>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>>>                            -format => 'newick');
>>>  while( my $tree = $in->next_tree ) {
>>>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>>>     my $bootstrap=$node->_creation_id;
>>>
>>>     if ($bootstrap < 70 ){
>>          if( my $parent = $node->ancestor ) {
>>>               my @children=$node->get_all_Descendents;
>>>               foreach my $child (@children){
>>>                  $parent->add_Descendent($child);
>>>               }
>>          }
>>>
>>> ........
>>>
>>> eventually I'll add (once I have assigned the children to the parent
>>> successfully):
>>> $tree->remove_Node($node);
>>>
>>>         }
>>>     }
>>>     $out->write_tree($tree);
>>> }
>>>
>>> Quoting aaron.j.mackey at gsk.com:
>>>
>>>>> foreach $child (@children){
>>>>>          $parent=add_Descendent->$child;
>>>>> }
>>>>
>>>> I think what you want is $parent->add_Descendent($child)
>>>>
>>>> -Aaron
>>>>
>>>
>>>
>>> Lucia Peixoto
>>> Department of Biology,SAS
>>> University of Pennsylvania
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sat Jun  3 11:29:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:29:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>

You can get all the hits or HSPs with the following methods:
my @hits = $result->hits;
my @hsps = $hit->hsps;


You can also reset the counter, since these implementations are
in-memory and already parsed (and not a stream processor per se).
next_XX just iterates through the list stored in the parent object.

$result->rewind;

   and

$hit->rewind;


For example, the rewind needs to be called if you want to use a  
ResultWriter object and filter some of the values for the final  
writing after first inspecting them.
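
A small sketch of the two-pass pattern, for illustration only: the subroutine name hit_names_twice is made up, and it assumes a result object that provides next_hit and rewind as described above, with hits that have a name method.

```perl
use strict;
use warnings;

# Walk the hits of one result object twice, rewinding in between.
# Returns two array references holding the hit names seen on each pass.
sub hit_names_twice {
    my ($result) = @_;
    my (@first, @second);
    while ( my $hit = $result->next_hit ) { push @first, $hit->name }
    $result->rewind;                       # reset the hit iterator
    while ( my $hit = $result->next_hit ) { push @second, $hit->name }
    return (\@first, \@second);
}

# Usage (needs BioPerl; 'report.bls' is a placeholder file name):
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new(-file => 'report.bls', -format => 'blast');
#   my $result = $in->next_result;
#   my ($first, $second) = hit_names_twice($result);
```

Without the rewind call the second loop would see no hits, since the iterator is already at the end of the list.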

-jason


On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:

> Thanks for your comment Sendu, it was very helpful. I think this must be
> what's going on: I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first with my
> sort subroutine empties (?) the $blast_result object, so that when I try
> to print, there is nothing left to print (and vice versa when I print
> first and then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??
>
> It seems I could get around this by making a copy of the blast report
> by setting it to another new variable (not the most elegant solution),
> but I'm having trouble with this...
>
> If I do:
>
> 	my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is escaping me.
>
> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??
>
> Sendu and Brian, I didn't post the sort_results subroutine because it
> is sprawling, as is a lot of my code. The code I provided was more like
> an aid for my explanation of the problem.. it doesn't actually run -
> sorry for the confusion, I should have been more clear on that.  The
> important thing to know perhaps is that both sort_results and
> print_blast_results contain a foreach loop where I am using the
> 'next_result' method to view blast results. (And to clarify for
> Torsten, the blastall() is working just fine - the analysis/viewing of
> the results object is where I am encountering the problem.)
>
>
> Any other ideas would be greatly appreciated...
>
> Thank you,
> Genevieve
>
>
>
>
> Sendu Bala wrote:
>
>> Genevieve DeClerck wrote:
>>
>>> Hi,
>>
>> [snip]
>>
>>> If I've sorted the results the sorted-results will print to screen,
>>> however when I try to print the Hit Table results nothing is returned,
>>> as if the blast results have evaporated.... and vice versa, if I
>>> comment out the part where I point my sorting subroutine to the blast
>>> results reference, my hit table results suddenly print to screen.
>>
>> [snip]
>>
>>> Here's an abbreviated version of my code:
>>
>> [snip]
>>
>>> #######
>>> ### the following 2 actions seem to be mutually exclusive.
>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>> # SeqFeature objs stored in arrays. arrays are then printed
>>> # to stdout
>>> &sort_results($blast_report);
>>>
>>> # 2) print blast results
>>> &print_blast_results($blast_report);
>>
>>
>>> sub print_blast_results{
>>>    my $report = shift;
>>>    while(my $result = $report->next_result()){
>>
>> [snip]
>>
>> You didn't give us your sort_results subroutine, but is it as simple as
>> they both use $report->next_result (and/or $result->next_hit), but you
>> don't reset the internal counter back to the start, so the second
>> subroutine tries to get the next_result and finds the first subroutine
>> has already looked at the last result and so next_result returns false?
>>
>> From a quick look it wasn't obvious how to reset the counter. Hopefully
>> this can be done and someone else knows how.
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun  3 15:13:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 3 Jun 2006 14:13:22 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
Message-ID: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>

Nice!  Didn't know I could do that.  Maybe we should add some of this  
to the HOWTO (or is it already in there?).

Chris

On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:

> you can get all the Hits or hsps with the following method:
> my @hits = $result->hits;
> my @hsps = $hit->hsps;
>
>
> You can also reset the counter since these implementations are in-
> memory and already parsed (and not a stream processor per se).
> next_XX just iterates through the list stored in the parent object.
>
> $result->rewind;
>
>    and
>
> $hit->rewind;
>
>
> For example, the rewind needs to be called if you want to use a
> ResultWriter object and filter some of the values for the final
> writing after first inspecting them.
>
> -jason
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 15:31:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 15:31:59 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
Message-ID: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>

hits() and hsps() were already in the HOWTO; I just added rewind to
the table of methods.
If someone wanted to write a little section in the HOWTO about
resetting the iterator, that would be great.

-jason
On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:

> Nice!  Didn't know I could do that.  Maybe we should add some of this
> to the HOWTO (or is it already in there?).
>
> Chris

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Sat Jun  3 19:54:20 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 04 Jun 2006 09:54:20 +1000
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <4482212C.3000908@infotech.monash.edu.au>

Hubert,

> Another question, about parsing the XML output: is there an XML parser
> available for BLAST XML output, and how do I start?
> I have looked at the BioPerl wiki and at Bio::SearchIO::blastxml on CPAN,
> but I'm not sure how to start....sorry, I guess I'm too stupid....
> Is there maybe another introduction or an example?

I think we already answered this question for you on 20 May 2006:

http://bioperl.org/pipermail/bioperl-l/2006-May/021574.html
http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#How_to_parse_BLAST_XML_output

http://www.bioperl.org/wiki/HOWTO:SearchIO (search for "blastxml")

--Torsten Seemann


From cjfields at uiuc.edu  Sun Jun  4 01:17:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 00:17:46 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
Message-ID: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>

There's an interesting addition to this I found while checking this  
out; looks like if you use:

my @hits =  $result->hits;

to get all the hits, you don't need to use '$result->rewind'.  The
rewind method resets the iterator for the hit list back to the
beginning, but using the hits method to grab all the hits doesn't use
the iterator at all.  This works either pre- or post-iteration
through the Hit::BlastHit objects.
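
As an illustration of that point, here is a minimal sketch that sorts hits taken from the hits method, so it does not matter where the next_hit iterator stands. The subroutine name is made up, and it assumes the hit objects provide name and a numeric significance (the E-value).

```perl
use strict;
use warnings;

# Return hit names sorted by significance, smallest E-value first.
# Uses hits(), which returns the whole list and ignores the position
# of the next_hit iterator.
sub hit_names_by_significance {
    my ($result) = @_;
    my @sorted = sort { $a->significance <=> $b->significance }
                 $result->hits;
    return map { $_->name } @sorted;
}

# Usage (needs BioPerl; $result comes from a SearchIO parser):
#   print join("\n", hit_names_by_significance($result)), "\n";
```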

Another thing: Genevieve was passing the SearchIO report object (i.e.
the parser object which was returned from StandAloneBlast,
$blast_report) to the subroutines, not the
Bio::Search::Result::BlastResult object.  It looks like there was some
confusion between the two object types, since she refers to the report
as the result object when it's actually the SearchIO parser object.
So, once the parser was passed into the first subroutine, a result
object was generated, then destroyed.  By the time the second
subroutine was entered, the parser had already parsed the report and
generated the objects, so it ended with no output.
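
One way to avoid this, sketched under the assumption that the whole report fits in memory: read the results from the parser once, keep them in an array, and hand that array to each subroutine. The name collect_results is made up for illustration.

```perl
use strict;
use warnings;

# Read every result from a SearchIO-style parser exactly once and keep
# the objects, so several subroutines can work on the same list.
sub collect_results {
    my ($parser) = @_;
    my @results;
    while ( my $result = $parser->next_result ) {
        push @results, $result;
    }
    return @results;
}

# Usage (subroutine names are taken from this thread; passing an array
# reference instead of the parser is an assumed change to them):
#   my @results = collect_results($blast_report);
#   sort_results(\@results);
#   print_blast_results(\@results);
```

Each subroutine would then loop over the array with foreach, and call rewind on a result before re-reading its hits with next_hit.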

Though passing the BlastResult object is better since one should only  
have to parse the report once and use the objects, for curiosity's  
sake, is there a method to rewind the parser itself (in other words,  
read through the report again)?

Chris


On Jun 3, 2006, at 2:31 PM, Jason Stajich wrote:

> In the HOWTO hits() and hsps() were there, I just added rewind in the
> table of methods.
> If someone wanted to write a little section in the HOWTO about
> resetting the iterator that would be great.
>
> -jason
> On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:
>
>> Nice!  Didn't know I could do that.  Maybe we should add some of this
>> to the HOWTO (or is it already in there?).
>>
>> Chris
>>
>> On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:
>>
>>> you can get all the Hits or hsps with the following method:
>>> my @hits = $result->hits;
>>> my @hsps = $hit->hsps;
>>>
>>>
>>> You can also reset the counter since these implementations are in-
>>> memory and already parsed (and not a stream processor per se).
>>> next_XX just iterates through the list stored in the parent object.
>>>
>>> $result->rewind;
>>>
>>>    and
>>>
>>> $hit->rewind;
>>>
>>>
>>> For example, the rewind needs to be called if you want to use a
>>> ResultWriter object and filter some of the values for the final
>>> writing after first inspecting them.
>>>
>>> -jason
>>>
>>>
>>> On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:
>>>
>>>> Thanks for your comment Sendu, it was very helpful. I think this
>>>> must be
>>>> what's going on.. I am using $blast_report->next_result in both
>>>> subroutines. It appears that analyzing the blast results first  
>>>> w/ my
>>>> sort subroutine empties (?) the $blast_result object so that when I
>>>> try
>>>> to print, there is nothing left to print. (and visa-versa when I
>>>> print
>>>> first then try to sort).
>>>> So, from the looks of things, using next_result has the effect of
>>>> popping the Bio::Search::Result::ResultI objects off of the  
>>>> SearchIO
>>>> blast report object??
>>>>
>>>> It seems I could get around this by making a copy of the blast
>>>> report by
>>>> setting it to another new variable...(not the most elegant
>>>> solution) but
>>>> I'm having trouble with this...
>>>>
>>>> If I do:
>>>>
>>>> 	my $blast_report_copy = $blast_report;
>>>>
>>>> I'm just copying the reference to the SearchIO blast result, so it
>>>> doesn't help me. How can I make another physical copy of this blast
>>>> result object? Seems like a simple thing but how to do it is
>>>> escaping me.
>>>>
>>>> But better yet, the way to go is to 'reset the counter,' or to
>>>> find a
>>>> way to look at/print/sort the results without removing data from  
>>>> the
>>>> blast result object. How is this done though??
>>>>
>>>> Sendu and Brian, I didn't post the sort_results subroutine because
>>>> it is
>>>> sprawling, as is a lot of my code. The code I provided was more
>>>> like an
>>>> aid for my explanation of the problem.. it doesn't actually run -
>>>> sorry
>>>> for the confusion, I should have more clear on that.  The important
>>>> thing to know perhaps is that both sort_results and
>>>> print_blast_results
>>>> contain a foreach loop where I am using the 'next_results'  
>>>> method to
>>>> view blast results. (And to clarify for Torsten, the blastall() is
>>>> working just fine - the analysis/viewing of the results object is
>>>> where
>>>> I am encountering the problem.)
>>>>
>>>>
>>>> Any other ideas would be greatly appreciated...
>>>>
>>>> Thank you,
>>>> Genevieve
>>>>
>>>>
>>>>
>>>>
>>>> Sendu Bala wrote:
>>>>
>>>>> Genevieve DeClerck wrote:
>>>>>
>>>>>> Hi,
>>>>>
>>>>> [snip]
>>>>>
>>>>>> If I've sorted the results the sorted-results will print to
>>>>>> screen,
>>>>>> however when I try to print the Hit Table results nothing is
>>>>>> returned,
>>>>>> as if the blast results have evaporated.... and visa versa, if i
>>>>>> comment out the part where i point my sorting subroutine to the
>>>>>> blast
>>>>>> results reference,  my hit table results suddenly prints to
>>>>>> screen.
>>>>>
>>>>> [snip]
>>>>>
>>>>>> Here's an abbreviated version of my code:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> #######
>>>>>> ### the following 2 actions seem to be mutually exclusive.
>>>>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>>>>> # SeqFeature objs stored in arrays. arrays are then printed
>>>>>> # to stdout
>>>>>> &sort_results($blast_report);
>>>>>>
>>>>>> # 2) print blast results
>>>>>> &print_blast_results($blast_report);
>>>>>
>>>>>
>>>>>> sub print_blast_results{
>>>>>>    my $report = shift;
>>>>>>    while(my $result = $report->next_result()){
>>>>>
>>>>> [snip]
>>>>>
>>>>> You didn't give us your sort_results subroutine, but is it as
>>>>> simple as
>>>>> they both use $report->next_result (and/or $result->next_hit), but
>>>>> you
>>>>> don't reset the internal counter back to the start, so the second
>>>>> subroutine tries to get the next_result and finds the first
>>>>> subroutine
>>>>> has already looked at the last result and so next_result returns
>>>>> false?
>>>>>
>>>>>  From a quick look it wasn't obvious how to reset the counter.
>>>>> Hopefully
>>>>> this can be done and someone else knows how.
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> http://www.duke.edu/~jes12
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sun Jun  4 10:08:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 4 Jun 2006 10:08:29 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
Message-ID: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>

right - you don't need rewind if you aren't going to use the iterator  
(next_XXX) -- we provide two different ways to get access to the data.
you can do
for my $hit ( $result->hits ) {

}
or
while( my $hit = $result->next_hit ) {
}
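
To make the difference between the two access styles concrete, here is a small toy sketch. ToyResult is a made-up stand-in class, not the BioPerl implementation; it only assumes what was said above, namely that the hits are already in memory and that next_hit walks an internal position which rewind resets.

```perl
use strict;
use warnings;

# Toy stand-in for a parsed result object (NOT the BioPerl implementation):
# the hits are already in memory, and an index tracks the iterator position.
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

# List accessor: returns everything and never touches the iterator.
sub hits { return @{ $_[0]{hits} } }

# Iterator: returns the next hit, or undef once the list is exhausted.
sub next_hit {
    my $self = shift;
    return undef if $self->{pos} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{pos}++ ];
}

# Reset the iterator to the first hit.
sub rewind { $_[0]{pos} = 0; return }

package main;

# Count how many hits the iterator still has left to give.
sub count_remaining {
    my $result = shift;
    my $n = 0;
    $n++ while defined $result->next_hit;
    return $n;
}

my $result = ToyResult->new(qw(hitA hitB hitC));
my $first  = count_remaining($result);   # walks all three hits
my $second = count_remaining($result);   # iterator is exhausted now
my $listed = () = $result->hits;         # list accessor still sees all hits
$result->rewind;
my $third  = count_remaining($result);   # full pass again after rewind

print "first=$first second=$second listed=$listed third=$third\n";
```

The second pass comes back empty only because the position was never reset; the data itself is still there, which is why the list accessor works both before and after iterating.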


If you want to rewind the parser then (assuming you are using a  
filestream and not a data stream from the web or zcat or something)  
just reset the filehandle
seek($searchio->_fh, 0, 0);

but then you'll have to re-parse everything and pay that cost twice -  
it makes more sense to me to just save the results and put them in a
list if you are going to deliberately make two passes over all the
results.    You either pay the cost of memory (keeping all the  
objects) or time (reparse the results).
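
A minimal sketch of the "save the results in a list" option, using only core Perl. The code ref $parser_next is a hypothetical stand-in for a one-shot parser; with SearchIO it would presumably be something like sub { $searchio->next_result }.

```perl
use strict;
use warnings;

# Drain a one-shot iterator into an array. $next is any code ref that
# returns the next item, or undef when there is nothing left.
sub collect_all {
    my ($next) = @_;
    my @items;
    while ( defined( my $item = $next->() ) ) {
        push @items, $item;
    }
    return @items;
}

# Hypothetical stand-in for a parser: hands out each record exactly once.
my @pending     = qw(result1 result2 result3);
my $parser_next = sub { return shift @pending };

my @results = collect_all($parser_next);   # parse once, pay the memory cost

# Both passes work on the saved list, not on the exhausted parser.
my @sorted   = sort { $b cmp $a } @results;
my $printed  = join ',', @results;
my $leftover = defined $parser_next->() ? 1 : 0;   # 0: the parser is drained

print "sorted=@sorted printed=$printed leftover=$leftover\n";
```

With this shape, a sorting subroutine and a printing subroutine can each take the saved list and neither affects what the other sees.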


-jason
On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:

> There's an interesting addition to this I found while checking this  
> out; looks like if you use:
>
> my @hits =  $result->hits;
>
> to get all the hits, you don't need to use '$result->rewind'.  The  
> rewind method resets the iterator for the hit list back to the  
> beginning, but using the hits method to grab all the hits doesn't  
> use the iterator at all.  This works either pre- or post-iteration  
> through the Hit::BlastHit objects.
>
> Another thing; Genevieve was passing the SearchIO report object  
> (i.e. the parser object which was returned from StandAloneBlast,  
> $blast_report) to the methods, not the  
> Bio::Search::Result::BlastResult object; looks like there was some  
> confusion between the two object types since she refers to the  
> report as the result object when it's actually the SearchIO parser  
> object.  So, once the parser was passed into the first method, a  
> result object was generated, then destroyed.  When entering the  
> second method, the parser had already read and parsed the report and  
> generated the objects, so it ended with no output.
>
> Though passing the BlastResult object is better since one should  
> only have to parse the report once and use the objects, for  
> curiosity's sake, is there a method to rewind the parser itself (in  
> other words, read through the report again)?
>
> Chris
>
>
> On Jun 3, 2006, at 2:31 PM, Jason Stajich wrote:
>
>> In the HOWTO hits() and hsps() were there, I just added rewind in the
>> table of methods.
>> If someone wanted to write a little section in the HOWTO about
>> resetting the iterator that would be great.
>>
>> -jason
>> On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:
>>
>>> Nice!  Didn't know I could do that.  Maybe we should add some of  
>>> this
>>> to the HOWTO (or is it already in there?).
>>>
>>> Chris
>>>
>>> On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:
>>>
>>>> you can get all the Hits or hsps with the following method:
>>>> my @hits = $result->hits;
>>>> my @hsps = $hit->hsps;
>>>>
>>>>
>>>> You can also reset the counter since these implementations are in-
>>>> memory and already parsed (and not a stream processor per se).
>>>> next_XX just iterates through the list stored in the parent object.
>>>>
>>>> $result->rewind;
>>>>
>>>>    and
>>>>
>>>> $hit->rewind;
>>>>
>>>>
>>>> For example, the rewind needs to be called if you want to use a
>>>> ResultWriter object and filter some of the values for the final
>>>> writing after first inspecting them.
>>>>
>>>> -jason
>>>>
>>>>
>>>> On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:
>>>>
>>>>> Thanks for your comment Sendu, it was very helpful. I think this
>>>>> must be
>>>>> what's going on.. I am using $blast_report->next_result in both
>>>>> subroutines. It appears that analyzing the blast results first  
>>>>> w/ my
>>>>> sort subroutine empties (?) the $blast_result object so that  
>>>>> when I
>>>>> try
>>>>> to print, there is nothing left to print. (and visa-versa when I
>>>>> print
>>>>> first then try to sort).
>>>>> So, from the looks of things, using next_result has the effect of
>>>>> popping the Bio::Search::Result::ResultI objects off of the  
>>>>> SearchIO
>>>>> blast report object??
>>>>>
>>>>> It seems I could get around this by making a copy of the blast
>>>>> report by
>>>>> setting it to another new variable...(not the most elegant
>>>>> solution) but
>>>>> I'm having trouble with this...
>>>>>
>>>>> If I do:
>>>>>
>>>>> 	my $blast_report_copy = $blast_report;
>>>>>
>>>>> I'm just copying the reference to the SearchIO blast result, so it
>>>>> doesn't help me. How can I make another physical copy of this  
>>>>> blast
>>>>> result object? Seems like a simple thing but how to do it is
>>>>> escaping me.
>>>>>
>>>>> But better yet, the way to go is to 'reset the counter,' or to
>>>>> find a
>>>>> way to look at/print/sort the results without removing data  
>>>>> from the
>>>>> blast result object. How is this done though??
>>>>>
>>>>> Sendu and Brian, I didn't post the sort_results subroutine because
>>>>> it is
>>>>> sprawling, as is a lot of my code. The code I provided was more
>>>>> like an
>>>>> aid for my explanation of the problem.. it doesn't actually run -
>>>>> sorry
>>>>> for the confusion, I should have more clear on that.  The  
>>>>> important
>>>>> thing to know perhaps is that both sort_results and
>>>>> print_blast_results
>>>>> contain a foreach loop where I am using the 'next_results'  
>>>>> method to
>>>>> view blast results. (And to clarify for Torsten, the blastall() is
>>>>> working just fine - the analysis/viewing of the results object is
>>>>> where
>>>>> I am encountering the problem.)
>>>>>
>>>>>
>>>>> Any other ideas would be greatly appreciated...
>>>>>
>>>>> Thank you,
>>>>> Genevieve
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Sendu Bala wrote:
>>>>>
>>>>>> Genevieve DeClerck wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> If I've sorted the results the sorted-results will print to
>>>>>>> screen,
>>>>>>> however when I try to print the Hit Table results nothing is
>>>>>>> returned,
>>>>>>> as if the blast results have evaporated.... and visa versa, if i
>>>>>>> comment out the part where i point my sorting subroutine to the
>>>>>>> blast
>>>>>>> results reference,  my hit table results suddenly prints to
>>>>>>> screen.
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> Here's an abbreviated version of my code:
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> #######
>>>>>>> ### the following 2 actions seem to be mutually exclusive.
>>>>>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>>>>>> # SeqFeature objs stored in arrays. arrays are then printed
>>>>>>> # to stdout
>>>>>>> &sort_results($blast_report);
>>>>>>>
>>>>>>> # 2) print blast results
>>>>>>> &print_blast_results($blast_report);
>>>>>>
>>>>>>
>>>>>>> sub print_blast_results{
>>>>>>>    my $report = shift;
>>>>>>>    while(my $result = $report->next_result()){
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>> You didn't give us your sort_results subroutine, but is it as
>>>>>> simple as
>>>>>> they both use $report->next_result (and/or $result->next_hit),  
>>>>>> but
>>>>>> you
>>>>>> don't reset the internal counter back to the start, so the second
>>>>>> subroutine tries to get the next_result and finds the first
>>>>>> subroutine
>>>>>> has already looked at the last result and so next_result returns
>>>>>> false?
>>>>>>
>>>>>>  From a quick look it wasn't obvious how to reset the counter.
>>>>>> Hopefully
>>>>>> this can be done and someone else knows how.
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From cjfields at uiuc.edu  Sun Jun  4 11:51:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 10:51:53 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
Message-ID: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>


On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:

> right - you don't need rewind if you aren't going to use the  
> iterator (next_XXX) -- we provide two different ways to get access  
> to the data.
> you can do
> for my $hit ( $result->hits ) {
>
> }
> or
> while( my $hit = $result->next_hit ) {
> }
>
>
> If you want to rewind the parser then (assuming you are using a  
> filestream and not a data stream from the web or zcat or something)  
> just reset the filehandle
> seek($searchio->_fh, 0, 0);
>
> but then you'll have to re-parse everything and pay that cost twice  
> - it makes more sense to me to just save the results and put them  
> in list if you are going to deliberately make two passes over all  
> the results.    You either pay the cost of memory (keeping all the  
> objects) or time (reparse the results).

I agree there isn't any really good reason to rewind the parser; I  
was mainly just curious how this was accomplished.  Your point about a  
memory or time hit might be a point we want to make in the HOWTO.  I  
already added some example code about rewinding the iterator and  
hits, so I'll add a bit about this.

I think a good deal of confusion here comes from not knowing how  
SearchIO works (i.e. that parsing a report can return several  
results, which in turn can return hits, which in turn return HSPs).  Of  
course that doesn't include iterations in the case of PSI-BLAST.    
The HOWTO, I think, explains this all well so it may be a matter of  
just RTM (I left the 'F' out to be a bit more polite).

Chris

> -jason
> On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:
>
...


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Mon Jun  5 04:16:59 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 05 Jun 2006 16:16:59 +0800
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
Message-ID: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>


Dear Lincoln and experts

Currently I have a CGI application that does this:

1. read an uploaded file
2. check whether the content of the file is fasta or not
3. print out the content of the file.


Now the problem I'm facing is in step three: the content read from
the filehandle is altered, namely the very first line does not get
printed.

So for example if "test1.fasta" looks like this:

>Seq0
ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC
 
When it was printed it gives only:

ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC

Why is this happening? 

Below is the complete cgi script that 
does the task  I mentioned earlier.

Did I miss anything in my code?


__BEGIN__
#!/usr/bin/perl -w

use CGI qw/:standard :html3/;
use CGI::Carp qw( fatalsToBrowser );
use Data::Dumper;

BEGIN {
    if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {

        # Blindly untaint.  Taintchecking is to protect
        # from Web data;
        # the environment is under our control.
        eval "use lib '$_';" foreach (
            reverse
            split( /:/, $1 )
        );
    }
}


use Bio::Tools::GuessSeqFormat;

print header,
    start_html('file upload'),
    h1('file upload!');
print_form()    unless param;
print_results() if param;
print end_html;

sub print_form {
    print start_multipart_form(),
       filefield(-name=>'upload',-size=>60),br,
       submit(-label=>'Upload File'),
       end_form;
}

sub print_results {
    my $length;
    my $file = param('upload');
    my $fh_upload = upload('upload');

    my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
    my $format_upload  = $guesser_upload->guess;

    if ( !$file ) {
        print "No file uploaded.";
        return;
    }
    print h2('File name'),      $file;
    print h2('Format'), $format_upload;
    print h2('The content is'),br;

    while (<$fh_upload>) {

     # The very first line of the file does not get printed here
     # Why?

        print;
        print br;
        $length += length($_);
    }
    print h2('File length'), $length;
}


__END__

Hope to hear from you again.

Regards,
Edward WIJAYA
SINGAPORE




From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 05:02:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:02:48 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <4483F338.7090909@mrc-dunn.cam.ac.uk>

Wijaya Edward wrote:
> Dear Lincoln and experts
> 
> Currently I have a CGI application that does this:
> 
> 1. read an uploaded file
> 2. check whether the content of the file is fasta or not
> 3. print out the content of the file.
> 
> 
> Now the problem I'm facing is that
> on step three. The content of the file handled is altered
> namely the very first line does not get printed. 

The problem is almost certainly that the guessing is done by reading the 
first line of the filehandle, so that your subsequent while loop on that 
same filehandle starts at the second line.
Just seek the filehandle back to the start before trying to print the 
contents out.

...
my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
my $format_upload  = $guesser_upload->guess;
seek($fh_upload, 0, 0);
...
while (<$fh_upload>) {
     ...
}

An alternative might be to pass GuessSeqFormat the filename in which 
case it would make its own filehandle and close it, leaving your own 
filehandle untouched.
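
A small core-Perl sketch of what I mean, with an in-memory filehandle standing in for the upload. That stand-in is an assumption: an in-memory handle is seekable, which a real upload handle may or may not be, so the return value of seek is worth checking.

```perl
use strict;
use warnings;

my $fasta = ">Seq0\nATCGGGGG\n>Seq1\nGGGGGGG\n";

# An in-memory filehandle stands in for the upload here; it is seekable,
# which a real upload handle may or may not be.
open( my $fh, '<', \$fasta ) or die "cannot open in-memory handle: $!";

my $peeked = <$fh>;              # a format guesser would read like this
my @rest   = <$fh>;              # only the lines after the first remain

# Go back to the start, and check that the seek actually succeeded.
seek( $fh, 0, 0 ) or die "handle is not seekable: $!";
my @all = <$fh>;

printf "after peek: %d lines, after seek: %d lines\n",
    scalar @rest, scalar @all;
```

If the seek fails on the real handle, the fallback would be to read the content once and keep it in a variable.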


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 05:57:52 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:57:52 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
Message-ID: <44840020.4020604@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
> 
>> If you want to rewind the parser then (assuming you are using a 
>> filestream and not a data stream from the web or zcat or something) 
>> just reset the filehandle
>> seek($searchio->_fh, 0, 0);
>>
>> but then you'll have to re-parse everything and pay that cost twice - 
>> it makes more sense to me to just save the results and put them in 
>> list if you are going to deliberately make two passes over all the 
>> results.    You either pay the cost of memory (keeping all the 
>> objects) or time (reparse the results).
> 
> I agree there isn't any really good reason to rewind the parser; I was 
> mainly just curious how this was accomplished.

Didn't you already explain why seeking a SearchIO wouldn't work? And 
indeed, didn't Genevieve already try to do this after I suggested it and 
  found that it didn't work?

Confused...


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 09:19:12 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 14:19:12 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
Message-ID: <44842F50.7090408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> 
> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
> 
>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>> indeed, didn't Genevieve already try to do this after I suggested it and
>> found that it didn't work?
>>
>> Confused...
>>
> There is an internal _rewind if you are using the next_XX methods that 
> resets the internal iterator (all the data has already been parsed).
> 
> You >>can<< reseek the internal filehandle (accessible by calling 
> $object->_fh ), but you can't call seek on the searchio object itself.

... poor choice of words on my part. Or maybe I'm not understanding 
you... I already suggested to Genevieve that she try:

# in the following, $blast_report is a SearchIO
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);

She reported that neither way of doing it worked. You seem to be saying 
that at least the second way should have. Is that right?
rewind() would of course be preferable, I just wanted to know if my 
assumption about seek working was correct or not.


From jason at bioperl.org  Mon Jun  5 09:45:40 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:45:40 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44842F50.7090408@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
Message-ID: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>

It depends on how you have run StandAloneBlast -- if the stream you  
are dealing with is not a file, but a datastream as in the STDOUT  
from BLAST, then the seek won't work (as it wouldn't work for a zcat  
on a gzipped file).  I think the default StandAloneBlast behavior is to  
operate on a STDOUT stream so seeking won't work no matter what.
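
The file-versus-stream difference can be checked with plain Perl, independent of BioPerl. This is only a sketch: can_rewind is a made-up helper name, and the pipe here is a stand-in for any non-file stream, not a claim about what StandAloneBlast actually hands back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Returns 1 if seek() back to the start works on the handle, 0 otherwise.
sub can_rewind {
    my ($fh) = @_;
    return seek( $fh, 0, 0 ) ? 1 : 0;
}

# A regular file on disk: seekable.
my ( $file_fh, $filename ) = tempfile( UNLINK => 1 );
print {$file_fh} "some report text\n";

# A pipe, like reading a program's STDOUT or zcat output: not seekable.
pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $file_ok = can_rewind($file_fh);
my $pipe_ok = can_rewind($reader);

print "file: $file_ok, pipe: $pipe_ok\n";
```

On a typical Unix system I would expect the file to report 1 and the pipe 0, so testing the return value of seek is a cheap way to find out which kind of handle you were given.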


On Jun 5, 2006, at 9:19 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
>>
>>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>>> indeed, didn't Genevieve already try to do this after I suggested  
>>> it and
>>> found that it didn't work?
>>>
>>> Confused...
>>>
>> There is an internal _rewind if you are using the next_XX methods  
>> that
>> resets the internal iterator (all the data has already been parsed).
>>
>> You >>can<< reseek the internal filehandle (accessible by calling
>> $object->_fh ), but you can't call seek on the searchio object  
>> itself.
>
> ... poor choice of words on my part. Or maybe I'm not understanding
> you... I already suggested to Genevieve that she try:
>
> # in the following, $blast_report is a SearchIO
>> my $blast_report = $factory->blastall($ref_seq_objs);
>> my $blast_fh = $blast_report->fh();
>> while (<$blast_fh>) {
>>      # $_ is a ResultI object, use as normal
>> }
>> seek($blast_fh, 0, 0); # this would be great, but does it work?
>> while (<$blast_fh>) {
>>      # go through the results again in your second subroutine
>> }
>>
>> An alternative hacky way of doing it, which may also not work,  
>> would be
>> to go through your $blast_report as normal, but then before going
>> through it a second time, say
>> my $fh = $blast_report->_fh;
>> seek($fh, 0, 0);
>
> She reported that neither way of doing it worked. You seem to be  
> saying
> that at least the second way should have. Is that right?
> rewind() would of course be preferable, I just wanted to know if my
> assumption about seek working was correct or not.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 10:13:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 15:13:03 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
	<E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
Message-ID: <44843BEF.6080609@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> It depends on how you have run StandAloneBlast -- if the stream you are 
> dealing with is not a file, but a datastream as in the STDOUT from 
> BLAST, then the seek won't work (as it wouldn't work for a zcat on 
> gzipped file).  I think the default StandAloneBlast behavior is to 
> operate on a STDOUT stream so seeking won't work no matter what.

As far as I can see, when you say blastall() on a StandAloneBlast, it 
eventually does:

if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
     $blast_obj = Bio::SearchIO->new(-file=>$outfile,
			            -format => 'blast' );
}

So seeking should work? Tools like StandAloneBlast creating temp files 
for their results prior to parsing is actually one of the things I don't 
like about the bioperl tool system.


From lstein at cshl.edu  Mon Jun  5 10:51:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 5 Jun 2006 10:51:52 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <200606051051.52648.lstein@cshl.edu>

Hi,

From the Synopsis for GuessSeqFormat:

           # To guess the format from an already open filehandle:
           my $guesser = new Bio::Tools::GuessSeqFormat( -fh => $filehandle );
           my $format  = $guesser->guess;
           # If the filehandle is seekable (STDIN isn't), it will be
           # returned to its original position.

The filehandle returned by CGI.pm is not seekable.
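
One way around a non-seekable handle is to read it once into memory and work from that copy. The sketch below uses only core Perl; slurp_handle is a made-up helper, and the pipe is a hypothetical stand-in for the handle returned by upload('upload'). If I recall the GuessSeqFormat synopsis correctly it also documents a -text option for guessing from a string, which would fit this approach, but check the POD before relying on that.

```perl
use strict;
use warnings;

# Read everything from a handle once and return it as a single string.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                      # undef record separator: read to EOF
    my $content = <$fh>;
    return defined $content ? $content : '';
}

# Simulate a non-seekable upload with a pipe (hypothetical stand-in
# for the handle returned by upload('upload')).
pipe( my $reader, my $writer ) or die "pipe failed: $!";
print {$writer} ">Seq0\nATCGGGGG\n>Seq1\nGGGGGGG\n";
close $writer;

my $content = slurp_handle($reader);

# Any consumer that insists on a filehandle gets its own in-memory one,
# so nobody eats lines that somebody else still needs.
open( my $guess_fh, '<', \$content ) or die "in-memory open failed: $!";
my $first_line = <$guess_fh>;      # what a format guesser might look at
close $guess_fh;

my @lines = split /^/, $content;   # all lines, first one included
print "first line: $first_line", "total lines: ", scalar(@lines), "\n";
```

The cost is holding the whole upload in memory, which is fine for small sequence files but worth a size check for anything large.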

Lincoln

On Monday 05 June 2006 04:16, Wijaya Edward wrote:
> Dear Lincoln and experts
>
> Currently I have a CGI application that does this:
>
> 1. read an uploaded file
> 2. check whether the content of the file is fasta or not
> 3. print out the content of the file.
>
>
> Now the problem I'm facing is that
> on step three. The content of the file handled is altered
> namely the very first line does not get printed.
>
> So for example if "test1.fasta" looks like this:
>
> >Seq0
> ATCGGGGG
> >Seq1
> GGGGGGG
> >Seq2
> ATCCCCCC
>
> When it was printed it gives only:
>
> ATCGGGGG
> >Seq1
> GGGGGGG
> >Seq2
> ATCCCCCC
>
> Why is this happening?
>
> Below is the complete cgi script that
> does the task  I mentioned earlier.
>
> Did I missed out anything in my code?
>
>
>
> __BEGIN__
> #!/usr/bin/perl -w
>
> use CGI qw/:standard :html3/;
> use CGI::Carp qw( fatalsToBrowser );
> use Data::Dumper;
>
> BEGIN {
>     if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {
>
>         # Blindly untaint.  Taintchecking is to protect
>         # from Web data;
>         # the environment is under our control.
>         eval "use lib '$_';" foreach (
>             reverse
>             split( /:/, $1 )
>         );
>     }
> }
>
>
> use Bio::Tools::GuessSeqFormat;
>
> print header,
>     start_html('file upload'),
>     h1('file upload!');
> print_form()    unless param;
> print_results() if param;
> print end_html;
>
> sub print_form {
>     print start_multipart_form(),
>        filefield(-name=>'upload',-size=>60),br,
>        submit(-label=>'Upload File'),
>        end_form;
> }
>
> sub print_results {
>     my $length;
>     my $file = param('upload');
>     my $fh_upload = upload('upload');
>
>     my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
>     my $format_upload  = $guesser_upload->guess;
>
>     if ( !$file ) {
>         print "No file uploaded.";
>         return;
>     }
>     print h2('File name'),      $file;
>     print h2('Format'), $format_upload;
>     print h2('The content is'),br;
>
>     while (<$fh_upload>) {
>
>      # The very first line of the file is not get printed here
>      # Why?
>
>         print;
>         print br;
>         $length += length($_);
>     }
>     print h2('File length'), $length;
> }
>
>
> __END__
>
> Hope to hear from you again.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From cjfields at uiuc.edu  Mon Jun  5 12:30:41 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 11:30:41 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44843BEF.6080609@mrc-dunn.cam.ac.uk>
Message-ID: <006001c688bd$62d48850$15327e82@pyrimidine>

If you want flexibility or added functionality then you can always
contribute a patch, such as adding an option for filehandles, IO::String,
pipes/forks, or whatever you wish.  Or you could suggest such to the module
maintainer, Torsten, and then it's his choice whether he wants to make it a
priority to implement it.  Simply stating this is 'one of things I don't
like about the bioperl tool system' isn't productive here.   It hasn't been
a top priority to implement something along those lines since the module
works for them as is, so if you want these options you'll have to add them,
and add the appropriate tests.

As for the seek issue, the file handle you get by using '$blast_report->fh()'
isn't the raw input file stream but is a tied filehandle of a stream of
ResultI objects:
==================================
Jason's version:
# seek called on the >>internal<< filehandle (from Bio::Root::IO)
# this is the raw data input stream from a file, so should work
seek($searchio->_fh, 0, 0);
==================================
Your version:
# seek called on SearchIO object filehandle
my $blast_report = $factory->blastall($ref_seq_objs);
# this is a tied filehandle for an output stream of objects from SearchIO,
# NOT the raw input stream
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
	# a stream of Bio::Search::Result::BlastResult objects 
} 
# can't use seek on a tied filehandle, won't work unless 
# SEEK class method is implemented (and it's not)
seek($blast_fh, 0, 0); 
==================================

There's a good deal in Programming Perl about tied filehandles.  You'll
notice that Bio::SearchIO implements TIEHANDLE, READLINE, DESTROY, and PRINT
methods, but not SEEK since we've never needed it.  You can always add one
if you want but I really don't see the point based on reasons Jason and I
outlined before.
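
To see the behaviour without BioPerl installed, here is a minimal sketch
using only core Perl.  ResultStream is a made-up stand-in for the SearchIO
tied handle (it is not the actual Bio::SearchIO code); like SearchIO, it
defines READLINE but no SEEK, so calling seek() on the tied handle dies:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ResultStream is a hypothetical class for illustration only. It mimics a
# tied handle that returns one "result" per read and has no SEEK method.
package ResultStream;

sub TIEHANDLE {
    my ($class, @results) = @_;
    return bless { results => [@results], next => 0 }, $class;
}

sub READLINE {
    my ($self) = @_;
    return undef if $self->{next} >= @{ $self->{results} };
    return $self->{results}[ $self->{next}++ ];
}

package main;

tie *RESULTS, 'ResultStream', 'result 1', 'result 2';
my $fh = \*RESULTS;

# Reading works: each <$fh> calls READLINE on the tied object.
my @seen;
while (defined(my $result = <$fh>)) {
    push @seen, $result;
}

# Seeking does not: Perl looks for a SEEK method, finds none, and dies.
my $seek_ok = eval { seek($fh, 0, 0); 1 } ? 1 : 0;
print "read ", scalar(@seen), " results; seek ",
      ($seek_ok ? "worked\n" : "failed: $@");
```

Adding a SEEK method to the class would make the seek() call succeed, which
is the piece SearchIO currently leaves out.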

Seems there is not much overall documentation on newFh or $blast_report->fh,
but I believe it's analogous to the SeqIO version which is covered a bit in
the bptutorial file, now on the wiki:

http://www.bioperl.org/wiki/Bptutorial.pl#III.2.1_Transforming_sequence_files_.28SeqIO.29

use Bio::SeqIO;

# newFh returns tied filehandles: reads yield Seq objects, prints write them
my $in  = Bio::SeqIO->newFh(-file   => "inputfilename",
                            -format => 'fasta');
my $out = Bio::SeqIO->newFh(-format => 'embl');
print $out $_ while <$in>;

Wouldn't hurt if someone wants to add a bit more about these to the SearchIO
HOWTO.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, June 05, 2006 9:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Jason Stajich wrote:
> > It depends on how you have run StandAloneBlast -- if the stream you are
> > dealing with is not a file, but a datastream as in the STDOUT from
> > BLAST, then the seek won't work (as it wouldn't work for a zcat on
> > gzipped file).  I think the default StandAloneBlast behavior is to
> > operate on a STDOUT stream so seeking won't work no matter what.
> 
> As far as I can see, when you say blastall() on a StandAloneBlast, it
> eventually does:
> 
> if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
>      $blast_obj = Bio::SearchIO->new(-file=>$outfile,
> 			            -format => 'blast' );
> }
> 
> So seeking should work? Tools like StandAloneBlast creating temp files
> for their results prior to parsing is actually one of things I don't
> like about the bioperl tool system.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Jun  5 09:02:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:02:02 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44840020.4020604@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
Message-ID: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>


On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
>>
>>> If you want to rewind the parser then (assuming you are using a
>>> filestream and not a data stream from the web or zcat or something)
>>> just reset the filehandle
>>> seek($searchio->_fh, 0, 0);
>>>
>>> but then you'll have to re-parse everything and pay that cost twice -
>>> it makes more sense to me to just save the results and put them in a
>>> list if you are going to deliberately make two passes over all the
>>> results.    You either pay the cost of memory (keeping all the
>>> objects) or time (reparse the results).
>>
>> I agree there isn't any really good reason to rewind the parser; I was
>> mainly just curious how this was accomplished.
>
> Didn't you already explain why seeking a SearchIO wouldn't work? And
> indeed, didn't Genevieve already try to do this after I suggested it and
> found that it didn't work?
>
> Confused...
>
There is an internal _rewind if you are using the next_XX methods  
that resets the internal iterator (all the data has already been  
parsed).

You >>can<< reseek the internal filehandle (accessible by calling
$object->_fh), but you can't call seek on the SearchIO object itself.
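
For what it's worth, the two options (re-read vs. keep everything) look
roughly like this on an ordinary seekable filehandle.  This is a sketch in
core Perl only; the in-memory "report" and the variable names are made up
for illustration, and none of it is the SearchIO code itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a report file; an in-memory filehandle is seekable like a
# regular file.
my $report = "hit 1\nhit 2\nhit 3\n";
open(my $fh, '<', \$report) or die "Cannot open in-memory report: $!";

# Option 1: pay in time. Read everything, rewind, and read it all again.
# Note that seek takes three arguments: handle, position, whence.
my $first_pass = 0;
$first_pass++ while <$fh>;
seek($fh, 0, 0) or die "Cannot rewind: $!";
my $second_pass = 0;
$second_pass++ while <$fh>;

# Option 2: pay in memory. Read once, keep the records, and loop over the
# list as many times as needed.
seek($fh, 0, 0) or die "Cannot rewind: $!";
chomp(my @cached = <$fh>);
close($fh);

print "passes saw $first_pass and $second_pass lines; cached ",
      scalar(@cached), " records\n";
```

With a real parser the second pass in option 1 repeats all the parsing
work, while option 2 keeps every parsed object alive at once.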


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 13:23:36 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 18:23:36 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006001c688bd$62d48850$15327e82@pyrimidine>
References: <006001c688bd$62d48850$15327e82@pyrimidine>
Message-ID: <44846898.8020001@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> If you want flexibility or added functionality then you can always
> contribute a patch, such as adding an option for filehandles, IO::String,
> pipes/forks, or whatever you wish.

Well, it wouldn't be a new feature per se, but just changing the way the 
modules work under the hood.


> Or you could suggest such to the module
> maintainer, Torsten, and then it's his choice whether he wants to make it a
> priority to implement it.  Simply stating this is 'one of things I don't
> like about the bioperl tool system' isn't productive here.

Yes, I apologise for that. I had thought too much would need to be 
changed and backward compatibility wouldn't be possible, but just 
changing StandAloneBlast should be possible.

I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... is there any point in working 
up a patch (would something of questionable reliability ever be 
committed into bioperl)?


> As for the seek issue, the file handle you get by using '$blast_report->fh()'
> isn't the raw input file stream but is a tied filehandle of a stream of
> ResultI objects:
> ==================================
> Jason's version:
> # seek called on the >>internal<< filehandle (from Bio::Root::IO)
> # this is the raw data input stream from a file, so should work
> seek($searchio->_fh, 0, 0);
> ==================================
> Your version:
> # seek called on SearchIO object filehandle
> my $blast_report = $factory->blastall($ref_seq_objs);
> # this is a tied filehandle for an output stream of objects from SearchIO,
> # NOT the raw input stream
> my $blast_fh = $blast_report->fh();

For academic interest, how do I get the 'raw input stream'? Wasn't that 
what my second version did?

 > my $fh = $blast_report->_fh;
 > seek($fh, 0, 0);


From hubert.prielinger at gmx.at  Mon Jun  5 14:17:53 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 12:17:53 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>	<4480E7AA.3020603@gmx.at>
	<720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
Message-ID: <44847551.7040705@gmx.at>

hi,
you were right, removing the composition-based statistics solved the 
problem. Now I get the result printed to STDOUT, but it doesn't save the 
output in the file.
I have tried reopening the file and writing it to another file, but it 
doesn't work.....
The strange thing is that if I retrieve text instead of xml output it 
works without any problem. Don't know why

Hubert


Chris Fields wrote:
> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>
>   
>> hi chris,
>> thanks, but I never intended to run the remoteblast with so many,
>> only a few of them. Actually my goal is to run the phiblast with a
>> regular expression, so that I just don't need that
>> file anymore
>>     
>
> Not a problem.  Just to let you know, I did manage to get the script  
> working, so I'm marking the bug INVALID.  I think the problem isn't  
> that there is an infinite loop so much as setting composition-based  
> statistics causes the search to take much much longer; try removing  
> that line to see what I mean.
>
> Just so you know, using $result->query_name doesn't get you what you  
> would expect (it gives you a part of the RID, which you don't want;  
> this is something in the XML output that is beyond our control).  You  
> might want to change it to something else or you'll get filenames  
> with numerical names.
>
>   
>> another question for parsing the xml output....is there an xml
>> parser available for blast xml output or how to start.....
>> I have looked up Bio::SearchIO::blastxml on the wiki and cpan,
>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>> is there maybe another introduction or an example.
>>     
>
> Bio::SearchIO objects are used to parse BLAST XML output if you have  
> it saved to a file.  For instance:
>
> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>
> while (my $result = $factory->next_result) {
>    while (my $hit = $result->next_hit) {
>       while (my $hsp = $hit->next_hsp) {
>          #do stuff here
>        }
>     }
> }
>
> The only thing that changes in parsing a text BLAST report from an  
> XML BLAST report is the -format line (similar to the -readmethod  
> parameter in RemoteBlast).  You shouldn't need to look up any more  
> documentation other than these on the wiki:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>
> Pay attention to the fact you'll need to install XML::SAX (CPAN) and  
> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
> up parsing.
>
> Chris
>
>   
>> thanks
>> Hubert
>>
>>
>> Chris Fields wrote:
>>     
>>> Yes, I see the same error you do.  But I have a similar script
>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>> that works fine.  I'm trying to dissect the problem but I think
>>> it may be something logically wrong here (something not so
>>> obvious) and not a bug...
>>>
>>> What I'm trying to say is, when you send sequences using
>>> remoteblast like this, you are essentially spamming the NCBI
>>> BLAST server with ~1600 requests.  This script wasn't set up with
>>> that intent in mind; you should really try to set up your own
>>> local blast database if possible.  If you can't, try running this
>>> script in off-hours (10pm-6am EST or something like that).
>>>
>>>
>>> Chris
>>>
>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi,
>>>> input database: swissprot
>>>>         matrix: pam30
>>>>         count: 1
>>>>         gapcosts: 9 1
>>>>
>>>> I know that there are  a lot of sequences, but that doesn't  
>>>> matter,  you can delete all of them except one, the amount of the  
>>>> sequences  is not the problem, the script reads one line and  
>>>> submits  it.....then the second line and so on.....I have tried  
>>>> it with only  one sequence either and I got the same result....  
>>>> the script run at  that time for more than 20  
>>>> minutes!!!!!! .....and that should be  enough time to retrieve  
>>>> the results for ONE sequence, I guess
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> You need to add the input conditions as well (you have several   
>>>>> <STDIN> lines which may play a role; I would like to know what  
>>>>> you  normally enter for those).
>>>>>
>>>>> How long did you let the script run?  I ran a quick check on  
>>>>> your  sequences; you have almost 1600, so you have to expect  
>>>>> that you'll  run into some problems here!  Most here (including  
>>>>> me) would  suggest you try installing a local blast setup for  
>>>>> something like  this.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> I have submitted the bug -> Bug 2017
>>>>>> with the script and input file, just start it from command line
>>>>>>
>>>>>> thank you very much
>>>>>> greetings
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>             
>>>>>>> Hubert,
>>>>>>>
>>>>>>> I have a script that's using blastxml and XML output which  
>>>>>>> seems  to work.
>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;  
>>>>>>>> 'Sendu  Bala'
>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> sorry, but I have updated the remoteblast module and I have  
>>>>>>>> run  several
>>>>>>>> attempts with the same results as before. It didn't work.
>>>>>>>> I didn't get any results.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Sendu, Hubert,
>>>>>>>>>
>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the problem
>>>>>>>>> (break out of that infinite loop).  I applied Sendu's patch to
>>>>>>>>> RemoteBlast in CVS; it passed all tests in RemoteBlast.t.  Try
>>>>>>>>> updating from CVS to see if it works.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> hi,
>>>>>>>>>>> I have the following program and it worked quite well for
>>>>>>>>>>> retrieving remoteblast results in a textfile.
>>>>>>>>>>> Now I have altered it to xml, and it doesn't work anymore.....
>>>>>>>>>>> It takes all the parameters at the commandline, submits the
>>>>>>>>>>> query, but I don't retrieve any results file anymore.....
>>>>>>>>>>>
>>>>>>>>>>> it seems that it hangs in an endless loop......
>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and over.....
>>>>>>>>>>> it doesn't enter the else term anymore....
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> There is no problem with your code. The problem is with the NCBI
>>>>>>>>>> server and should be reported to them. You can visit the site and
>>>>>>>>>> do a blast, requesting xml format, and you will typically get one
>>>>>>>>>> normal 'waiting' message and the promise that it will be updated
>>>>>>>>>> in x seconds, but subsequent attempts to get progress information
>>>>>>>>>> result in an xml error page because the NCBI server doesn't
>>>>>>>>>> actually send any data.
>>>>>>>>>>
>>>>>>>>>> Unfortunately the way that the bioperl code is written, it treats
>>>>>>>>>> no data as 'waiting' instead of an error. I've offered a patch to
>>>>>>>>>> fix this at this bug page:
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>       
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From cjfields at uiuc.edu  Mon Jun  5 14:32:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:32:47 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44847551.7040705@gmx.at>
Message-ID: <006101c688ce$7185c330$15327e82@pyrimidine>

Hubert, 

Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
option to save XML was committed relatively recently (last month or so).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Monday, June 05, 2006 1:18 PM
> To: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> you were right, removing the composition-based statistics solved the
> problem. Now I get the result printed to STDOUT, but it doesn't save the
> output in the file.
> I have tried reopening the file and writing it to another file, but it
> doesn't work.....
> The strange thing is that if I retrieve text instead of xml output it
> works without any problem. Don't know why
> 
> Hubert


From cjfields at uiuc.edu  Mon Jun  5 14:56:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:56:18 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44846898.8020001@mrc-dunn.cam.ac.uk>
Message-ID: <006201c688d1$bad2aff0$15327e82@pyrimidine>


> Chris Fields wrote:
> > If you want flexibility or added functionality then you can always
> > contribute a patch, such as adding an option for filehandles,
> IO::String,
> > pipes/forks, or whatever you wish.
> 
> Well, it wouldn't be a new feature per se, but just changing the way the
> modules work under the hood.

...

> I use IPC::Open3 for blasts and have never run into problems, but it
> pretty much falls into the 'apt to cause deadlock' camp. It may pass
> tests on one machine but fail on others... is there any point in working
> up a patch (would something of questionable reliability ever be
> committed into bioperl)?

The main thing you should avoid is major API changes or issues which break
this module on other OS's.  I'm not sure that StandAloneBlast is 'broken' by
using a tempfile as the location of the BLAST report.  

Any way you go about it, you'll have to capture the BLAST output as a stream
and get it to persist in a SearchIO object somehow.  It can be a pretty
decent memory hit to keep that report hanging around, esp. if it is large.

...

> For academic interest, how do I get the 'raw input stream'? Wasn't that
> what my second version did?
> 
>  > my $fh = $blast_report->_fh;
>  > seek($fh, 0, 0);

That should work, yes.  I didn't see that one in your previous response.  I can
get it to work w/o problems with SearchIO directly, but I haven't tried it with
StandAloneBlast.  Below is my script.  If you comment out the seek line, the
file pointer stays at the end of the file, so the second round of parsing
won't happen.

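use strict;
use warnings;
use Bio::SearchIO;

# The BLAST report file name is taken from the command line.
my $parser = Bio::SearchIO->new(  -file => shift,
                                  -format => 'blast');

# First round: _fh (a private method) gives the raw filehandle,
# so this prints the report text line by line.
my $fh = $parser->_fh;

while (<$fh>) {
     print;
}

# Rewind to the start of the file before parsing.
seek($fh, 0,0);

# Second round: fh returns a tied handle that yields one
# result object per read.
$fh = $parser->fh;

print "Second round:\n";
while (<$fh>) {
    while (my $hit = $_->next_hit) {
        print $hit->accession,"\n";
    }
}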
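
The effect of the seek can be seen without BioPerl at all.  Here is a
minimal sketch using only core Perl; the in-memory "report" and the
variable names are made up for illustration and have nothing to do with
the SearchIO internals:

```perl
use strict;
use warnings;

# A two-line stand-in for a report file, opened as an in-memory handle.
my $report = "line one\nline two\n";
open(my $fh, '<', \$report) or die "cannot open in-memory handle: $!";

my @first   = <$fh>;   # reads everything; the handle is now at EOF
my @no_seek = <$fh>;   # nothing left to read without rewinding

seek($fh, 0, 0) or die "seek failed: $!";
my @second  = <$fh>;   # after the rewind the same lines come back

printf "first: %d, without seek: %d, after seek: %d\n",
    scalar(@first), scalar(@no_seek), scalar(@second);
```

The same rewind is only possible on a real file (or an in-memory handle
like this one), which is why it works with the tempfile.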


Chris


From hubert.prielinger at gmx.at  Mon Jun  5 15:12:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 13:12:37 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <006101c688ce$7185c330$15327e82@pyrimidine>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
Message-ID: <44848225.8080003@gmx.at>

hi chris,
sorry, I have tried it with the latest CVS version:

# $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $

but it still doesn't work.

Hubert

Chris Fields wrote:
> Hubert, 
>
> Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
> option to save XML was committed relatively recently (last month or so).
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Monday, June 05, 2006 1:18 PM
>> To: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>
>> hi,
>> you were right, removing the composition-based statistics solved the
>> problem. Now I get the result viewed on STDIN, but it doesn't save the
>> output in the file.
>> I haved tried it by reopening the file and writing it to an other file
>> again, but it doesn't work.....
>> The strange thing is that if I retrieve text instead of xml output it
>> works without any problem. Don't know why
>>
>> Hubert
>>
>>
>>
>> Chris Fields wrote:
>>     
>>> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi chris,
>>>> thanks but I never intended to run the remoteblast with so much,
>>>> only a few of them, acutally I goal is to run the phiblast with
>>>> regular expression, so that i just don't need that
>>>> file anymore
>>>>
>>>>         
>>> Not a problem.  Just to let you know, I did manage to get the script
>>> working, so I'm marking the bug INVALID.  I think the problem isn't
>>> that there is an infinite loop so much as setting composition-based
>>> statistics causes the search to take much much longer; try removing
>>> that line to see what I mean.
>>>
>>> Just so you know, using $result->query_name doesn't get you what you
>>> would expect (it gives you a part of the RID, which you don't want;
>>> this is something in the XML output that is beyond our control).  You
>>> might want to change it to something else or you'll get filenames
>>> with numerical names.
>>>
>>>
>>>       
>>>> another question for parsing the xml output....is there a xml
>>>> parser available for blast xml output or how to start.....
>>>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,
>>>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>>>> is their maybe another introduction or an example.
>>>>
>>>>         
>>> Bio::SearchIO objects are used to parse BLAST XML output if you have
>>> it saved to a file.  For instance:
>>>
>>> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>>>
>>> while (my $result = $factory->next_result) {
>>>    while (my $hit = $result->next_hit) {
>>>       while (my $hsp = $hit->next_hsp) {
>>>          #do stuff here
>>>        }
>>>     }
>>> }
>>>
>>> The only thing that changes in parsing a text BLAST report from an
>>> XML BLAST report is the -format line (similar to the -readmethod
>>> parameter in RemoteBlast).  You shouldn't need to look up any more
>>> documentation other than these on the wiki:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>>>
>>> Pay attention to the fact you'll need to install XML::SAX (CPAN) and
>>> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
>>> up parsing.
>>>
>>> Chris
>>>
>>>
>>>       
>>>> thanks
>>>> Hubert
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> Yes, I see the same error you do.  But I have a similar script
>>>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>>>> that  works fine.  I'm trying to dissect the problem but I think
>>>>> it may be  something logically wrong here (something not so
>>>>> obvious) and not a  bug...
>>>>>
>>>>> What I'm trying to say is, when you send sequences using
>>>>> remoteblast  like, this you are essentially spamming the NCBI
>>>>> BLAST server with  ~1600 requests.  This script wasn't set up with
>>>>> that intent in mind;  you should really try to set up your own
>>>>> local blast database if  possible.  If you can't, try running this
>>>>> script in off-hours  (10pm-6am EST or something like that).
>>>>>
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> input database: swissprot
>>>>>>         matrix: pam30
>>>>>>         count: 1
>>>>>>         gapcosts: 9 1
>>>>>>
>>>>>> I know that there are  a lot of sequences, but that doesn't
>>>>>> matter,  you can delete all of them except one, the amount of the
>>>>>> sequences  is not the problem, the script reads one line and
>>>>>> submits  it.....then the second line and so on.....I have tried
>>>>>> it with only  one sequence either and I got the same result....
>>>>>> the script run at  that time for more than 20
>>>>>> minutes!!!!!! .....and that should be  enough time to retrieve
>>>>>> the results for ONE sequence, I guess
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> You need to add the input conditions as well (you have several
>>>>>>> <STDIN> lines which may play a role; I would like to know what
>>>>>>> you  normally enter for those).
>>>>>>>
>>>>>>> How long did you let the script run?  I ran a quick check on
>>>>>>> your  sequences; you have almost 1600, so you have to expect
>>>>>>> that you'll  run into some problems here!  Most here (including
>>>>>>> me) would  suggest you try installing a local blast setup for
>>>>>>> something like  this.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> hi,
>>>>>>>> I have submitted the bug -> Bug 2017
>>>>>>>> with the script and input file, just start it from command line
>>>>>>>>
>>>>>>>> thank you very much
>>>>>>>> greetings
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Hubert,
>>>>>>>>>
>>>>>>>>> I have a script that's using blastxml and XML output which
>>>>>>>>> seems  to work.
>>>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
>>>>>>>>>> 'Sendu  Bala'
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> sorry, but I have updated the remoteblast module and I have
>>>>>>>>>> run  several
>>>>>>>>>> attempts with the same results as before. It didn't work.
>>>>>>>>>> I didn't get any results.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Hubert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Chris Fields wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Sendu, Hubert,
>>>>>>>>>>>
>>>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the
>>>>>>>>>>> problem (break out of that infinite loop).  I applied Sendu's
>>>>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
>>>>>>>>>>> RemoteBlast.t.  Try updating from CVS to see if it works.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>>>
>>>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>>>>>>>> hi,
>>>>>>>>>>>>> I have the following program and it worked quite well, for
>>>>>>>>>>>>> retrieving remoteblast results in a textfile,
>>>>>>>>>>>>> now I have altered it to to xml, and it didn't work
>>>>>>>>>>>>> anymore.....
>>>>>>>>>>>>> it takes all the parameter at the commandline, submits the
>>>>>>>>>>>>> query, but I don't retrieve any results file anymore.....
>>>>>>>>>>>>>
>>>>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and
>>>>>>>>>>>>> over..... it
>>>>>>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>> There is no problem with your code. The problem is with
>>>>>>>>>>>> the  NCBI server
>>>>>>>>>>>> and should be reported to them. You can visit the site and
>>>>>>>>>>>> do  a blast,
>>>>>>>>>>>> requesting xml format, and you will typically get one
>>>>>>>>>>>> normal  'waiting'
>>>>>>>>>>>> message and the promise that it will be updated in x
>>>>>>>>>>>> seconds,  but
>>>>>>>>>>>> subsequent attempts to get progress information result in
>>>>>>>>>>>> an  xml error
>>>>>>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the way that the bioperl code is written, it
>>>>>>>>>>>> treats no
>>>>>>>>>>>> data as 'waiting' instead of an error. I've offered a
>>>>>>>>>>>> patch  to fix this
>>>>>>>>>>>> at this bug page:
>>>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 15:14:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 20:14:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006201c688d1$bad2aff0$15327e82@pyrimidine>
References: <006201c688d1$bad2aff0$15327e82@pyrimidine>
Message-ID: <44848280.1080703@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>>> If you want flexibility or added functionality then you can 
>>> always contribute a patch, such as adding an option for 
>>> filehandles, IO::String, pipes/forks, or whatever you wish.
>> 
>> Well, it wouldn't be a new feature per se, but just changing the 
>> way the modules work under the hood.
> 
> ...
> 
>> I use IPC::Open3 for blasts and have never run into problems, but 
>> it pretty much falls into the 'apt to cause deadlock' camp. It may
>> pass tests on one machine but fail on others... is there any point
>> in working up a patch (would something of questionable reliability
>> ever be committed into bioperl)?
> 
> The main thing you should avoid is major API changes or issues which
> break this module on other OS's.  I'm not sure that StandAloneBlast
> is 'broken' by using a tempfile as the location of the BLAST report.
> 
> 
> 
> Any way you go about it, you'll have to capture the BLAST output as a
> stream and get it to persist in a SearchIO object somehow.  It's can
> be a pretty decent memory hit to keep that report hanging around, 
> esp. if it is larger.

Well at the moment StandAloneBlast runs the blast program and stores its
output to a temp file, then gives the temp file name as an arg to
SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
of the blast program directly to my parser. The question is, why wasn't
this done in StandAloneBlast? I would get the blast program output
handle and pass it directly to SearchIO with the -fh option of new().
The only difference here is it's faster and more efficient with the
direct pipe, but you can't subsequently seek the SearchIO's internal
filehandle (as we are discussing in this thread). There are no (additional)
issues with memory.
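
To make the idea concrete, here is a rough sketch of the direct-pipe
approach using only core modules. It is an assumption-laden illustration,
not what StandAloneBlast does: a child perl process stands in for the
blast executable, and the lines are simply read into an array where a
real version would hand the output handle to a parser.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Stand-in for the blast executable: a child perl that prints two lines.
my @cmd = ($^X, '-e', 'print "hit1\nhit2\n"');

my $err = gensym;   # separate handle so STDERR is not merged into STDOUT
my $pid = open3(my $in, my $out, $err, @cmd);
close $in;          # the child needs no input

my @lines = <$out>; # read the "report" straight from the pipe

# A pipe cannot be rewound, so this seek is expected to fail.
my $rewound = seek($out, 0, 0) ? 1 : 0;

waitpid($pid, 0);
my $exit_status = $? >> 8;

print "read ", scalar(@lines), " lines; seek worked: $rewound\n";
```

Note that this toy child writes nothing to STDERR. A child that fills
the STDERR pipe while the parent is still blocked reading STDOUT is the
classic way to end up in the 'apt to cause deadlock' camp.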

If it isn't done using IPC::Open3 (or similar) because the original
author already knew it wouldn't be reliable enough, or for some other
reason(s), fine. Does anyone know the reasons?


From cjfields at uiuc.edu  Mon Jun  5 15:43:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 14:43:50 -0500
Subject: [Bioperl-l] StandAloneBlast
In-Reply-To: <44848280.1080703@mrc-dunn.cam.ac.uk>
Message-ID: <006301c688d8$5e4ce910$15327e82@pyrimidine>

> Well at the moment StandAloneBlast runs the blast program and stores its
> output to a temp file, then gives the temp file name as an arg to
> SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
> of the blast program directly to my parser. The question is, why wasn't
> this done in StandAloneBlast? 

Probably for the reasons you outlined before:

'I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... '

Why would we take a chance on using something that works on one OS/machine
and fails to work on another?  

> I would get the blast program output handle and pass it directly to 
> SearchIO with the -fh option of new().
> The only difference here is it's faster and more efficient with the
> direct pipe, but you can't subsequently seek the SearchIO's internal
> filehandle (as we discussing in this thread). There are no (additional)
> issues with memory.

Like I said before, you can make changes and submit a patch.  The code here
is over five years old, and many many things have changed since then, so you
might find something works now which wasn't available or didn't work then.
It hasn't really been a priority (it certainly hasn't been mine).  Most
people don't care b/c it just works and a vast majority don't worry/care
about the internals.  

The issue at hand is whether any code changes will work on all OS's, not
just yours.  BioPerl is used the world over on just about every OS, so ANY
code changes need to take that into consideration.  I can guarantee that if
you made changes that break or reduce performance on 50% of the OS's, it'll
likely get rolled back.  You need the best cross-platform compatibility
possible.

We've now veered WAY off topic here.  If we intend on continuing this, we
need to switch the thread topic.

Chris

> If it isn't done using IPC::Open3 (or similar) because the original
> author already knew it wouldn't be reliable enough, or for some other
> reason(s), fine. Does anyone know the reasons?


From cjfields at uiuc.edu  Mon Jun  5 16:30:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 15:30:01 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
Message-ID: <006401c688de$d38035b0$15327e82@pyrimidine>

I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
finished yet (BioSQL and Bioperl-guts aren't done yet) and there are probably
some mangld worsd in there so have mercy on me!  It's been a busy month.

http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006

Fling your mud and abuses by responding to this thread per usual

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Jun  5 23:42:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 22:42:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44848225.8080003@gmx.at>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
	<44848225.8080003@gmx.at>
Message-ID: <D7A85F26-1ADD-446E-A5F3-8C3420746364@uiuc.edu>

Hubert,

I had no trouble getting this to work; the script scans through each  
sequence and saves the XML output to a file on both Windows and Mac OS  
X, both using bioperl-live.  The older RemoteBlast would only save  
text; otherwise it saved an empty file.  Using your script I get  
several XML BLAST output files (1.xml, 2.xml, etc) based on a  
counter, each about 1 MB.  All were parseable by SearchIO.

I did notice that if certain parameters weren't entered in correctly  
then you will get no data (such as setting the database to 'swiss'  
instead of 'swissprot').  A warning pops up stating that no data was  
returned when this occurs (it doesn't tell you what was wrong, just  
that no data came back from NCBI).  If you see this then that is  
likely the problem.  Besides that, I don't know what else it can be.
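
One cheap way to catch the 'no data' case after a run is to look for
zero-length output files before parsing them. A minimal sketch in core
Perl follows; the helper name empty_reports is made up, and the two
throwaway files only stand in for real output such as 1.xml and 2.xml:

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

# Return the paths that are missing or zero bytes long.
sub empty_reports {
    my @paths = @_;
    return grep { !-s $_ } @paths;
}

# Demonstration: one file with content, one empty file.
my $dir = tempdir(CLEANUP => 1);
my %content = ('1.xml' => "<BlastOutput/>\n", '2.xml' => '');
for my $name (sort keys %content) {
    open(my $out, '>', "$dir/$name") or die "cannot write $name: $!";
    print {$out} $content{$name};
    close $out;
}

my @empty = empty_reports(map { "$dir/$_" } sort keys %content);
print "empty report: $_\n" for @empty;
```

An empty file only says that nothing came back; it does not say which
parameter was wrong.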

Chris

On Jun 5, 2006, at 2:12 PM, Hubert Prielinger wrote:

> hi chris,
> sorry, I have tried it with the latest CVS version:
>
> # $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $
>
> but it still doesn't work.
>
> Hubert
>
> Chris Fields wrote:
>> Hubert,
>>
>> Make sure you have the latest Bio::Tools::Run::RemoteBlast from  
>> CVS.  The
>> option to save XML was committed relatively recently (last month  
>> or so).
>>
>> Chris
>>

...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From heikki at sanbi.ac.za  Tue Jun  6 03:40:06 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 09:40:06 +0200
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <200606060940.07285.heikki@sanbi.ac.za>

Chris,

I am mystified. I'll try to get the massive 'return undef' change done first 
and then have another look.

	-Heikki

On Friday 02 June 2006 21:13, Chris Fields wrote:
> Heikki,
>
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
>
> Not sure what's going on there but using read_aln and write_aln seem to
> work normally.  It may have something to do with Bio::SimpleAlign but I'm
> not absolutely sure.
>
> Any ideas what may be going on here?
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Tue Jun  6 04:04:00 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 10:04:00 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606020952.08034.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
Message-ID: <200606061004.01193.heikki@sanbi.ac.za>


OK. I've gone through all cases where return and undef are on the same lines.
I've done changes in 185 files.

My aims have been the following:

1. Remove undef from return undef when not necessary.
	This will make it easier to spot cases where undef matters in the future
	Most of the changes fall into this category. The context is clearly scalar.

2. Returning undef when user expects an empty list is bad

./Bio/Tools/Est2Genome.pm fixed
./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
                               not fixed
./Bio/Matrix/PSM/SiteMatrix.pm  fixed
./Bio/Matrix/PSM/Psm  fixed
./Bio/DB/Taxonomy::entrez.pm fixed

3. If docs say method returns nothing, explicit undef is not the right thing 
to return

4. do not return an explicit undef if the method is supposed to return false 
on failure
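
For anyone who has not run into the pitfall, here is a small core-Perl
illustration of point 2. The two subs are made up for the example and
are not taken from any BioPerl module:

```perl
use strict;
use warnings;

sub find_explicit { return undef }   # "fails" with an explicit undef
sub find_bare     { return }         # "fails" with a bare return

my @explicit = find_explicit();      # list context: one element (undef)
my @bare     = find_bare();          # list context: empty list

my $n_explicit  = @explicit;         # number of elements returned
my $n_bare      = @bare;
my $scalar_bare = find_bare();       # scalar context: undef

print "explicit undef looks true in a list test\n" if @explicit;
print "bare return looks false in a list test\n"   unless @bare;
```

In scalar context both forms give undef, which is why most of the
changes in category 1 make no difference to callers.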


Before I do the commit, I'd like a number of people to run 'make test' on 
bioperl-live now and report back if they see changes after the commit. There 
are quite a few tests that fail currently.

I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.

	-Heikki


On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> I've started going through the files that have 'return undef' lines.
> I'll report back later.
>
> Initial impression is that there are a few cases where the context
> indicates list to be returned but failure returns an explicit undef. I'll
> fix those.
>
> Most of the cases are much more ambiguous. Even when documentation says the
> failure returns undef, it is clearly meant to mean false. In most cases
> documentation does not comment on return value at all. Luckily the context
> is almost always scalar and therefore it does not matter too much.
>
> I seem to be changing 'return undef' to plain 'return' a bit overzealously,
> so do not take it personally.
>
> 	-Heikki
>
> On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > ....
> >
> > > > Again, didn't do that.
> > >
> > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > certainly not directed at your recent changes to Bio::Restriction::IO.
> > > In fact, I put in the above * comment to exclude your changes from my
> > > discussion; you changed the docs because the code never did what they
> > > said they did (the docs were bad). That's fine (good!). My comments
> > > were a general point, slightly directed at the idea of changing all the
> > > return undef;s - changing the code so that it no longer matches the
> > > docs of a previously working method. That's what I think is bad. Though
> > > in this particular case it shouldn't make any difference at all.
> >
> > Agreed.  In any case, if tests have been properly set up then they should
> > catch problems.  This is, of course, if they are properly set up.
> >
> > Chris
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 05:17:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 10:17:48 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <4485483C.4080505@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Heikki,
> 
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> 
> Not sure what's going on there but using read_aln and write_aln seem to work
> normally.  It may have something to do with Bio::SimpleAlign but I'm not
> absolutely sure.
> 
> Any ideas what may be going on here?

Yes, see my replies on the bug page. But so more people see the 
question, I'll ask here: can anyone offer examples of metafasta files, 
especially multiple alignments?


From cjfields at uiuc.edu  Tue Jun  6 10:30:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:30:17 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <4485483C.4080505@mrc-dunn.cam.ac.uk>
Message-ID: <000901c68975$bb9968d0$15327e82@pyrimidine>

Sendu,

This is Heikki's original submission for the specs for meta format:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta

So it's really a specialized FASTA format used to store meta information
about sequences.  Seems mainly useful for amino acid sequences, but is
extended to include properties of nucleotides like DNA content, RNA sec.
structure, and so on.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Tuesday, June 06, 2006 4:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests
> 
> Chris Fields wrote:
> > Heikki,
> >
> > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped
> up
> > when running AlignIO.t (I was fixing bug 2000):
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> >
> > Not sure what's going on there but using read_aln and write_aln seem to
> work
> > normally.  It may have something to do with Bio::SimpleAlign but I'm not
> > absolutely sure.
> >
> > Any ideas what may be going on here?
> 
> Yes, see my replies on the bug page. But so more people see the
> question, I'll ask here: can anyone offer examples of metafasta files,
> especially multiple alignments?


From cjfields at uiuc.edu  Tue Jun  6 10:36:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:36:16 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <200606060940.07285.heikki@sanbi.ac.za>
Message-ID: <000a01c68976$9479e300$15327e82@pyrimidine>

Heikki,

I agree it's all a bit weird.  It's not too concerning since it works at the
moment, but it might take some tinkering with SimpleAlign to get it to
behave.

This alignment format has some of the same characteristics as Stockholm
alignment format but looks easier to work with.  I work with RNA,
specifically one with a conserved secondary structure so this format appeals
to me quite a bit.  If I get time (probably not for a while) I may tinker
with Bio::AlignIO::stockholm to get a write_aln() method up-and-running and
see if I can convert back and forth between the two.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Tuesday, June 06, 2006 2:40 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests
> 
> Chris,
> 
> I am mystified. I'll try to get the massive 'return undef' change done
> first and then have another look.
> 
> 	-Heikki
> 
> On Friday 02 June 2006 21:13, Chris Fields wrote:
> > Heikki,
> >
> > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> > when running AlignIO.t (I was fixing bug 2000):
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> >
> > Not sure what's going on there but using read_aln and write_aln seem to
> > work normally.  It may have something to do with Bio::SimpleAlign but I'm
> > not absolutely sure.
> >
> > Any ideas what may be going on here?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 11:40:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 16:40:05 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000901c68975$bb9968d0$15327e82@pyrimidine>
References: <000901c68975$bb9968d0$15327e82@pyrimidine>
Message-ID: <4485A1D5.5090805@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu,
> 
> This is Heikki's original submission for the specs for meta format:
> 
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta
> 
> So it's really a specialized FASTA format used to store meta information
> about sequences.  Seems mainly useful for amino acid sequences, but is
> extended to include properties of nucleotides like DNA content, RNA sec.
> structure, and so on.  

Thanks. It's not really clear to me if the meta data needs to be 
considered in the context of an alignment. That is, if you have two meta 
sequences with the same primary sequence, will all their meta data 
necessarily be the same? Or could they be different?

If the same, then the test data and test need to be fixed so my patched 
version of Bio::AlignIO::metafasta passes the tests.

If different, how should the meta data be handled? Like the test implies 
with its expected value for the consensus (just treat the primary 
sequence and all meta data as one long string)?
Is it really the intent to include characters from the meta data names 
when considering what symbols we've seen with symbol_chars() method?
Do we include the meta data name symbols when numbering?

Thoughts anyone?


From cjfields at uiuc.edu  Tue Jun  6 17:07:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 16:07:39 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <006401c688de$d38035b0$15327e82@pyrimidine>
Message-ID: <000601c689ad$3e6aec20$15327e82@pyrimidine>

I hate talking to myself...

I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
(appropriately enough, on 6-6-06).  I am trying out a new script which helps
with all the developer list noise; hope everybody likes it.

Cheers,

Chris   

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, June 05, 2006 3:30 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] ListSummaries for May 10-31.
> 
> I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
> finished yet (BioSQL and Bioperl-guts aren't done yet) and there are probably
> some mangld worsd in there so have mercy on me!  It's been a busy month.
> 
> http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006
> 
> Fling your mud and abuses by responding to this thread per usual
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 


From cjfields at uiuc.edu  Tue Jun  6 20:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 19:41:08 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <44861D47.7090205@infotech.monash.edu.au>
Message-ID: <000601c689cb$11f568a0$15327e82@pyrimidine>

I could do something like that.  Right now I have a script that just grabs
the text from the web page:

http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html

and uses regexes and hashes to sort everything and make some sense of the
noise.  The resolution for a bug isn't on that page but in the linked
message so I would need to grab the link from HTML, go to that page, then
get the resolution if there is one, so at the moment I just check each one
(thanks for the bug hunt Jason!).  I usually have to do a little touching up
afterwards, such as fix links and such, but the script really saves on time.
As you can tell, it's been a busy month!
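
The "regexes and hashes" step could look roughly like the sketch below.
It is not the actual ListSummaries script; it assumes the subject lines
have already been pulled out of the archive page as plain strings and
that bug mail carries a "[Bug NNNN]" tag. The function name
group_by_bug is made up for illustration.

```perl
use strict;
use warnings;

# Group subject lines by bug number (hypothetical helper).
# Assumes bug mail subjects contain a tag such as "[Bug 2016]";
# anything without such a tag is collected under the key 'other'.
sub group_by_bug {
    my (@subjects) = @_;
    my %by_bug;
    for my $subject (@subjects) {
        if ($subject =~ /\[Bug\s+(\d+)\]/) {
            push @{ $by_bug{$1} }, $subject;
        }
        else {
            push @{ $by_bug{other} }, $subject;
        }
    }
    return \%by_bug;
}
```

Sorting by status (RESOLVED, WORKSFORME and so on) would need a second
hash keyed on the resolution, which has to come from the linked
messages.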

I'm (very slowly) updating the script to go through the mail list threads
recursively but haven't really gotten anywhere with that yet.  Benchwork has
intervened yet again!

Chris

> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Tuesday, June 06, 2006 7:27 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
> 
> > I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> > (appropriately enough, on 6-6-06).  I am trying out a new script which helps
> > with all the developer list noise; hope everybody likes it.
> 
> I like the CVS summaries.
> 
> For the bug summaries, would it make sense to categorise/sort by
> category/status eg. RESOLVED, WORKSFORME etc?
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Tue Jun  6 20:26:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 07 Jun 2006 10:26:47 +1000
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689ad$3e6aec20$15327e82@pyrimidine>
References: <000601c689ad$3e6aec20$15327e82@pyrimidine>
Message-ID: <44861D47.7090205@infotech.monash.edu.au>

> I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> (appropriately enough, on 6-6-06).  I am trying out a new script which helps
> with all the developer list noise; hope everybody likes it.

I like the CVS summaries.

For the bug summaries, would it make sense to categorise/sort by 
category/status eg. RESOLVED, WORKSFORME etc?

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jason at bioperl.org  Wed Jun  7 00:04:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Jun 2006 00:04:02 -0400
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689cb$11f568a0$15327e82@pyrimidine>
References: <000601c689cb$11f568a0$15327e82@pyrimidine>
Message-ID: <8D9B514C-ADB4-409F-A55F-DC0C3DA9354A@bioperl.org>

It is possible some of this can be extracted from the bugzilla as a  
query (all the changes from X to Y) and generate RSS or text that can  
be processed.

-jason
On Jun 6, 2006, at 8:41 PM, Chris Fields wrote:

> I could do something like that.  Right now I have a script that  
> just grabs
> the text from the web page:
>
> http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html
>
> and uses regexes and hashes to sort everything and make some sense  
> of the
> noise.  The resolution for a bug isn't on that page but in the linked
> message so I would need to grab the link from HTML, go to that  
> page, then
> get the resolution if there is one, so at the moment I just check  
> each one
> (thanks for the bug hunt Jason!).  I usually have to do a little  
> touching up
> afterwards, such as fix links and such, but the script really saves  
> on time.
> As you can tell, it's been a busy month!
>
> I'm (very slowly) updating the script to go through the mail list  
> threads
> recursively but haven't really gotten anywhere with that yet.   
> Benchwork has
> intervened yet again!
>
> Chris
>
>> -----Original Message-----
>> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
>> Sent: Tuesday, June 06, 2006 7:27 PM
>> To: Chris Fields
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
>>
>>> I have updated the ListSummaries to include BioSQL-l and Bioperl- 
>>> guts-l
>>> (appropriately enough, on 6-6-06).  I am trying out a new script  
>>> which
>> helps
>>> with all the developer list noise; hope everybody likes it.
>>
>> I like the CVS summaries.
>>
>> For the bug summaries, would it make sense to categorise/sort by
>> category/status eg. RESOLVED, WORKSFORME etc?
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From heikki at sanbi.ac.za  Wed Jun  7 05:57:47 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Jun 2006 11:57:47 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with
	"return undef"
In-Reply-To: <200606061004.01193.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
	<200606061004.01193.heikki@sanbi.ac.za>
Message-ID: <200606071157.47736.heikki@sanbi.ac.za>

Committed.

Please report any surprising changes in functionality to the list.
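
For anyone unsure why the difference matters, here is a minimal sketch
of the list-context pitfall. The two subs are made-up names for
illustration, not BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical subs that differ only in how they signal "nothing".
sub explicit_undef { return undef }   # list context: a one-element list (undef)
sub bare_return    { return }         # list context: empty list; scalar context: undef

my @a = explicit_undef();   # one element, so "if (@a)" is true
my @b = bare_return();      # no elements, so "if (@b)" is false
my $s = bare_return();      # undef, same as with 'return undef'

print scalar(@a), " element(s) from 'return undef'\n";
print scalar(@b), " element(s) from bare 'return'\n";
```

In scalar context the two behave the same, which is why most of the
changes should be harmless.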

	-Heikki

On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote:
> OK. I've gone through all cases where return and undef are on the same
> lines. I've done changes in 185 files.
>
> My aims have been the following:
>
> 1. Remove undef from return undef when not necessary.
> 	This will make it easier to spot cases where undef matters in the future
> 	Most of the changes fall into this category. The context is clearly
> scalar.
>
> 2. Returning undef when user expects an empty list is bad
>
> ./Bio/Tools/Est2Genome.pm fixed
> ./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
>                                not fixed
> ./Bio/Matrix/PSM/SiteMatrix.pm  fixed
> ./Bio/Matrix/PSM/Psm  fixed
> ./Bio/DB/Taxonomy::entrez.pm fixed
>
> 3. If docs say method returns nothing, explicit undef is not the right
> thing to return
>
> 4. do not return an explicit undef if the method is supposed to return
> false on failure
>
>
> Before I do the commit, I'd like a number of people to run 'make test' on
> bioperl-live and report back if they see changes after the commit. There
> are quite a few tests that fail currently.
>
> I'll do the commit tomorrow Wednesday at 9 o'clock GMT.
>
> 	-Heikki
>
> On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> > I've started going through the files that have 'return undef' lines.
> > I'll report back later.
> >
> > Initial impression is that there are a few cases where the context
> > indicates list to be returned but failure returns an explicit undef. I'll
> > fix those.
> >
> > Most of the cases are much more ambiguous. Even when documentation says
> > the failure returns undef, it is clearly meant to mean false. In most
> > cases documentation does not comment on return value at all. Luckily the
> > context is almost always scalar and therefore it does not matter too
> > much.
> >
> > I seem to be changing 'return undef' to plain 'return' a bit
> > overzealously, so do not take it personally.
> >
> > 	-Heikki
> >
> > On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > > ....
> > >
> > > > > Again, didn't do that.
> > > >
> > > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > > certainly not directed at your recent changes to
> > > > Bio::Restriction::IO. In fact, I put in the above * comment to
> > > > exclude your changes from my discussion; you changed the docs because
> > > > the code never did what they said they did (the docs were bad).
> > > > That's fine (good!). My comments were a general point, slightly
> > > > directed at the idea of changing all the return undef;s - changing
> > > > the code so that it no longer matches the docs of a previously
> > > > working method. That's what I think is bad. Though in this particular
> > > > case it shouldn't make any difference at all.
> > >
> > > Agreed.  In any case, if tests have been properly set up then they
> > > should catch problems.  This is, of course, if they are properly set
> > > up.
> > >
> > > Chris
> > >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Michael.Muratet at operon.com  Tue Jun  6 14:34:38 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Tue, 6 Jun 2006 13:34:38 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>

Greetings

I am trying to install bioperl-db in preparation for installing a biosql
database. I'm running on a Dell PowerEdge with quad dual-core Xeons and
RedHat Enterprise v4 and perl 5.8.5 and bioperl 1.5.1.  I have installed
mysql v5.0.21 from source with --with-innodb set for the configuration. I
installed bioperl-db from cvs. I have the latest DBI and DBD::mysql,
installed a few weeks ago from CPAN. The installation has been working
well with perl otherwise, for example, the Ensembl core API works OK.
SHOW ENGINES indicates that innodb is enabled.  I have attached a snippet
from the top of the output below. I searched the web and the bioperl-db
list and haven't found anything that appears to be relevant. I've done
several of these installs and they've pretty much completed without a
single glitch. Does anyone have any ideas how to isolate the problem?

Thanks

Mike

[mmuratet at HSV-PROBE bioperl-db]$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01dbadaptor.....ok 14/19
------------- EXCEPTION  -------------
MSG: failed to open connection: Transactions not supported by database
STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm:518
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK toplevel t/01dbadaptor.t:62


From hlapp at gmx.net  Wed Jun  7 08:52:22 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 08:52:22 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
Message-ID: <4F23D2EA-2218-4023-A3F6-3284912952BE@gmx.net>

Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI  
parameter hash. The test you're stumbling over is actually there to  
test that the database  does support transactions, but apparently in  
5.x versions MySQL no longer silently ignores the AutoCommit  
parameter if it doesn't support transactions (effectively preempting  
the test ...).

Now you say that innodb shows as enabled - i.e., you can confirm that  
you changed the Mysql configuration parameter that designates the  
directory for innodb to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one and the last query should  
return zero rows if transactions are supported, and there shouldn't  
be any error.

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.
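
The same check can be sketched from Perl through DBI, assuming the
connection itself succeeds. This is only an illustration: the sub name
supports_transactions is made up, and $dbh is assumed to be a DBI
handle opened with AutoCommit => 0 and RaiseError => 1 on a database
that has the BioSQL biodatabase table and no '__dummy__' row yet.

```perl
use strict;
use warnings;

# Returns 1 if an uncommitted INSERT disappears after ROLLBACK, else 0.
# $dbh: a DBI database handle (assumed AutoCommit => 0, RaiseError => 1).
sub supports_transactions {
    my ($dbh) = @_;
    my $name  = '__dummy__';
    my $count = 'SELECT COUNT(*) FROM biodatabase WHERE name = ?';

    $dbh->do('INSERT INTO biodatabase (name) VALUES (?)', undef, $name);
    my ($before) = $dbh->selectrow_array($count, undef, $name);

    $dbh->rollback;
    my ($after) = $dbh->selectrow_array($count, undef, $name);

    # Without transaction support the row survives the rollback; remove it.
    $dbh->do('DELETE FROM biodatabase WHERE name = ?', undef, $name) if $after;

    return ($before > $after) ? 1 : 0;
}

# Hypothetical usage (DSN, user and password are placeholders):
#   use DBI;
#   my $dbh = DBI->connect('dbi:mysql:database=biosql', $user, $pass,
#                          { AutoCommit => 0, RaiseError => 1 });
#   print supports_transactions($dbh) ? "transactions ok\n"
#                                     : "no transactions\n";
```

If DBI->connect already dies with "Transactions not supported by
database", the mysql shell test above is the way to find out what the
server itself thinks.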

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From nlhepler at umd.edu  Wed Jun  7 09:46:32 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 09:46:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
Message-ID: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>

Hello,

I am having some difficulty here.  I have a list of accessions, which  
are the parameters for a get_Stream_by_acc() function on a  
Bio::DB::GenBank object.  None of the returned GenBank information  
for any of my accessions seems to contain variation data, no matter  
how I try to coax it out with unflattener and typemapper.  This data  
is, however, available via the web interface of NCBI Nucleotide, as  
an optional feature (SNP).  I was wondering if there was some option  
I'm missing in the initialization of the Bio::DB::GenBank object (no  
options currently) that will coax the database into giving me this  
data?  Or something else that I'm missing altogether.  The organism  
of interest is human, taxon:9606.

Nicolaus Lance Hepler
nlhepler at mail dot umd dot edu


From cjfields at uiuc.edu  Wed Jun  7 09:56:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 08:56:16 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606071157.47736.heikki@sanbi.ac.za>
Message-ID: <000601c68a3a$265552a0$15327e82@pyrimidine>

Yikes!  I'll download a tarball from anon CVS and run a comparison (vs my
pre-updated bioperl-live) on WinXP and Mac OS X 10.4 (Intel) and report back
success/fail; may be a bit.

Chris



From osborne1 at optonline.net  Wed Jun  7 11:42:32 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 07 Jun 2006 11:42:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>
Message-ID: <C0AC6C28.8C12%osborne1@optonline.net>

Nicolaus,

The short answer is no, there's no option that will omit or add a particular
feature or annotation to the Sequence object returned by Bio::DB::GenBank.
Can you give some example accessions?

Brian O.


On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:

> Hello,
> 
> I am having some difficulty here.  I have a list of accessions, which
> are the parameters for a get_Stream_by_acc() function on a
> Bio::DB::GenBank object.  None of the returned GenBank information
> for any of my accessions seems to contain variation data, no matter
> how I try to coax it out with unflattener and typemapper.  This data
> is, however, available via the web interface of NCBI Nucleotide, as
> an optional feature (SNP).  I was wondering if there was some option
> I'm missing in the initialization of the Bio::DB::GenBank object (no
> options currently) that will coax the database into giving me this
> data?  Or something else that I'm missing altogether.  The organism
> of interest is human, taxon:9606.
> 
> Nicolaus Lance Hepler
> nlhepler at mail dot umd dot edu


From nlhepler at umd.edu  Wed Jun  7 12:26:06 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 12:26:06 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <C0AC6C28.8C12%osborne1@optonline.net>
References: <C0AC6C28.8C12%osborne1@optonline.net>
Message-ID: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>

Brian,

A sample accession is BC000007.  I figured a way around it though.   
Rather than automate the whole process, I just downloaded from Batch  
Entrez a flat .gb file of all my accessions.  It's not flexible, and  
will be inconvenient when we expand the dataset, but it will provide  
me with data to work with for now.

Nicolaus

> Nicolaus,
>
> The short answer is no, there's no option that will omit or add a  
> particular
> feature or annotation to the Sequence object returned by  
> Bio::DB::GenBank.
> Can you give some example accessions?
>
> Brian O.
>
>
> On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
>
>> Hello,
>>
>> I am having some difficulty here.  I have a list of accessions, which
>> are the parameters for a get_Stream_by_acc() function on a
>> Bio::DB::GenBank object.  None of the returned GenBank information
>> for any of my accessions seems to contain variation data, no matter
>> how I try to coax it out with unflattener and typemapper.  This data
>> is, however, available via the web interface of NCBI Nucleotide, as
>> an optional feature (SNP).  I was wondering if there was some option
>> I'm missing in the initialization of the Bio::DB::GenBank object (no
>> options currently) that will coax the database into giving me this
>> data?  Or something else that I'm missing altogether.  The organism
>> of interest is human, taxon:9606.
>>
>> Nicolaus Lance Hepler
>> nlhepler at mail dot umd dot edu
>
>


From lstein at cshl.edu  Wed Jun  7 12:50:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Jun 2006 12:50:24 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <4483F338.7090909@mrc-dunn.cam.ac.uk>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
Message-ID: <200606071250.25026.lstein@cshl.edu>

I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
because the CGI upload filehandle is not seekable (for good reasons that I 
won't inflict on you)! You'll have to write to a temporary file, or else read 
the whole sequence into memory. Sorry about this.
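
A minimal sketch of the read-into-memory option, using core Perl only. The helper names slurp_to_seekable and looks_like_fasta are made up for illustration, and looks_like_fasta is only a stand-in for the real format guesser; the point is that an in-memory filehandle can be rewound with seek() even when the original upload handle cannot.

```perl
use strict;
use warnings;

# Read everything from a (possibly non-seekable) handle into memory and
# return a new in-memory handle that does support seek().
sub slurp_to_seekable {
    my ($in) = @_;
    my $data = do { local $/; <$in> };   # slurp the whole stream
    $data = '' unless defined $data;
    open(my $mem, '<', \$data) or die "Cannot open in-memory handle: $!";
    return $mem;
}

# Stand-in for the format guesser: it consumes the first line of the
# handle, which is exactly what makes the rewind below necessary.
sub looks_like_fasta {
    my ($fh) = @_;
    my $first = <$fh>;
    return (defined($first) && $first =~ /^>/) ? 1 : 0;
}

# Usage sketch inside a CGI script (assumes a CGI.pm object $q and an
# upload field called 'seqfile'; both names are illustrative):
#   my $fh = slurp_to_seekable( $q->upload('seqfile') );
#   my $is_fasta = looks_like_fasta($fh);
#   seek($fh, 0, 0);        # rewind; works on the in-memory handle
#   print while <$fh>;      # first line is printed again
```

The in-memory handle could presumably be passed to the guesser through the same -fh argument used earlier in this thread; holding the whole upload in memory is only sensible for reasonably small files.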

Lincoln

On Monday 05 June 2006 05:02, Sendu Bala wrote:
> Wijaya Edward wrote:
> > Dear Lincoln and experts
> >
> > Currently I have a CGI application that does this:
> >
> > 1. read an uploaded file
> > 2. check whether the content of the file is fasta or not
> > 3. print out the content of the file.
> >
> >
> > Now the problem I'm facing is
> > on step three: the content of the filehandle is altered,
> > namely the very first line does not get printed.
>
> The problem is almost certainly that the guessing is done by reading the
> first line of the filehandle, so that your subsequent while loop on that
> same filehandle starts at the second line.
> Just seek the filehandle back to the start before trying to print the
> contents out.
>
> ..
> my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
> my $format_upload  = $guesser_upload->guess;
> seek($fh_upload, 0, 0);
> ..
> while (<$fh_upload>) {
>      ...
> }
>
> An alternative might be to pass GuessSeqFormat the filename in which
> case it would make its own filehandle and close it, leaving your own
> filehandle untouched.

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From paul.boutros at utoronto.ca  Wed Jun  7 13:03:01 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 13:03:01 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
Message-ID: <1149699781.448706c5e803d@webmail.utoronto.ca>

Hi,

Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7 and I had a few 
failures:

Failed Test         Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                    89    2  79 88
t/Biblio.t                        24    1  2
t/LocusLink.t                     23    1  23
t/PhysicalMap.t                   14    2  11-12
t/RepeatMasker.t                   6    3  1-2 6
t/StandAloneBlast.t               18    4  19-22
t/TaxonTree.t                     17   30  11 18-42
t/alignUtilities.t                 9    1  9
t/psm.t              255 65280    48   35  29 32-48
t/tutorial.t                      21   15  7-21

Not sure if any of these are related to the "return undef" changes, or are known.  I also 
had some warnings running BioGraphics.t

t/BioGraphics................Use of uninitialized value in numeric lt (<) at Bio/Graphics/
FeatureFile.pm line 547, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, 
<GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, 
<GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, 
<GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, 
<GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, 
<GEN0> line 61.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, 
<GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, 
<GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, 
<GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, 
<GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, 
<GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, 
<GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, 
<GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, 
<GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, 
<GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, 
<GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, 
<GEN0> line 62.
t/BioGraphics................ok

I also ran the tests manually and have attached the output below. It doesn't always 
agree with the results of make test, and in a few cases (e.g. tutorial.t or 
StandAloneBlast.t) there were no errors running the tests manually.
Paul

Annotation.t
============
not ok 8
# Test 8 got: '' (t/Annotation.t at line 59)
#   Expected: '0'

not ok 71
# Test 71 got: 'dumpster|test case|Ann:00001' (t/Annotation.t at line 187)
#    Expected: 'dumpster|test case|'

not ok 79
# Failed test 79 in t/Annotation.t at line 217

ok 85
Use of uninitialized value in concatenation (.) or string at /db2blast/Paul/perl5.8.7/lib/
site_perl/5.8.7/Bio/Annotation/Annot
ationFactory.pm line 236.

------------- EXCEPTION  -------------
MSG: Bio::AnnotationI implementation Bio::Annotation:: failed to load:
------------- EXCEPTION  -------------
MSG: Failed to load module Bio::Annotation::. Can't locate Bio/Annotation/.pm in @INC 
(@INC contains: t /db2blast/Paul/perl5.8
.7/lib/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/5.8.7 /db2blast/Paul/perl5.8.7/lib/
site_perl/5.8.7/aix /db2blast/Paul/perl5.8.7/
lib/site_perl/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl .) at /db2blast/Paul/perl5.8.7/
lib/site_perl/5.8.7/Bio/Root/Root.pm
 line 396.

STACK Bio::Root::Root::_load_module /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/
Root.pm:398
STACK (eval) /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/
AnnotationFactory.pm:149
STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/
site_perl/5.8.7/Bio/Annotation/Annotation
Factory.pm:148
STACK toplevel t/Annotation.t:237
--------------------------------------

STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/
site_perl/5.8.7/Bio/Annotation/Annotation
Factory.pm:152
STACK toplevel t/Annotation.t:237
--------------------------------------


PhysicalMap.t
=============
not ok 11
# Test 11 got: <UNDEF> (t/PhysicalMap.t at line 55)
#    Expected: '0' (code holds and returns a string, definition requires a boolean)
not ok 12
# Test 12 got: '3' (t/PhysicalMap.t at line 56)
#    Expected: '1' (code holds and returns a string, definition requires a boolean)

TaxonTree.t
===========
ok 10
Use of uninitialized value in string eq at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/
Bio/Taxonomy/Taxon.pm line 559.
not ok 11
# Test 11 got: <UNDEF> (t/TaxonTree.t at line 35)
#    Expected: 'species'
ok 12 # foo is not a rank, class variable @RANK not initialised
ok 13
ok 14
ok 15
ok 16
ok 17
ok 18
Can't use string ("this could be anything") as a HASH ref while "strict refs" in use at /
db2blast/Paul/perl5.8.7/lib/site_perl
/5.8.7/Bio/Taxonomy/Taxon.pm line 452.

alignUtilities.t
================
ok 6

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
ok 7
ok 8
not ok 9
# Test 9 got: '1' (t/alignUtilities.t at line 53)
#   Expected: '3'

RepeatMasker.t
==============
t/RepeatMasker...............FAILED tests 1-2, 6
        Failed 3/6 tests, 50.00% okay

StandAloneBlast.t
=================
t/StandAloneBlast............FAILED tests 19-22
        Failed 4/18 tests, 77.78% okay

psm.t
=====
t/Pseudowise.................ok
t/psm........................NOK 29Illegal division by zero at t/psm.t line 147, <GEN1> 
line 36.
t/psm........................dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
        Failed 18/48 tests, 62.50% okay
t/QRNA.......................ok

tutorial.t
==========
t/tutorial...................ok 5/21
The following numeric arguments can be passed to run the corresponding demo-script.
1  => sequence_manipulations
2  => seqstats_and_seqwords
3  => restriction_and_sigcleave
4  => other_seq_utilities
5  => run_perl
6  => searchio_parsing
8  => hmmer_parsing
9  => simplealign
10 => gene_prediction_parsing
11 => access_remote_db
12 => index_local_db
13 => fetch_local_db    (NOTE: needs to be run with demo 12)
14 => sequence_annotation
15 => largeseqs
16 => liveseqs
17 => run_struct
18 => demo_variations
19 => demo_xml
20 => run_tree
21 => run_map
22 => run_remoteblast
23 => run_standaloneblast
24 => run_clustalw_tcoffee
25 => run_psw_bl2seq

In addition the argument "100" followed by the name of a single
bioperl object will display a list of all the public methods
available from that object and from what object they are inherited.

Using the parameter "0" will run all the tests that do not require
external programs (i.e. tests 1 to 22).
Using any other argument (or no argument) will run this display.

So typical command lines might be:
To run all core demo scripts:
 > perl -w  bptutorial.pl 0
or to just run the local indexing demos:
 > perl -w  bptutorial.pl 12 13
or to list all the methods available for object Bio::Tools::SeqStats -
 > perl -w  bptutorial.pl 100 Bio::Tools::SeqStats

t/tutorial...................FAILED tests 7-21
        Failed 15/21 tests, 28.57% okay

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> -Heikki
> 


From sb at mrc-dunn.cam.ac.uk  Wed Jun  7 12:54:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 07 Jun 2006 17:54:31 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <200606071250.25026.lstein@cshl.edu>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
	<200606071250.25026.lstein@cshl.edu>
Message-ID: <448704C7.6080201@mrc-dunn.cam.ac.uk>

Lincoln Stein wrote:
> I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
> because the CGI upload filehandle is not seekable (for good reasons that I 
> won't inflict on you)! You'll have to write to a temporary file, or else read 
> the whole sequence into memory. Sorry about this.

The OP already had success with my alternative solution.


>> An alternative might be to pass GuessSeqFormat the filename in which
>> case it would make its own filehandle and close it, leaving your own
>> filehandle untouched.


From hlapp at gmx.net  Wed Jun  7 13:25:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 13:25:25 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
Message-ID: <76434774-51A4-46E7-97AA-1E9227CB7771@gmx.net>

Hi Michael,

yes it looks like a problem in DBD if DBD::mysql fails to recognize  
that the mysql instance to which it is connected does support  
transactions. You can verify this by writing a simple script that  
tries to open a connection with
{ AutoCommit => 0 } as the parameter hash:

	use DBI;
	my $dbh = DBI->connect("dbi:mysql:database=<yourdb>;host=<yourhost>",
	                       "username","password",
	                       { AutoCommit => 0, RaiseError => 0 });
	die $DBI::errstr unless $dbh;
	$dbh->disconnect;

If this succeeds, then something in Biosql may be related to the  
problem; otherwise it is not.

	-hilmar


On Jun 7, 2006, at 12:01 PM, Michael Muratet US-Huntsville wrote:

> Hilmar
>
> Pardon the top post.
>
> I tried the test below and it failed. So, I went back and redid the  
> Innodb configuration (deleted all the index files--they were empty  
> anyway--reinstalled biosql (which was empty, too), and restarted the  
> server). Now, the test below works. I went into the DBD-3.0003 and  
> did a distclean and reinstalled the package, but it fails the one  
> transaction test, too. So, it looks like the problem is in DBD, yes?
>
> We had a RAID 5 drive glitch the day before yesterday and rebuilt  
> it. That's the only thing that's changed that I know of that could  
> have caused the problem with ibxxx files.
>
> I have received a reply on the DBD list. Can you think of anything  
> else I should try from the biosql end?
>
> Thanks a million.
>
> Mike
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, June 07, 2006 7:52 AM
> To: Michael Muratet US-Huntsville
> Cc: Bioperl; BioSQL
> Subject: Re: [Bioperl-l] bioperl-db failing tests
>
>
> Hi Michael,
>
> Bioperl-db will open all connections with AutoCommit => 0 in the DBI
> parameter hash. The test you're stumbling over is actually there to
> test that the database  does support transactions, but apparently in
> 5.x versions MySQL no longer silently ignores the AutoCommit
> parameter if it doesn't support transactions (effectively preempting
> the test ...).
>
> Now you say that innodb shows as enabled - i.e., you can confirm that
> you changed the Mysql configuration parameter that designates the
> directory for innodb to store its files?
>
> You can confirm that transactions are supported by simple tests on
> the sql level. Open a mysql shell and do the following:
>
> 	-- BTW 'start transaction;' will (should) work too
> 	mysql> set autocommit = 0;
> 	mysql> insert into biodatabase (name) values ('__dummy__');
> 	mysql> select name from biodatabase where name = '__dummy__';
> 	mysql> rollback;
> 	mysql> select name from biodatabase where name = '__dummy__';
>
> The first SELECT query should return one and the last query should
> return zero rows if transactions are supported, and there shouldn't
> be any error.
>
> If the above succeeds (which I don't expect it to) then it looks like
> the DBD::mysql driver thinks the database doesn't support
> transactions when in reality it does. Let me know the result.
>
> 	-hilmar
>
> On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:
>
>> Greetings
>>
>> I am trying to install bioperl-db in preparation for installing a
>> biosql database. I'm running on a Dell PowerEdge with quad dual-
>> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl
>> 1.5.1.  I have installed mysql v5.0.21 from source with --with-
>> innodb set for the configuration. I installed bioperl-db from cvs.
>> I have the latest DBI and DBD:mysql installed a few weeks ago from
>> CPAN. The installation has been working well with perl otherwise,
>> for example, the Ensembl core API works OK. SHOW ENGINES indicates
>> that innodb is enabled.  I have attached a snippet from the top of
>> the output below. I searched the web and the bioperl-db list and
>> haven't found anything that appears to be relevant. I've done
>> several of these installs and they've pretty much completed without
>> a single glitch. Does anyone have any ideas how to isolate the
>> problem?
>>
>> Thanks
>>
>> Mike
>>
>> [mmuratet at HSV-PROBE bioperl-db]$ make test
>> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>> t/01dbadaptor.....ok 14/19
>> ------------- EXCEPTION  -------------
>> MSG: failed to open connection: Transactions not supported by  
>> database
>> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 255
>> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 215
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/
>> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/
>> BioSQL/BasePersistenceAdaptor.pm:1477
>> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/
>> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/
>> DB/BioSQL/BaseDriver.pm:518
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
>> STACK toplevel t/01dbadaptor.t:62
>>
>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 14:08:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 13:08:19 -0500
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>
Message-ID: <001501c68a5d$5db655a0$15327e82@pyrimidine>

Nicolaus,

Bio::DB::GenBank uses NCBI's efetch mainly; I implemented epost but it's a
hack at best and only works in certain circumstances.  So you could get the
sequence data directly but the links aren't included and are only given
through NCBI's elink.  There is no way I know of to get this information via
bioperl as there isn't an interface to NCBI's elink AFAIK (Brian?).  I'm
working on a rewrite for a general NCBI eutils interface for each tool
(efetch, epost, elink, etc), but it isn't working yet and probably won't be
ready to go until the end of summer-beginning of fall.

Just so you know how complex the situation is when using accessions: you
can't use a sequence accession directly when querying elink (and most
eutils); it has to be the GI number. I believe efetch is the only one that
accepts accessions.  So you would have to run esearch first using the
accessions as a query, grab the GI from the XML, run elink with the GI, grab
the SNP cluster ID, efetch the SNP data, and parse the data to get into
Bio::ClusterIO.  Fun, huh?  You would think NCBI would try making this a
little easier...
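
To make the esearch, elink, efetch chain described above concrete, here is a rough sketch that only builds the three request URLs, using core Perl. The base URL and the parameter names (db, dbfrom, term, id, retmode) are my understanding of the public E-utilities interface rather than something stated in this thread, and eutils_url and uri_escape_min are made-up helper names. The GI and SNP Id are the ones that appear in the elink result in this message.

```perl
use strict;
use warnings;

# Base URL of the NCBI E-utilities (an assumption of this sketch).
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode a parameter value without any non-core modules.
sub uri_escape_min {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Build "<base>/<tool>.fcgi?key=value&..." with keys in sorted order.
sub eutils_url {
    my ($tool, %param) = @_;
    my $query = join '&',
        map { "$_=" . uri_escape_min($param{$_}) } sort keys %param;
    return "$EUTILS/$tool.fcgi?$query";
}

# Step 1: esearch turns an accession into a GI.
my $search = eutils_url('esearch', db => 'nucleotide', term => 'BC000007[accn]');
# Step 2: elink maps that GI to linked dbSNP Ids.
my $link   = eutils_url('elink', dbfrom => 'nucleotide', db => 'snp', id => 33875090);
# Step 3: efetch retrieves one of the linked SNP records as XML.
my $fetch  = eutils_url('efetch', db => 'snp', id => 4631, retmode => 'xml');

print "$_\n" for $search, $link, $fetch;
```

Fetching the URLs and parsing each XML response between the steps is left out; this only shows how the three requests hang together.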

There used to be a way to parse dbSNP data using Bio::ClusterIO but the XML
schema changed so the parser is likely broken (the tests work but the file
is from the old schema).  I think Allen Day was in charge of it.

I used the eutils test interface () to grab the SNP cluster accessions for
your sequence using elink (note that the format is XML, which one would
have to parse to grab the cluster IDs):

<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp_genegenotype</LinkName>
		<Link>
			<Id>28362589</Id>
		</Link>
		<Link>
			<Id>4635949</Id>
		</Link>

		<Link>
			<Id>28362591</Id>
		</Link>
		<Link>
			<Id>11545838</Id>
		</Link>
		<Link>
			<Id>4246814</Id>

		</Link>
		<Link>
			<Id>28670911</Id>
		</Link>
		<Link>
			<Id>4073746</Id>
		</Link>
		<Link>

			<Id>9313754</Id>
		</Link>
		<Link>
			<Id>11545840</Id>
		</Link>
		<Link>
			<Id>17077806</Id>

		</Link>
		<Link>
			<Id>28362590</Id>
		</Link>
		<Link>
			<Id>4076327</Id>
		</Link>
		<Link>

			<Id>9834</Id>
		</Link>
		<Link>
			<Id>4073745</Id>
		</Link>
		<Link>
			<Id>6879874</Id>

		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
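
For illustration, the linked Ids can be pulled out of a result like the one above with a few lines of core Perl. This is a regex-based sketch that assumes the flat layout shown here (it is not a general XML parser, and a real script would more likely use an XML module); parse_elink_ids is a made-up name, and the sample is a shortened copy of the output above.

```perl
use strict;
use warnings;

# Return a hash ref mapping each <LinkName> to an array ref of linked Ids.
sub parse_elink_ids {
    my ($xml) = @_;
    my %ids_by_link;
    while ($xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}gs) {
        my $block = $1;
        my ($name) = $block =~ m{<LinkName>\s*([^<]+?)\s*</LinkName>};
        next unless defined $name;
        # Only Ids wrapped in <Link> are targets; the query GI sits in
        # <IdList>, outside any <LinkSetDb> block.
        my @ids = $block =~ m{<Link>\s*<Id>\s*(\d+)\s*</Id>\s*</Link>}g;
        $ids_by_link{$name} = \@ids;
    }
    return \%ids_by_link;
}

my $sample = <<'XML';
<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>
		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
XML

my $links = parse_elink_ids($sample);
for my $name (sort keys %$links) {
    print "$name: @{ $links->{$name} }\n";   # prints "nucleotide_snp: 4631"
}
```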


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nicolaus Hepler
> Sent: Wednesday, June 07, 2006 11:26 AM
> To: Brian Osborne; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] GenBank Feature: variation
> 
> Brian,
> 
> A sample accession is BC000007.  I figured a way around it though.
> Rather than automate the whole process, I just downloaded from Batch
> Entrez a flat .gb file of all my accessions.  It's not flexible, and
> will be inconvenient when we expand the dataset, but it will provide
> me with data to work with for now.
> 
> Nicolaus
> 
> > Nicolaus,
> >
> > The short answer is no, there's no option that will omit or add a
> > particular
> > feature or annotation to the Sequence object returned by
> > Bio::DB::GenBank.
> > Can you give some example accessions?
> >
> > Brian O.
> >
> >
> > On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
> >
> >> Hello,
> >>
> >> I am having some difficulty here.  I have a list of accessions, which
> >> are the parameters for a get_Stream_by_acc() function on a
> >> Bio::DB::GenBank object.  None of the returned GenBank information
> >> for any of my accessions seems to contain variation data, no matter
> >> how I try to coax it out with unflattener and typemapper.  This data
> >> is, however, available via the web interface of NCBI Nucleotide, as
> >> an optional feature (SNP).  I was wondering if there was some option
> >> I'm missing in the initialization of the Bio::DB::GenBank object (no
> >> options currently) that will coax the database into giving me this
> >> data?  Or something else that I'm missing altogether.  The organism
> >> of interest is human, taxon:9606.
> >>
> >> Nicolaus Lance Hepler
> >> nlhepler at mail dot umd dot edu
> >
> >
> 


From Michael.Muratet at operon.com  Wed Jun  7 12:01:29 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Wed, 7 Jun 2006 11:01:29 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>

Hilmar

Pardon the top post.

I tried the test below and it failed. So, I went back and redid the Innodb configuration (deleted all the index files--they were empty anyway--reinstalled biosql (which was empty, too), and restarted the server). Now, the test below works. I went into the DBD-3.0003 and did a distclean and reinstalled the package, but it fails the one transaction test, too. So, it looks like the problem is in DBD, yes?

We had a RAID 5 drive glitch the day before yesterday and rebuilt it. That's the only thing that's changed that I know of that could have caused the problem with ibxxx files. 

I have received a reply on the DBD list. Can you think of anything else I should try from the biosql end?

Thanks a million.

Mike

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: Wednesday, June 07, 2006 7:52 AM
To: Michael Muratet US-Huntsville
Cc: Bioperl; BioSQL
Subject: Re: [Bioperl-l] bioperl-db failing tests


Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI  
parameter hash. The test you're stumbling over is actually there to  
test that the database  does support transactions, but apparently in  
5.x versions MySQL no longer silently ignores the AutoCommit  
parameter if it doesn't support transactions (effectively preempting  
the test ...).

Now you say that innodb shows as enabled - i.e., you can confirm that  
you changed the Mysql configuration parameter that designates the  
directory for innodb to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one and the last query should  
return zero rows if transactions are supported, and there shouldn't  
be any error.
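
The same SQL-level check can be scripted through DBI. The following is a sketch only: rollback_works is a made-up helper, it expects a database handle opened with AutoCommit => 0, and it assumes no existing row already uses the '__dummy__' marker. It relies only on the standard DBI handle methods do, selectrow_array and rollback.

```perl
use strict;
use warnings;

# Returns 1 if an INSERT is undone by ROLLBACK on the given handle,
# 0 if the row survives (i.e. the table is not transactional).
# Table and column default to the biodatabase/name pair used above.
sub rollback_works {
    my ($dbh, $table, $column) = @_;
    $table  ||= 'biodatabase';
    $column ||= 'name';
    my $marker    = '__dummy__';
    my $count_sql = "SELECT COUNT(*) FROM $table WHERE $column = ?";

    $dbh->do("INSERT INTO $table ($column) VALUES (?)", undef, $marker);
    my ($before) = $dbh->selectrow_array($count_sql, undef, $marker);
    $dbh->rollback;
    my ($after)  = $dbh->selectrow_array($count_sql, undef, $marker);

    # If the row survived the rollback, remove it again by hand.
    $dbh->do("DELETE FROM $table WHERE $column = ?", undef, $marker)
        if $after;

    return ($before >= 1 && $after == 0) ? 1 : 0;
}

# Usage sketch (placeholders as in the connect example in this thread):
#   use DBI;
#   my $dbh = DBI->connect('dbi:mysql:database=<yourdb>;host=<yourhost>',
#                          'username', 'password',
#                          { AutoCommit => 0, RaiseError => 1 });
#   print rollback_works($dbh) ? "transactions OK\n"
#                              : "no transaction support\n";
```

If the connect call itself already dies with "Transactions not supported by database", the helper never gets to run, which points at the driver or server configuration rather than at the schema.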

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD:mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ 
> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ 
> BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ 
> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ 
> DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 15:38:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 14:38:08 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
Message-ID: <001901c68a69$e7ece8e0$15327e82@pyrimidine>

All,

I don't know how many people use the Bio::ClusterIO module, but it looks like
Bio::ClusterIO::dbsnp is broken unless you are using older XML versions of
the dbSNP database; the schema for ASN.1 and XML format for SNP has changed:

http://www.ncbi.nlm.nih.gov/projects/SNP/

under 'Announcements'.

I actually tried parsing the dbsnp test file and a newer schema XML file to
confirm this; the new version doesn't work (returned object from
next_cluster is undef).  I'm filing a bug as a reminder.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From paul.boutros at utoronto.ca  Wed Jun  7 18:35:46 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 18:35:46 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149719746.448754c2ef4e0@webmail.utoronto.ca>

> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
Yup yup, you're right. I should have mentioned in my original message that I didn't run 
any remote-server tests, and unfortunately can't do so on this box.
Paul

Quoting David Messina <dmessina at wustl.edu>:

> To look for problems related to Heikki's "return undef" sweep, I ran  
> 'make test' on both today's version of bioperl-live and on an older  
> version I had checked out on May 12. This was done on OS X 10.4.6 and  
> perl 5.8.6.
> 
> 
> Here are the results:
> 
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
> need to be examined.
> 
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been  
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
> 
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
> between 5/12 and today, so this appears to be not 'return undef'- 
> related.
> 
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load  
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
> 5/12 and today, so this is not 'return undef'-related.
> 
> 
> 
> All the other test failures appear in both versions of bioperl-live,  
> so presumably they are not affected by the 'return undef' changes.
> 
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
> 
> Also, he saw errors in Biblio.t, Repeatmasker.t, and  
> StandAloneBlast.t that I did not.
> 
> Dave
> 
> 
> Today's bioperl-live test results:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
> 99.84% okay.
> 
> Note that this is including tests requiring a remote server.
> 
> And here's the output from a May 12 checkout of bioperl-live:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
> 99.89% okay.
> 
> 
> 
> 
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
> 
> > Hi,
> >
> > Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> > and I had a few
> > failures:
> >
> > Failed Test         Stat Wstat Total Fail  List of Failed
> > ---------------------------------------------------------------------- 
> > ---------
> > t/Annotation.t                    89    2  79 88
> > t/Biblio.t                        24    1  2
> > t/LocusLink.t                     23    1  23
> > t/PhysicalMap.t                   14    2  11-12
> > t/RepeatMasker.t                   6    3  1-2 6
> > t/StandAloneBlast.t               18    4  19-22
> > t/TaxonTree.t                     17   30  11 18-42
> > t/alignUtilities.t                 9    1  9
> > t/psm.t              255 65280    48   35  29 32-48
> > t/tutorial.t                      21   15  7-21
> 
> 


From dmessina at wustl.edu  Wed Jun  7 18:26:25 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 17:26:25 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <1149699781.448706c5e803d@webmail.utoronto.ca>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
Message-ID: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>

To look for problems related to Heikki's "return undef" sweep, I ran  
'make test' on both today's version of bioperl-live and on an older  
version I had checked out on May 12. This was done on OS X 10.4.6 and  
perl 5.8.6.


Here are the results:

Failures in today's version of bioperl-live but NOT in 5/12 version
===================================================================
- psm.t -
The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
need to be examined.

Here's the error message:
Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
         Failed 18/48 tests, 62.50% okay


Failures in 5/12 version of bioperl-live but NOT in today's version
===================================================================
- OntologyStore.t -
Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been  
touched between 5/12 and today.

The error looks like a transient network problem to me, but I'm not  
sure:
-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/*checkout*/song/ontology/so.definition?rev=HEAD, but server threw 500.  retrying...
---------------------------------------------------
[REPEATED 5 times -Dave]

t/OntologyStore..............FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay


- RepeatMasker.t -
Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
between 5/12 and today, so this appears to be not 'return undef'- 
related.

- SeqVersion.t -
The SeqVersion error was due to a failure to find and load  
Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
5/12 and today, so this is not 'return undef'-related.


All the other test failures appear in both versions of bioperl-live,  
so presumably they are not affected by the 'return undef' changes.
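
For reference, the behaviour at issue is standard Perl: in list context
'return undef' yields a one-element list containing undef, while a bare
'return' yields the empty list. A minimal sketch of why swapping one for the
other can change callers follows. The sub names and the -seq/-id arguments
are hypothetical and are not taken from the BioPerl code touched by the sweep.

```perl
use strict;
use warnings;

# Hypothetical lookups that signal "nothing found" in two different ways.
sub find_explicit { return undef }   # explicit undef
sub find_bare     { return }         # bare return

# List context: explicit undef gives one element, bare return gives none.
my @explicit = find_explicit();
my @bare     = find_bare();
printf "explicit: %d element(s), bare: %d element(s)\n",
    scalar @explicit, scalar @bare;

# Scalar context: both give undef, so simple scalar callers see no change.
my $x = find_explicit();
my $y = find_bare();
print "both undef in scalar context\n" if !defined $x && !defined $y;

# Inside an argument list the bare return vanishes and shifts the pairs.
# Perl warns "Odd number of elements in hash assignment" for %shifted.
my %intact  = (-seq => find_explicit(), -id => 'x');
my %shifted = (-seq => find_bare(),     -id => 'x');
print "intact keys:  ", join(',', sort keys %intact),  "\n";
print "shifted keys: ", join(',', sort keys %shifted), "\n";
```

A caller that tests 'if (@result)' behaves differently in the two cases, and
a bare return inside a named-argument list shifts the pairs that follow it.
Both are the kind of change a before/after 'make test' comparison can expose.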

Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
that he didn't. Those only pop up when I run the optional remote- 
server tests, however. Perhaps Paul didn't run those and that  
accounts for the discrepancy?

Also, he saw errors in Biblio.t, Repeatmasker.t, and  
StandAloneBlast.t that I did not.

Dave


Today's bioperl-live test results:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/PhysicalMap.t                  14    2  14.29%  11-12
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/psm.t             255 65280    48   35  72.92%  29 32-48
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
99.84% okay.

Note that this is including tests requiring a remote server.

And here's the output from a May 12 checkout of bioperl-live:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/OntologyStore.t                 6    4  66.67%  3-6
t/PhysicalMap.t                  14    2  14.29%  11-12
t/RepeatMasker.t                  6    3  50.00%  1-2 6
t/SeqVersion.t      255 65280     6   10 166.67%  2-6
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
99.89% okay.


On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:

> Hi,
>
> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> and I had a few
> failures:
>
> Failed Test         Stat Wstat Total Fail  List of Failed
> ---------------------------------------------------------------------- 
> ---------
> t/Annotation.t                    89    2  79 88
> t/Biblio.t                        24    1  2
> t/LocusLink.t                     23    1  23
> t/PhysicalMap.t                   14    2  11-12
> t/RepeatMasker.t                   6    3  1-2 6
> t/StandAloneBlast.t               18    4  19-22
> t/TaxonTree.t                     17   30  11 18-42
> t/alignUtilities.t                 9    1  9
> t/psm.t              255 65280    48   35  29 32-48
> t/tutorial.t                      21   15  7-21


From cjfields at uiuc.edu  Wed Jun  7 19:38:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 18:38:10 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>

I saw a ton of activity from Jason on bioperl-guts for test files and  
modules; you may want to check your tests vs. his changes in case  
they were fixed.  I'll be running similar tests on WinXP and Mac OS X;  
it would be nice to see how my results compare to Dave's.

Chris

On Jun 7, 2006, at 5:26 PM, David Messina wrote:

> To look for problems related to Heikki's "return undef" sweep, I ran
> 'make test' on both today's version of bioperl-live and on an older
> version I had checked out on May 12. This was done on OS X 10.4.6 and
> perl 5.8.6.
>
>
> Here are the results:
>
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may
> need to be examined.
>
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
>
>
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been
> touched between 5/12 and today.
>
> The error looks like a transient network problem to me, but I'm not
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
>
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
>
>
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm
> between 5/12 and today, so this appears to be not 'return undef'-
> related.
>
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between
> 5/12 and today, so this is not 'return undef'-related.
>
>
>
> All the other test failures appear in both versions of bioperl-live,
> so presumably they are not affected by the 'return undef' changes.
>
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
> that he didn't. Those only pop up when I run the optional remote-
> server tests, however. Perhaps Paul didn't run those and that
> accounts for the discrepancy?
>
> Also, he saw errors in Biblio.t, Repeatmasker.t, and
> StandAloneBlast.t that I did not.
>
> Dave
>
>
> Today's bioperl-live test results:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.
>
> Note that this is including tests requiring a remote server.
>
> And here's the output from a May 12 checkout of bioperl-live:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,
> 99.89% okay.
>
>
>
>
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
>
>> Hi,
>>
>> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7
>> and I had a few
>> failures:
>>
>> Failed Test         Stat Wstat Total Fail  List of Failed
>> --------------------------------------------------------------------- 
>> -
>> ---------
>> t/Annotation.t                    89    2  79 88
>> t/Biblio.t                        24    1  2
>> t/LocusLink.t                     23    1  23
>> t/PhysicalMap.t                   14    2  11-12
>> t/RepeatMasker.t                   6    3  1-2 6
>> t/StandAloneBlast.t               18    4  19-22
>> t/TaxonTree.t                     17   30  11 18-42
>> t/alignUtilities.t                 9    1  9
>> t/psm.t              255 65280    48   35  29 32-48
>> t/tutorial.t                      21   15  7-21
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Wed Jun  7 20:50:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 19:50:48 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
Message-ID: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>

Thanks for letting me know, Chris.

Here's a new round of results on bioperl-live checked out moments ago:
[OS X 10.4.6, perl 5.8.6]

Failed Test   Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/DBCUTG.t                  29    5  17.24%  26 30-32
t/LocusLink.t               23    1   4.35%  23
t/PopGen.t                  89    1   1.12%  85
t/psm.t        255 65280    48   35  72.92%  29 32-48
t/tutorial.t                21   15  71.43%  7-21
121 subtests skipped.
Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,  
99.69% okay.

Fixed since earlier today
=========================
Annotation.t
PhysicalMap.t
TaxonTree.t
alignUtilities.t

New since earlier today
=======================
PopGen.t

t/PopGen.....................FAILED test 85
         Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86 okay, 96.63%)

Unchanged
=========
DBCUTG.t
LocusLink.t
psm.t
tutorial.t

Remote-server tests were run like before. I forgot to mention last  
time that I skipped the local DB tests and I don't have bioperl-ext  
installed, so several staden-related tests were also skipped.

Dave


My results from earlier today for reference:
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.
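
The Fixed / New / Unchanged breakdown above is a set comparison between the
failed-script lists of two runs. A minimal sketch of that comparison in plain
Perl follows. It assumes the summary rows begin with the script path
(optionally behind "> " quoting), and the sub names failed_scripts and
compare_runs are made up for illustration. The rows are copied from the two
tables in this message.

```perl
use strict;
use warnings;

# Collect failed script names (e.g. "t/psm.t") from Test::Harness summary text.
# Leading "> " mail quoting is tolerated at the start of each row.
sub failed_scripts {
    my ($summary) = @_;
    my %seen;
    $seen{$1} = 1 while $summary =~ m{^[>\s]*(t/\S+\.t)\s}mg;
    return \%seen;
}

# Split the two failure sets into fixed, new and unchanged scripts.
sub compare_runs {
    my ($old, $new) = @_;
    my %diff = (fixed => [], new => [], unchanged => []);
    for my $script (sort keys %$old) {
        push @{ $diff{ $new->{$script} ? 'unchanged' : 'fixed' } }, $script;
    }
    push @{ $diff{new} }, grep { !$old->{$_} } sort keys %$new;
    return \%diff;
}

my $earlier = <<'END';
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/PhysicalMap.t                  14    2  14.29%  11-12
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/psm.t             255 65280    48   35  72.92%  29 32-48
t/tutorial.t                     21   15  71.43%  7-21
END

my $later = <<'END';
t/DBCUTG.t                  29    5  17.24%  26 30-32
t/LocusLink.t               23    1   4.35%  23
t/PopGen.t                  89    1   1.12%  85
t/psm.t        255 65280    48   35  72.92%  29 32-48
t/tutorial.t                21   15  71.43%  7-21
END

my $diff = compare_runs(failed_scripts($earlier), failed_scripts($later));
for my $group (qw(fixed new unchanged)) {
    print "$group: @{ $diff->{$group} }\n";
}
```

This compares script names only; it does not notice when the same script
fails with a different set of subtests.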


From heikki at sanbi.ac.za  Thu Jun  8 04:49:38 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:49:38 +0200
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
	<4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
Message-ID: <200606081049.40232.heikki@sanbi.ac.za>

Looks like we survived the sweeping change - and fixed a number of existing 
bugs in the process. Thanks to everyone who helped!

	-Heikki 

On Thursday 08 June 2006 02:50, David Messina wrote:
> Thanks for letting me know, Chris.
>
> Here's a new round of results on bioperl-live checked out moments ago:
> [OS X 10.4.6, perl 5.8.6]
>
> Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------
> -------
> t/DBCUTG.t                  29    5  17.24%  26 30-32
> t/LocusLink.t               23    1   4.35%  23
> t/PopGen.t                  89    1   1.12%  85
> t/psm.t        255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                21   15  71.43%  7-21
> 121 subtests skipped.
> Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> 99.69% okay.
>
> Fixed since earlier today
> =========================
> Annotation.t
> PhysicalMap.t
> TaxonTree.t
> alignUtilities.t
>
> New since earlier today
> =======================
> PopGen.t
>
> t/PopGen.....................FAILED test 85
>          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> okay, 96.63%)
>
> Unchanged
> =========
> DBCUTG.t
> LocusLink.t
> psm.t
> tutorial.t
>
> Remote-server tests were run like before. I forgot to mention last
> time that I skipped the local DB tests and I don't have bioperl-ext
> installed, so several staden-related tests were also skipped.
>
> Dave
>
> My results from earlier today for reference:
> > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > ----------------------------------------------------------------------
> > --
> > -------
> > t/Annotation.t                   89    2   2.25%  79 88
> > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > t/LocusLink.t                    23    1   4.35%  23
> > t/PhysicalMap.t                  14    2  14.29%  11-12
> > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > t/alignUtilities.t                9    1  11.11%  9
> > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                     21   15  71.43%  7-21
> > 114 subtests skipped.
> > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > 99.84% okay.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Jun  8 04:52:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:52:27 +0200
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
Message-ID: <200606081052.27446.heikki@sanbi.ac.za>

I sort of fixed this.

At least the tests pass (I commented out two) when using the new sample XML. 
To be really useful, the code needs much more work, so I left the bug open.

http://bugzilla.open-bio.org/show_bug.cgi?id=2018


	-Heikki


On Wednesday 07 June 2006 21:38, Chris Fields wrote:
> All,
>
> Don't know how many people use Bio::ClusterIO this module, but it looks
> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
> versions of the dbSNP database; the schema for ASN.1 and XML format for SNP
> has changed:
>
> http://www.ncbi.nlm.nih.gov/projects/SNP/
>
> under 'Announcements'.
>
> I actually tried parsing the dbsnp test file and a newer schema XML file to
> confirm this; the new version doesn't work (returned object from
> next_cluster is undef).  I'm filing a bug as a reminder.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 01:55:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 08 Jun 2006 15:55:09 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
Message-ID: <4487BBBD.6060702@infotech.monash.edu.au>

Hi all,

I've just been further auditing the Bioperl code and noticed that
Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
can't locate an example/sample sequence file in "Lasergene" format.

 From the code it looks similar to 'raw' format but has "^^" as
a separator character.

Can anyone provide a real-life example so I can augment the 
t/lasergene.t tests?

Thanks,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jrm62 at cam.ac.uk  Thu Jun  8 07:38:40 2006
From: jrm62 at cam.ac.uk (John Mifsud)
Date: 08 Jun 2006 12:38:40 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
Message-ID: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>

Dear all,

Firstly I hope this is the right email list to write to! 

Secondly, I have a little program that parses the BLAST results I get from 
running remotely against the NCBI server, takes out all the hit sequences and 
converts them to FASTA format.

Now when using BROAD BLAST and getting results this works fine (tblastn ver 
2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and 
the output is different and the parsing no longer works. I was wondering if 
anyone knew of a new SearchIO module / script that is designed to parse the 
updated NCBI BLAST output?

Thanks for your time,


John


From cjfields at uiuc.edu  Thu Jun  8 08:56:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 07:56:27 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <200606081052.27446.heikki@sanbi.ac.za>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
	<200606081052.27446.heikki@sanbi.ac.za>
Message-ID: <AB8EE4BC-4774-48A6-8F26-2A8356F8E700@uiuc.edu>

Sounds good to me.  If someone wants to use this down the line, they  
might be desperate enough to provide patches; there are a lot of  
commented out tags.

Chris

On Jun 8, 2006, at 3:52 AM, Heikki Lehvaslaiho wrote:

> I sort of fixed this.
>
> At least the tests pass (I commented out two) when using the new  
> sample XML.
> To be really usefull, the code need much more work, so I left the  
> bug open.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2018
>
>
> 	-Heikki
>
>
> On Wednesday 07 June 2006 21:38, Chris Fields wrote:
>> All,
>>
>> Don't know how many people use Bio::ClusterIO this module, but it  
>> looks
>> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
>> versions of the dbSNP database; the schema for ASN.1 and XML  
>> format for SNP
>> has changed:
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/
>>
>> under 'Announcements'.
>>
>> I actually tried parsing the dbsnp test file and a newer schema  
>> XML file to
>> confirm this; the new version doesn't work (returned object from
>> next_cluster is undef).  I'm filing a bug as a reminder.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 09:03:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 14:03:05 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <44882009.1040906@mrc-dunn.cam.ac.uk>

John Mifsud wrote:
> Dear all,
> 
> Firstly I hope this is the right email list to write to! 
> 
> Secondly, I have a little program that parses the BLAST results i have got 
> running remotely to the NCBI server and takes out all the hit sequences and 
> converts them to FASTA format.
> 
> Now when using BROAD BLAST and getting results this works fine (tblastn ver 
> 2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and 
> the output is different and the parsing no longer works. I was wondering if 
> anyone knew of a new SearchIO module / script that is designed to blast the 
> updated NCBI BLAST output?

You'll probably need to get the latest SearchIO blast module from 
bioperl-live.
http://bioperl.org/wiki/Getting_BioPerl

If you're having difficulties with your setup, John, I can just send you 
the relevant file(s). Mail me (or Alan) privately for that.


From cjfields at uiuc.edu  Thu Jun  8 09:12:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 08:12:23 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>

I would say, based on previous responses, update to the latest CVS  
(bioperl-live).  You could also try updating  
Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you  
don't want to update the entire toolkit.  Running these with BLAST  
2.2.14 output seems to work fine.

Though this is the likely fix, if you have additional problems next  
time please make sure to include more information.  We have no idea  
what OS, bioperl version, perl version you are running.  And a code  
snippet and bug description would be nice (i.e. "it doesn't work" -  
not a good description; "the script freezes" is a little more  
informative).

Chris

On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:

> Dear all,
>
> Firstly I hope this is the right email list to write to!
>
> Secondly, I have a little program that parses the BLAST results i  
> have got
> running remotely to the NCBI server and takes out all the hit  
> sequences and
> converts them to FASTA format.
>
> Now when using BROAD BLAST and getting results this works fine  
> (tblastn ver
> 2.2.9). However, NCBI have just updated their BLAST server (to  
> 2.2.14) and
> the output is different and the parsing no longer works. I was  
> wondering if
> anyone knew of a new SearchIO module / script that is designed to
> parse the
> updated NCBI BLAST output?
>
> Thanks for your time,
>
>
> John
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 12:03:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 17:03:21 +0100
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>	<200605311255.19166.heikki@sanbi.ac.za>
	<447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <44884A49.6060805@mrc-dunn.cam.ac.uk>

Sendu Bala wrote:
> Heikki Lehvaslaiho wrote:
>> In my opinion the sooner the bugs get exposed the better. It is much more 
>> likely that there is a well-hidden bug caused by accidentally assigning undef 
>> into a one-element array than that someone is intentionally writing code that 
>> expects that behaviour!
>>
>> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
>> and could not see any differences in the test output. 
>>
>> Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible 
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Looks like Heikki's work went well. If there is any further interest in 
getting rid of all the remaining undef returns, this also needs to be fixed:

sub x {
   # return (...) on success
   # do nothing on failure
}

Needs to be changed to:

sub x {
   # return (...) on success
   return;
}
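
To make the pitfall concrete, here is a minimal sketch of why a bare
return differs from return undef. The sub names are made up for
illustration and are not taken from any BioPerl module:

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub find_undef { return undef; }   # one element (undef) in list context
sub find_bare  { return; }         # empty list in list context

my @a = find_undef();   # (undef): the array holds one element, so it is "true"
my @b = find_bare();    # ():      the array is empty, so it is "false"

printf "return undef -> %d element(s)\n", scalar @a;
printf "bare return  -> %d element(s)\n", scalar @b;

# In scalar context a bare return still yields undef,
# so callers that assign to a scalar see no difference.
my $s = find_bare();
print "scalar context is ", (defined $s ? "defined" : "undef"), "\n";
```

The explicit bare return at the end of the sub matters because a sub that
simply falls off the end returns the value of its last evaluated
expression, which may not be empty.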


From roy at colibase.bham.ac.uk  Thu Jun  8 12:31:10 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 08 Jun 2006 17:31:10 +0100
Subject: [Bioperl-l] Truncate sequence with features
Message-ID: <448850CE.1040105@colibase.bham.ac.uk>

Hi all.

I've been playing around with a subroutine to truncate a sequence and 
adjust the coordinates of any features that overlap the specified 
region- something that according to the comments in 
Bio::Location::Simple has been abortively worked on in the past.

I've submitted the subroutine as an enhancement in Bugzilla. It's a bit 
hacky but works for what I needed it for. However I'm a bit unsure on 
the best way to deal with split locations where one of the sublocations 
is entirely outside the truncated region. My current method results in 
locations like:
join(1..500, >1000..>1000)

which is quite ugly and possibly invalid, but kind of makes sense. Does 
anyone know what would be the correct behaviour for this situation?
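
For discussion, here is a rough sketch of the coordinate arithmetic
involved. It is not the subroutine submitted to Bugzilla; shift_location
is a hypothetical helper, and dropping a sublocation that lies entirely
outside the kept region is only one possible policy, assumed here for
illustration:

```perl
use strict;
use warnings;

# Map a feature's [start, end] onto a sequence truncated to [$from, $to].
# Returns an empty list if the feature lies entirely outside the kept region;
# otherwise returns the new start/end plus flags saying which ends were cut.
# (Hypothetical helper, not part of BioPerl.)
sub shift_location {
    my ($start, $end, $from, $to) = @_;
    return if $end < $from || $start > $to;      # no overlap at all

    my $cut_left  = $start < $from ? 1 : 0;
    my $cut_right = $end   > $to   ? 1 : 0;
    my $new_start = ($cut_left  ? $from : $start) - $from + 1;
    my $new_end   = ($cut_right ? $to   : $end)   - $from + 1;
    return ($new_start, $new_end, $cut_left, $cut_right);
}

# Keep bases 101..1100 of the original sequence (new length 1000).
for my $loc ([101, 600], [50, 300], [900, 1500], [2000, 2100]) {
    my @r = shift_location(@$loc, 101, 1100);
    if (@r) {
        # Mark cut ends with '<' and '>' in the style of fuzzy locations.
        printf "%d..%d -> %s%d..%s%d\n", @$loc,
            ($r[2] ? '<' : ''), $r[0], ($r[3] ? '>' : ''), $r[1];
    } else {
        printf "%d..%d -> dropped\n", @$loc;
    }
}
```

Under this policy a split location would keep only the sublocations that
overlap the kept region, which avoids the >1000..>1000 form but loses the
information that part of the feature used to lie beyond the end.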

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From cjfields at uiuc.edu  Thu Jun  8 14:47:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 13:47:19 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <000701c68b2b$f8cc21e0$15327e82@pyrimidine>

Thomas;

That error isn't related to BioPerl.  This is the standard HTML response
NCBI gives as a web page; the error embedded in the HTML you received as a
warning reads:

ERROR: Cannot accept request, error code: 1Number of unfinished requests
(151) from your IP address reached the HARD limit 150.

So you may have too many requests in the BLAST queue.  
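
If it helps to spot this kind of message without reading the whole page,
here is a small sketch that pulls the error text out of the returned
HTML. blast_error is a hypothetical helper, not part of RemoteBlast, and
the pattern is based only on the page quoted in this thread, so it may
not match other NCBI error pages:

```perl
use strict;
use warnings;

# Pull the first "ERROR: ..." message out of an HTML page returned by the
# BLAST server. Hypothetical helper; the markup pattern is an assumption
# taken from the single response quoted in this thread.
sub blast_error {
    my ($html) = @_;
    return unless $html =~ m{<font\s+color="red">\s*(ERROR:.*?)</font>}is;
    (my $msg = $1) =~ s/\s+/ /g;     # collapse line wraps and double spaces
    return $msg;
}

my $page = <<'HTML';
<br><hr><font
color="red">ERROR: Cannot accept request, error code: 1Number of
unfinished requests (151)  from your IP address reached the HARD
limit 150.</font><hr></form>   </body></html>
HTML

my $err = blast_error($page);
print defined $err ? "$err\n" : "no error found\n";
```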

Chris

> -----Original Message-----
> From: Thomas J Keller [mailto:kellert at ohsu.edu]
> Sent: Thursday, June 08, 2006 1:39 PM
> To: Chris Fields
> Cc: John Mifsud; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] NCBI BLAST results parsing
> 
> I'm having the same problem: bp_remote_blast.pl worked yesterday,
> today it's busted. Incidentally, I got the following email from NCBI
> this morning:
> The new version of the NCBI SOAP E-Utilities, which includes recent
> changes to the NCBI sequence databases schema, was released today.
> 
> Thank you.
> NCBI E-Utilities Team
> 
> I wouldn't have thought that that would affect
> Bio::Tools::Run::RemoteBlast, but something has changed.
> 
> Here's a snippet of the output after running:
> 
> $ bp_remote_blast.pl -p blastn -d nr -e 1e-3 -i nm_008540.fasta
> 
> -------------------- WARNING ---------------------
> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
> Content-Length: 267
> Content-Type: application/x-www-form-urlencoded
> 
> DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+%
> 25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C
> +mRNA.%
> 0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm
> ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn
> 
> 
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content-
> Type" content="text/html; charset=utf-8"/><link rel="stylesheet"
> href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body
> bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"
> onload="StartBlastCgi();"><!--  the header   --> <table border="0"
> width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"
> colspan=4>    <map name="head_img_map">    <area shape="rect"
> coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home
> page">       <area shape="rect" coords="301,0,600,40" href="http://
> www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>
> <IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"
> NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"
> HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr
> align="center">    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI=
> yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN
> KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes
> &HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT
> O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td
> width="150" bgcolor="#003366">        <a href="http://
> www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI
> =yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI
> NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye
> s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</
> FONT></a></td></tr></table><br><!--  the contents   --> <form
> action="Blast.cgi" enctype="application/x-www-form-urlencoded"
> method="POST"><script src="blastcgi.js"></script><SCRIPT
> LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =
> 'html/head_formating.gif';// --></SCRIPT><br><hr><font
> color="red">ERROR: Cannot accept request, error code: 1Number of
> unfinished requests (151)  from your IP address reached the HARD
> limit 150.</font><hr></form>   </body></html>
> ---------------------------------------------------
> 
> On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:
> 
> > I would say, based on previous responses, update to the latest CVS
> > (bioperl-live).  You could also try updating
> > Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> > don't want to update the entire toolkit.  Running these with BLAST
> > 2.2.14 output seems to work fine.
> >
> > Though this is the likely fix, if you have additional problems next
> > time please make sure to include more information.  We have no idea
> > what OS, bioperl version, perl version you are running.  And a code
> > snippet and bug description would be nice (i.e. "it doesn't work" -
> > not a good description; "the script freezes" is a little more
> > informative).
> >
> > Chris
> >
> > On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
> >
> >> Dear all,
> >>
> >> Firstly I hope this is the right email list to write to!
> >>
> >> Secondly, I have a little program that parses the BLAST results i
> >> have got
> >> running remotely to the NCBI server and takes out all the hit
> >> sequences and
> >> converts them to FASTA format.
> >>
> >> Now when using BROAD BLAST and getting results this works fine
> >> (tblastn ver
> >> 2.2.9). However, NCBI have just updated their BLAST server (to
> >> 2.2.14) and
> >> the output is different and the parsing no longer works. I was
> >> wondering if
> >> anyone knew of a new SearchIO module / script that is designed to
> >> parse the
> >> updated NCBI BLAST output?
> >>
> >> Thanks for your time,
> >>
> >>
> >> John
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >


From kellert at ohsu.edu  Thu Jun  8 14:39:04 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 11:39:04 -0700
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
Message-ID: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>

I'm having the same problem: bp_remote_blast.pl worked yesterday,
today it's busted. Incidentally, I got the following email from NCBI
this morning:
The new version of the NCBI SOAP E-Utilities, which includes recent
changes to the NCBI sequence databases schema, was released today.

Thank you.
NCBI E-Utilities Team

I wouldn't have thought that that would affect
Bio::Tools::Run::RemoteBlast, but something has changed.

Here's a snippet of the output after running:

$ bp_remote_blast.pl -p blastn -d nr -e 1e-3 -i nm_008540.fasta

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
Content-Length: 267
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+% 
25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C 
+mRNA.% 
0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm 
ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn


---------------------------------------------------

-------------------- WARNING ---------------------
MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content- 
Type" content="text/html; charset=utf-8"/><link rel="stylesheet"  
href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body  
bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"  
onload="StartBlastCgi();"><!--  the header   --> <table border="0"  
width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"  
colspan=4>    <map name="head_img_map">    <area shape="rect"  
coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home  
page">       <area shape="rect" coords="301,0,600,40" href="http:// 
www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>     
<IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"  
NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"  
HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr  
align="center">    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI= 
yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN 
KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes 
&HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT 
O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td  
width="150" bgcolor="#003366">        <a href="http:// 
www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI 
=yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI 
NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye 
s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</ 
FONT></a></td></tr></table><br><!--  the contents   --> <form  
action="Blast.cgi" enctype="application/x-www-form-urlencoded"  
method="POST"><script src="blastcgi.js"></script><SCRIPT  
LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =  
'html/head_formating.gif';// --></SCRIPT><br><hr><font  
color="red">ERROR: Cannot accept request, error code: 1Number of  
unfinished requests (151)  from your IP address reached the HARD  
limit 150.</font><hr></form>   </body></html>
---------------------------------------------------

On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:

> I would say, based on previous responses, update to the latest CVS
> (bioperl-live).  You could also try updating
> Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> don't want to update the entire toolkit.  Running these with BLAST
> 2.2.14 output seems to work fine.
>
> Though this is the likely fix, if you have additional problems next
> time please make sure to include more information.  We have no idea
> what OS, bioperl version, perl version you are running.  And a code
> snippet and bug description would be nice (i.e. "it doesn't work" -
> not a good description; "the script freezes" is a little more
> informative).
>
> Chris
>
> On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
>
>> Dear all,
>>
>> Firstly I hope this is the right email list to write to!
>>
>> Secondly, I have a little program that parses the BLAST results i
>> have got
>> running remotely to the NCBI server and takes out all the hit
>> sequences and
>> converts them to FASTA format.
>>
>> Now when using BROAD BLAST and getting results this works fine
>> (tblastn ver
>> 2.2.9). However, NCBI have just updated their BLAST server (to
>> 2.2.14) and
>> the output is different and the parsing no longer works. I was
>> wondering if
>> anyone knew of a new SearchIO module / script that is designed to
>> parse the
>> updated NCBI BLAST output?
>>
>> Thanks for your time,
>>
>>
>> John
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Thu Jun  8 15:28:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 14:28:18 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"returnundef"
In-Reply-To: <200606081049.40232.heikki@sanbi.ac.za>
Message-ID: <000001c68b31$b5320390$15327e82@pyrimidine>

Here are tests run from WinXP, ActivePerl 5.8.817; almost everything passes.
Not sure what's going on with StandAloneBlast or the protgraph tests, so
I'll check into it.  The psm.t tests that failed are the same as the ones
mentioned previously on other systems.
As an aside, I hate that using the '-w' flag with ActivePerl gives a thousand
useless 'subroutines redefined' warnings; the only way I found to turn them
off is to not use the flag.  Anyway, I pulled out the relevant chunks of
output here; I'll submit the Mac results separately to not confuse the two.  

...
t/StandAloneBlast............FAILED tests 19-22
	Failed 4/18 tests, 77.78% okay
...
t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
36-37, 45, 48-56, 59-60, 65-66
	Failed 22/66 tests, 66.67% okay
...
t/psm........................Illegal division by zero at t/psm.t line 147,
<GEN1> line 36.
dubious
	Test returned status 9 (wstat 2304, 0x900)
DIED. FAILED tests 29, 32-48
Failed 18/48 tests, 62.50% okay
...
Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/StandAloneBlast.t               18    4  22.22%  19-22
t/protgraph.t                     66   22  33.33%  11 13 20-21 26 33 36-37 45
                                                   48-56 59-60 65-66
t/psm.t                9  2304    48   35  72.92%  29 32-48
39 subtests skipped.
Failed 3/233 test scripts, 98.71% okay. 36/11100 subtests failed, 99.68% okay.
NMAKE :  U1077: 
Stop.


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Thursday, June 08, 2006 3:50 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; BioPerl Mailing List; Chris Fields
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall with
> "returnundef"
> 
> Looks like we survived the sweeping change - and fixed a number of
> existing
> bugs in the process. Thanks for everyone who helped!
> 
> 	-Heikki
> 
> On Thursday 08 June 2006 02:50, David Messina wrote:
> > Thanks for letting me know, Chris.
> >
> > Here's a new round of results on bioperl-live checked out moments ago:
> > [OS X 10.4.6, perl 5.8.6]
> >
> > Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> > ------------------------------------------------------------------------
> > -------
> > t/DBCUTG.t                  29    5  17.24%  26 30-32
> > t/LocusLink.t               23    1   4.35%  23
> > t/PopGen.t                  89    1   1.12%  85
> > t/psm.t        255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                21   15  71.43%  7-21
> > 121 subtests skipped.
> > Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> > 99.69% okay.
> >
> > Fixed since earlier today
> > =========================
> > Annotation.t
> > PhysicalMap.t
> > TaxonTree.t
> > alignUtilities.t
> >
> > New since earlier today
> > =======================
> > PopGen.t
> >
> > t/PopGen.....................FAILED test 85
> >          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> > okay, 96.63%)
> >
> > Unchanged
> > =========
> > DBCUTG.t
> > LocusLink.t
> > psm.t
> > tutorial.t
> >
> > Remote-server tests were run like before. I forgot to mention last
> > time that I skipped the local DB tests and I don't have bioperl-ext
> > installed, so several staden-related tests were also skipped.
> >
> > Dave
> >
> > My results from earlier today for reference:
> > > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > > ----------------------------------------------------------------------
> > > --
> > > -------
> > > t/Annotation.t                   89    2   2.25%  79 88
> > > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > > t/LocusLink.t                    23    1   4.35%  23
> > > t/PhysicalMap.t                  14    2  14.29%  11-12
> > > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > > t/alignUtilities.t                9    1  11.11%  9
> > > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > > t/tutorial.t                     21   15  71.43%  7-21
> > > 114 subtests skipped.
> > > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > > 99.84% okay.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fernan at iib.unsam.edu.ar  Thu Jun  8 13:02:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 8 Jun 2006 14:02:27 -0300
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <4487BBBD.6060702@infotech.monash.edu.au>
References: <4487BBBD.6060702@infotech.monash.edu.au>
Message-ID: <20060608170227.GF3334@iib.unsam.edu.ar>

+----[ Torsten Seemann <torsten.seemann at infotech.monash.edu.au> (08.Jun.2006 13:47):
|
| Hi all,
| 
| I've just been further auditing the Bioperl code and noticed that
| Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
| can't locate an example/sample sequence file in "Lasergene" format.
| 
|  From the code it looks similar to 'raw' format but has "^^" as
| a separator character.
| 
| Can anyone provide a real-life example so I can augment the 
| t/lasergene.t tests?
|
+----]

See the attached file. 

The format seems to be plain text, beginning with a free
text description that goes from the beginning of the file
until the "^^" delimiter, and after that the sequence.
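
As a rough sketch of that layout (not the Bio::SeqIO::lasergene code;
parse_lasergene is a hypothetical helper, and it assumes the "^^"
delimiter sits on a line of its own, as in the attached file):

```perl
use strict;
use warnings;

# Split Lasergene-style text into (description, sequence), assuming the layout
# described above: free text, a line containing only "^^", then the sequence.
# Hypothetical helper; this is not the Bio::SeqIO::lasergene implementation.
sub parse_lasergene {
    my ($text) = @_;
    my ($desc, $seq) = split /^\^\^[ \t]*\r?\n/m, $text, 2;
    return unless defined $seq;      # no "^^" delimiter found
    $desc =~ s/^\s+|\s+$//g;         # trim the description
    $seq  =~ s/\s+//g;               # sequence may be wrapped over many lines
    return ($desc, $seq);
}

my $example = <<'EOF';
Created: Jueves, 08 de Junio de 2006 01:56 p.m.

This is a test sequence created with EditSeq (Lasergene's DNAStar)

^^
ATCGATCGATCG
EOF

my ($desc, $seq) = parse_lasergene($example);
print "description: $desc\n";
print "sequence:    $seq (", length($seq), " bases)\n";
```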

Fernan
-------------- next part --------------
Created: Jueves, 08 de Junio de 2006 01:56 p.m.

This is a test sequence created with EditSeq (Lasergene's DNAStar)

^^
ATCGATCGATCG

From freimuth at pathology.wustl.edu  Thu Jun  8 13:12:36 2006
From: freimuth at pathology.wustl.edu (Freimuth, Robert)
Date: Thu, 8 Jun 2006 12:12:36 -0500
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
Message-ID: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>

Hi,

I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
of hits from blast, then get some information about the tiled result.  I
thought I'd use the num_unaligned_query and num_unaligned_hit methods to
get the number of unaligned bases in the tiled result, then subtract
that from the length of the query/subject sequence to get the number of
aligned bases in the region spanned by the hit(s).  My code is below,
followed by the error message.


while( my $result_obj = $blast_obj->next_result() )
{
    while( my $hit_obj = $result_obj->next_hit() )
    {
        my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name => $hit_obj->name() );
        $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap > this number of bp

        while( my $hsp_obj = $hit_obj->next_hsp() )
        {
            # add all HSPs to a GenericHit object so they can be tiled together
            $generic_hit_obj->add_hsp( $hsp_obj );
        }

        my $num_unaligned_query = $generic_hit_obj->num_unaligned_query();
        my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();


------------- EXCEPTION  -------------
MSG: Must have defined query_len
STACK Bio::Search::Hit::GenericHit::logical_length
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
STACK Bio::Search::Hit::GenericHit::num_unaligned_query
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
STACK toplevel blast_needle_timetrials_1.pl:94
 
--------------------------------------


I looked through the docs to try to find an explanation or some mention
of how to set query_len, but I didn't find anything.  Could someone
please point out what I'm doing wrong?  Additionally, if I'm making this
harder than it needs to be, please give me a gentle whack with the clue
stick.

Thanks,
Bob


From osborne1 at optonline.net  Thu Jun  8 15:42:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 08 Jun 2006 15:42:07 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"returnundef"
In-Reply-To: <000001c68b31$b5320390$15327e82@pyrimidine>
Message-ID: <C0ADF5CF.8C8F%osborne1@optonline.net>

Chris,

Odd. protgraph.t passes all of its tests on my computer. Do you have the
Clone module installed?

Brian O.


On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> 36-37, 45, 48-56, 59-60, 65-66
> Failed 22/66 tests, 66.67% okay


From jason at bioperl.org  Thu Jun  8 16:15:47 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 8 Jun 2006 16:15:47 -0400
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <84AC010A-25E6-48C7-A723-CE4688ECA926@bioperl.org>

Why are you trying to create new Hit objects?
$hit_obj is already a GenericHit object...
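
A minimal sketch of what that could look like, calling the tiling
methods on the hit the parser already returned. report_hit is a
hypothetical helper name, and whether overlap(0) is the right setting
depends on the analysis:

```perl
use strict;
use warnings;

# Summarise query coverage for one hit, calling the tiling methods on the
# hit object that the parser already returned (no new GenericHit needed).
# report_hit() is a hypothetical helper name used only for this sketch.
sub report_hit {
    my ($hit) = @_;
    $hit->overlap(0);                              # same setting as in the original post
    my $aligned   = $hit->length_aln('query');     # aligned query length after tiling HSPs
    my $unaligned = $hit->num_unaligned_query;     # query bases not covered by the tiling
    return sprintf "%s: %d aligned, %d unaligned", $hit->name, $aligned, $unaligned;
}

# Intended use inside the loop from the original post:
#   while ( my $hit_obj = $result_obj->next_hit ) {
#       print report_hit($hit_obj), "\n";
#   }
```

Because the parser sets the query length on the hits it builds, this
should avoid the "Must have defined query_len" exception that the
hand-built object raised.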


-jason
On Jun 8, 2006, at 1:12 PM, Freimuth, Robert wrote:

> Hi,
>
> I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
> of hits from blast, then get some information about the tiled  
> result.  I
> thought I'd use the num_unaligned_query and num_unaligned_hit  
> methods to
> get the number of unaligned bases in the tiled result, then subtract
> that from the length of the query/subject sequence to get the  
> number of
> aligned bases in the region spanned by the hit(s).  My code is below,
> followed by the error message.
>
>
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap > this number of bp
>
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
>
>         my $num_unaligned_query = $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
>
>
>
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
> STACK Bio::Search::Hit::GenericHit::num_unaligned_query
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
> STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
> STACK toplevel blast_needle_timetrials_1.pl:94
>
> --------------------------------------
>
>
> I looked through the docs to try to find an explanation or some  
> mention
> of how to set query_len, but I didn't find anything.  Could someone
> please point out what I'm doing wrong?  Additionally, if I'm making  
> this
> harder than it needs to be, please give me a gentle whack with the  
> clue
> stick.
>
> Thanks,
> Bob
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 18:36:00 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 09 Jun 2006 08:36:00 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <20060608170227.GF3334@iib.unsam.edu.ar>
References: <4487BBBD.6060702@infotech.monash.edu.au>
	<20060608170227.GF3334@iib.unsam.edu.ar>
Message-ID: <4488A650.2050803@infotech.monash.edu.au>

> I've just been further auditing the Bioperl code and noticed that
> Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
> can't locate an example/sample sequence file in "Lasergene" format.

Thanks to Fernan, Todd and Senthil who sent me example Lasergene files.
Those will be enough examples to write some tests.

--Torsten


From kellert at ohsu.edu  Thu Jun  8 20:29:10 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 17:29:10 -0700
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>

Greetings,
Is fink still a reasonable way to install and maintain bioperl?
(There have been some emails about instability.) How 'bout upgrades: the
way I have fink installed, its path is first when perl reads @INC. So
if I put a newer Bio::something in /usr/local/wherever it won't be
seen if an older module is in the fink path.  Can I upgrade in the
fink "space" without messing up fink's database? Other options?

Thanks,
Tom K


Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From hlapp at gmx.net  Thu Jun  8 21:19:28 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Jun 2006 21:19:28 -0400
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
	<1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
Message-ID: <060FC8CE-FD89-436E-B79C-135BB4F324CD@gmx.net>

Why don't you remove the fink bioperl package if you want to install  
a newer version locally?

BTW unless you use a custom-compiled perl your packages will end up
in /Library/Perl/5.8.6/ (or /System/Library/Perl/5.8.6/), not
/usr/local, when you issue 'make install'.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Jun  8 22:30:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 21:30:20 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <C0ADF5CF.8C8F%osborne1@optonline.net>
Message-ID: <000c01c68b6c$a8184710$15327e82@pyrimidine>

Yes; using ActiveState's PPM:

ppm> query CLone
Querying target 1 (ActivePerl 5.8.7.815)
  1. Clone [0.20] recursively copy Perl datatypes
ppm>

v. 0.20 is the latest in CPAN.

I can try some additional tests with the relevant modules to see what the
problem is.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Thursday, June 08, 2006 2:42 PM
> To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; bioperl-l
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> with"returnundef"
> 
> Chris,
> 
> Odd. protgraph.t passes all of its tests on my computer. Do you have the
> Clone module installed?
> 
> Brian O.
> 
> 
> On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> > 36-37, 45, 48-56, 59-60, 65-66
> > Failed 22/66 tests, 66.67% okay
> 
> 


From heikki at sanbi.ac.za  Fri Jun  9 03:35:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 09:35:12 +0200
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <448850CE.1040105@colibase.bham.ac.uk>
References: <448850CE.1040105@colibase.bham.ac.uk>
Message-ID: <200606090935.12758.heikki@sanbi.ac.za>

Roy,

The definitive document describing the locations is the feature table 
definition:

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#3.5

but you probably know that already.


Two questions come to mind:

1. Can you parse your joint location using bioperl without errors?

2. Is there a practical advantage in including a location which has no 
relevance to the sequence in hand?

I notice that the /partial qualifier is deprecated and the docs suggest using 
</> signs to indicate that the sequence is partial, so I guess what you are 
doing is  correct.

	-Heikki

On Thursday 08 June 2006 18:31, Roy Chaudhuri wrote:
> Hi all.
>
> I've been playing around with a subroutine to truncate a sequence and
> adjust the coordinates of any features that overlap the specified
> region- something that according to the comments in
> Bio::Location::Simple has been abortively worked on in the past.
>
> I've submitted the subroutine as an enhancement in Bugzilla. It's a bit
> hacky but works for what I needed it for. However I'm a bit unsure on
> the best way to deal with split locations where one of the sublocations
> is entirely outside the truncated region. My current method results in
> locations like:
> join(1..500, >1000..>1000)
>
> which is quite ugly and possibly invalid, but kind of makes sense. Does
> anyone know what would be the correct behaviour for this situation?
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Jun  9 04:06:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 10:06:30 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000c01c68b6c$a8184710$15327e82@pyrimidine>
References: <000c01c68b6c$a8184710$15327e82@pyrimidine>
Message-ID: <200606091006.30893.heikki@sanbi.ac.za>

I am using:
   This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
and I have Clone installed, but more than half the tests fail.

Something is badly wrong.


	-Heikki
bala ~/src/bioperl/core> perl -w t/protgraph.t
1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Failed test 10 in t/protgraph.t at line 85
not ok 11
# Test 11 got: '5' (t/protgraph.t at line 86)
#    Expected: '13'
not ok 12
# Failed test 12 in t/protgraph.t at line 94
not ok 13
# Test 13 got: '5' (t/protgraph.t at line 95)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t/protgraph.t at line 113)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t/protgraph.t at line 114)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t/protgraph.t at line 122)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t/protgraph.t at line 150)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t/protgraph.t at line 158)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t/protgraph.t at line 163)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t/protgraph.t at line 187
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t/protgraph.t at line 212)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t/protgraph.t at line 228)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t/protgraph.t at line 229)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t/protgraph.t at line 233)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t/protgraph.t at line 234)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t/protgraph.t at line 248)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t/protgraph.t at line 249)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t/protgraph.t at line 253)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t/protgraph.t at line 254)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t/protgraph.t at line 263)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t/protgraph.t at line 264)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t/protgraph.t at line 275)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t/protgraph.t at line 276)
#    Expected: '5'



-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 04:08:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 09:08:18 +0100
Subject: [Bioperl-l] undef query_len error
	with	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <44892C72.2040605@mrc-dunn.cam.ac.uk>

Freimuth, Robert wrote:
> Hi,
> 
> I'm trying to use the Bio::Search::Hit::GenericHit
[snip]
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new(
>             -name => $hit_obj->name() );
>         # tile any hsps that overlap > this number of bp
>         $generic_hit_obj->overlap( 0 );
> 
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
> 
>         my $num_unaligned_query = $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
> 
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
[snip]
> I looked through the docs to try to find an explanation or some mention
> of how to set query_len, but I didn't find anything.

As Jason asked, why are you essentially recreating the hit object?
The problem you are seeing is that the query length is normally set via 
SearchIO stream via ResultI when it internally creates a new hit object.
When you created your own hit object you didn't supply -query_len as an 
option to new(), nor did you later use the query_length() method to set it.

If you really do need your $generic_hit_obj (instead of just using 
$hit_obj), do $generic_hit_obj->query_length($hit_obj->query_length); 
(Or if you know the length of your query sequence, supply that directly.)


From zhangchnxp at gmail.com  Fri Jun  9 05:05:36 2006
From: zhangchnxp at gmail.com (Zhang chnxp)
Date: Fri, 9 Jun 2006 17:05:36 +0800
Subject: [Bioperl-l] Are there any modules handling the HLA Typing (Sequence
	Based Typing) ?
Message-ID: <4d1768a60606090205m6e360413paf172fa4e731ef2e@mail.gmail.com>

Hi there,
  I have some .abi trace files from an ABI3100 Genetic Analyzer. Are
there any packages that handle the typing of HLA-A, -B, -C, -DRB1,
etc.? Or is there any free software that resolves the ambiguities
arising in SBT?


From cain at cshl.edu  Wed Jun  7 19:02:43 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 07 Jun 2006 19:02:43 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149721363.12513.96.camel@localhost.localdomain>

On Wed, 2006-06-07 at 17:26 -0500, David Messina wrote:
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch
> http://cvs.sourceforge.net/viewcvs.py/*checkout*/song/ontology/so.definition?rev=HEAD,
> but server threw 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
That is a problem with the cvs server at SourceForge (where the Sequence
Ontology is hosted).  I changed the module that tries to get that file
(I don't remember off hand what it was).  

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From oldham at ucla.edu  Thu Jun  8 22:07:34 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Thu, 8 Jun 2006 19:07:34 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
Message-ID: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>

Dear all,

I am a total Bioperl newbie struggling to accomplish a conceptually simple
task.  I have a single large fasta file containing about 200,000 probe
sequences (from an Affymetrix microarray), each of which looks like this:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC

What I would like to do is extract from this file a subset of ~130,800
probes (both the header and the sequence) and output this subset into a new
fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
("1138_at" is the probe set ID in the header listed above); I have these
8,175 IDs listed in a separate file.  I *think* that I managed to create an
index of all 200,000 probes in the original fasta file using the following
script:

#!/usr/bin/perl -w

 # script 1: create the index

 use Bio::Index::Fasta;
 use strict;
 my $Index_File_Name = shift;
 my $inx = Bio::Index::Fasta->new(
     -filename => $Index_File_Name,
     -write_flag => 1);
 $inx->make_index(@ARGV);

I'm not sure if this is the most sensible approach, and even if it is, I'm
not sure what to do next.  Any help would be greatly appreciated!

Many thanks,
Mike O.




From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 10:52:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 15:52:59 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <44898B4B.8080901@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes
[snip]
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
[snip]
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!

I'd say you're on the right lines. Next, you should continue reading the
rest of the synopsis and description in the docs for Bio::Index::Fasta.

Perhaps it's not clear, but you don't need to say 
$inx->make_index(@ARGV); if you've already provided -file to new() and 
are only dealing with one file. You also can't supply -file to new() if 
you want to change the id_parser (which you do, since you need to tell 
it how to detect your probe set ID).

Having indexed your file you can then output the desired sequences, just 
like the foreach loop suggested in the synopsis. (You could have that in 
the same script.)


One thing I'm not clear on is why it needs -write_flag => 1. Why can't 
it index a read-only database? Even when you set -write_flag allowing it 
to work, it doesn't write anything...


From simon.andrews at bbsrc.ac.uk  Fri Jun  9 11:01:05 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 9 Jun 2006 16:01:05 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>

 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Michael Oldham
> Sent: 09 June 2006 03:08
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a 
> single large file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a 
> conceptually simple task.  I have a single large fasta file 
> containing about 200,000 probe sequences (from an Affymetrix 
> microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC

Unfortunately that's not Fasta format (which only has a single header
line starting with a '>').  I'd imagine that most programs which deal
with fasta and read that entry would see it as two sequences, the
first of which is empty.


> What I would like to do is extract from this file a subset of 
> ~130,800 probes (both the header and the sequence) and output 
> this subset into a new fasta file.  These 130,800 probes 
> correspond to 8,175 probe set IDs ("1138_at" is the probe set 
> ID in the header listed above)

If you're only having to do this once then it should be fairly quick to
knock up a one off script to do this.  Since you've only got 8000ish
probeset ids then you can probably just read those into a hash to start
with then parse through your big sequence file with something like;


#!perl
use warnings;
use strict;

my %probe_ids;

# Add real code here to populate your hash.
# The key is quoted because it starts with a digit.
$probe_ids{'1138_at'} = 1;
##########################################


open (IN,'<','your_affy_file.txt') or die "Can't read affy file: $!";

open (OUT,'>','probe_list.txt') or die "Can't write output: $!";

while (<IN>) {

  if (/^>probe/) {
    # This assumes there are always 3 lines per probe entry.
    # The probe set ID is the third colon-separated field of the header.
    if (exists $probe_ids{(split(/:/))[2]}) {
      print OUT;
      print OUT scalar <IN>;
      print OUT scalar <IN>;
    }
  }
}

close OUT or die "Can't close output: $!";
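
The "populate your hash" step in the script above could be filled in along these lines. This is only a sketch, not part of the original post: it assumes the probe set IDs sit one per line in a plain text file, and the sub name load_probe_ids and the in-memory demo data are made up for illustration.

```perl
use strict;
use warnings;

# Read one probe set ID per line from a filehandle and return a hash
# reference keyed by ID. Blank lines are skipped.
sub load_probe_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;   # trim whitespace, including a DOS-style CR
        next if $line eq '';
        $ids{$line} = 1;
    }
    return \%ids;
}

# Demonstration with an in-memory filehandle. A real run would instead
# open the ID file itself (its name is up to you, it is not given here).
my $demo = "1138_at\n1134_at\n\n";
open my $fh, '<', \$demo or die "Can't open in-memory file: $!";
my $probe_ids = load_probe_ids($fh);
close $fh;

print scalar(keys %$probe_ids), " IDs loaded\n";
```

The returned reference can then be tested with exists $probe_ids->{...} inside the loop, in place of the hard-coded %probe_ids entry.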


From MEC at stowers-institute.org  Fri Jun  9 10:58:22 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 9 Jun 2006 09:58:22 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work: 

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
<idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' \
  id.dat mybigfile.fa
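
For anyone who would rather read this as a script, the same flag-based idea can be sketched as below. It is an illustration only, not a tested replacement for the one-liner: the sub name filter_probes, the in-memory filehandles and the second demo record (9999_at and its sequence) are invented for the example.

```perl
use strict;
use warnings;

# Copy every record whose probe set ID is a key of %$wanted from $in to $out.
# A record is a '>probe:' header plus all lines up to the next '>probe:' header,
# so a header wrapped onto a second line stays with its record.
sub filter_probes {
    my ($in, $out, $wanted) = @_;
    my $keep  = 0;   # true while we are inside a wanted record
    my $count = 0;   # number of records copied
    while (my $line = <$in>) {
        if ($line =~ /^>probe:[^:]+:([^:]+):/) {
            $keep = exists $wanted->{$1};
            $count++ if $keep;
        }
        print {$out} $line if $keep;
    }
    return $count;
}

# Demo on in-memory data; the second record is made up.
my $fasta = join '',
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:9999_at:1:2; Interrogation_Position=10; Antisense;\n",
    "ACGTACGTACGTACGTACGTACGTA\n";

my $result = '';
open my $in,  '<', \$fasta  or die "Can't open input: $!";
open my $out, '>', \$result or die "Can't open output: $!";
my $n = filter_probes($in, $out, { '1138_at' => 1 });
close $in;
close $out;

print $result;
```

Unlike a fixed lines-per-record approach, the flag stays set until the next '>probe:' header, so it does not matter how many lines each entry spans.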

good luck

--Malcolm Cook 



From senthil at cdfd.org.in  Fri Jun  9 18:21:11 2006
From: senthil at cdfd.org.in (M Senthil Kumar)
Date: Fri, 9 Jun 2006 15:21:11 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>


On Fri, 9 Jun 2006, simon andrews (BI) wrote:
|
|
|> -----Original Message-----
|> From: bioperl-l-bounces at lists.open-bio.org
|> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
|> Michael Oldham
|> Sent: 09 June 2006 03:08
|> To: bioperl-l at lists.open-bio.org
|> Subject: [Bioperl-l] Output a subset of FASTA data from a
|> single large file
|>
|> Dear all,
|>
|> I am a total Bioperl newbie struggling to accomplish a
|> conceptually simple task.  I have a single large fasta file
|> containing about 200,000 probe sequences (from an Affymetrix
|> microarray), each of which looks like this:
|>
|> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
|> >Antisense;
|> TGGCTCCTGCTGAGGTCCCCTTTCC
|
|Unfortunately that's not Fasta format (which only has a single header
|line starting with a '>'.  I'd imagine that most programs which deal
|with fasta which read that entry would see it as two sequences, the
|first of which is empty.
|

[snipped]

hi,

I think the file is in fasta format, and you probably saw it
differently because of your mail transport agent.

Senthil


From cjfields at uiuc.edu  Fri Jun  9 13:59:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:18 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>
Message-ID: <002b01c68bee$6e3237e0$15327e82@pyrimidine>

No; I saw the same thing here.  It's not FASTA in the traditional sense:

http://www.bioperl.org/wiki/FASTA_sequence_format

though he did get it to build a database successfully.  Well, 'success' in
the sense that no errors were thrown.  I've learned the absence of error
messages does not necessarily mean that everything went as planned; it
depends on how much error handling has been added to the module by the
submitting author.  

It's possible that the second annotation line was ignored completely.  I
suppose it's also possible that two sequences are entered into the database,
an empty sequence for the first '>' line and the full sequence for the
second.  It's all dependent on how the parser handles this.

Chris



From cjfields at uiuc.edu  Fri Jun  9 13:59:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:31 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606091006.30893.heikki@sanbi.ac.za>
Message-ID: <002c01c68bee$76219ef0$15327e82@pyrimidine>

I ran tests this morning on protgraph.t using bioperl-live on Mac OS X (Intel)
running perl 5.8.6 and all tests passed, but I haven't updated from CVS
since June 7th.  The WinXP and linux test results are almost exactly alike;
most failed tests are from unexpected results (with exactly the same results
on both OSes).  A few look more serious: test 45 failed on both, and tests 10
and 12 failed only on linux (the only noticeable difference between the two):
...

ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
...

The line numbers seem to also be off by one (linux tests seem to have one
extra line); not sure if that means anything.

Here are the full WinXP protgraph.t results:

1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t\protgraph.t at line 112)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t\protgraph.t at line 113)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t\protgraph.t at line 121)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t\protgraph.t at line 149)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t\protgraph.t at line 157)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t\protgraph.t at line 162)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t\protgraph.t at line 186
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t\protgraph.t at line 211)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t\protgraph.t at line 227)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t\protgraph.t at line 228)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t\protgraph.t at line 232)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t\protgraph.t at line 233)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t\protgraph.t at line 247)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t\protgraph.t at line 248)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t\protgraph.t at line 252)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t\protgraph.t at line 253)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t\protgraph.t at line 262)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t\protgraph.t at line 263)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t\protgraph.t at line 274)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t\protgraph.t at line 275)
#    Expected: '5'  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Friday, June 09, 2006 3:07 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields; 'Brian Osborne'
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> I am using:
>    This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
> and I have Clone installed, but more than half the tests fail.
> 
> Something is badly wrong.
> 
> 
> 	-Heikki
> bala ~/src/bioperl/core> perl -w t/protgraph.t
> 1..66
> ok 1
> ok 2
> ok 3
> ok 4
> ok 5
> ok 6
> ok 7
> ok 8
> ok 9
> not ok 10
> # Failed test 10 in t/protgraph.t at line 85
> not ok 11
> # Test 11 got: '5' (t/protgraph.t at line 86)
> #    Expected: '13'
> not ok 12
> # Failed test 12 in t/protgraph.t at line 94
> not ok 13
> # Test 13 got: '5' (t/protgraph.t at line 95)
> #    Expected: '13'
> ok 14
> ok 15
> ok 16
> ok 17
> ok 18
> ok 19
> not ok 20
> # Test 20 got: '0.013' (t/protgraph.t at line 113)
> #    Expected: '0.027'
> .not ok 21
> # Test 21 got: '1' (t/protgraph.t at line 114)
> #    Expected: ''
> ..ok 22
> .ok 23
> ok 24
> ..ok 25
> .not ok 26
> # Test 26 got: '1' (t/protgraph.t at line 122)
> #    Expected: '5'
> ok 27
> ok 28
> ok 29
> ok 30
> ok 31
> ok 32
> not ok 33
> # Test 33 got: '139' (t/protgraph.t at line 150)
> #    Expected: '71'
> ok 34
> ok 35
> not ok 36
> # Test 36 got: '126' (t/protgraph.t at line 158)
> #    Expected: '58'
> .not ok 37
> # Test 37 got: '1' (t/protgraph.t at line 163)
> #    Expected: '15'
> ok 38
> ok 39
> ok 40
> ok 41
> ok 42
> ok 43
> ok 44
> not ok 45
> # Failed test 45 in t/protgraph.t at line 187
> ok 46
> ok 47
> not ok 48
> # Test 48 got: '75' (t/protgraph.t at line 212)
> #    Expected: '72'
> not ok 49
> # Test 49 got: '343' (t/protgraph.t at line 228)
> #    Expected: '72'
> not ok 50
> # Test 50 got: '368' (t/protgraph.t at line 229)
> #    Expected: '74'
> not ok 51
> # Test 51 got: '344' (t/protgraph.t at line 233)
> #    Expected: '73'
> not ok 52
> # Test 52 got: '368' (t/protgraph.t at line 234)
> #    Expected: '74'
> not ok 53
> # Test 53 got: '432' (t/protgraph.t at line 248)
> #    Expected: '72'
> not ok 54
> # Test 54 got: '461' (t/protgraph.t at line 249)
> #    Expected: '74'
> not ok 55
> # Test 55 got: '434' (t/protgraph.t at line 253)
> #    Expected: '74'
> not ok 56
> # Test 56 got: '463' (t/protgraph.t at line 254)
> #    Expected: '76'
> ok 57
> ok 58
> not ok 59
> # Test 59 got: '437' (t/protgraph.t at line 263)
> #    Expected: '3'
> not ok 60
> # Test 60 got: '467' (t/protgraph.t at line 264)
> #    Expected: '4'
> ok 61
> ok 62
> ok 63
> ok 64
> not ok 65
> # Test 65 got: '440' (t/protgraph.t at line 275)
> #    Expected: '3'
> not ok 66
> # Test 66 got: '472' (t/protgraph.t at line 276)
> #    Expected: '5'
> 
> 
> On Friday 09 June 2006 04:30, Chris Fields wrote:
> > Yes; using ActiveState's PPM:
> >
> > ppm> query CLone
> > Querying target 1 (ActivePerl 5.8.7.815)
> >   1. Clone [0.20] recursively copy Perl datatypes
> > ppm>
> >
> > v. 0.20 is the latest in CPAN.
> >
> > I can try some additional tests with the relevant modules to see what
> the
> > problem is.
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > > Sent: Thursday, June 08, 2006 2:42 PM
> > > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> > > with"returnundef"
> > >
> > > Chris,
> > >
> > > Odd. protgraph.t passes all of its tests on my computer. Do you have
> the
> > > Clone module installed?
> > >
> > > Brian O.
> > >
> > > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > > Failed 22/66 tests, 66.67% okay
> > >
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From sdavis2 at mail.nih.gov  Fri Jun  9 14:29:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 09 Jun 2006 14:29:53 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002b01c68bee$6e3237e0$15327e82@pyrimidine>
Message-ID: <C0AF3661.CD0A%sdavis2@mail.nih.gov>


On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> No; I saw the same thing here.  It's not FASTA in the traditional sense:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> though he did get it to build a database successfully.  Well, 'success' in
> the sense that no errors were thrown.  I've learned the absence of error
> messages does not necessarily mean that everything went as planned; it
> depends on how much error handling has been added to the module by the
> submitting author.
> 
> It's possible that the second annotation line was ignored completely.  I
> suppose it's also possible that two sequences are entered into the database,
> an empty sequence for the first '>' line and the full sequence for the
> second.  It's all dependent on how the parser handles this.

I think that Senthil was pointing out that even though >Antisense looks to
be on its own line, it isn't; it is simply a continuation of the FASTA
header.  Judging from the context, that is the only interpretation that
makes sense.

Sean

>> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>> |> >Antisense;
>> |> TGGCTCCTGCTGAGGTCCCCTTTCC
>> |
>> |Unfortunately that's not Fasta format (which only has a single header
>> |line starting with a '>'.  I'd imagine that most programs which deal
>> |with fasta which read that entry would see it as two sequences, the
>> |first of which is empty.
>> |
>> 
>> [snipped]
>> 
>> hi,
>> 
>> I think the file is in fasta format and probably you might have seen it
>> differently because of your mail transport agent.


From cjfields at uiuc.edu  Fri Jun  9 15:05:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:05:44 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <002e01c68bf7$b594d210$15327e82@pyrimidine>

There's information in the HOWTOs:

http://www.bioperl.org/wiki/HOWTO:Flat_databases

http://www.bioperl.org/wiki/HOWTO:OBDA

Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
('fasta' format I/O) and this is what I got as output:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;


i.e. an empty sequence, which is what I guessed might happen, though I
thought it might pick up the second '>' and the full sequence there.  Since
the sequence is tossed you'll have to prescreen your sequence input stream
by either concatenating the two '>' lines together or screening for the
relevant information you want to retain.  You can try maybe getting this
info into Bio::Seq objects and writing to a Bio::SeqIO stream (to file or
file handle).
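
A minimal sketch of the prescreening idea above, assuming the second '>' line really is a separate line in the file (rather than a mail-client wrapping artifact). It folds any run of consecutive '>' lines into a single header before the data reaches Bio::SeqIO. The sub name merge_split_headers is made up for illustration and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: fold runs of consecutive '>' lines into one header line.
# Takes a list of lines (with newlines) and returns the cleaned lines.
sub merge_split_headers {
    my @lines = @_;
    my @out;
    my $prev_was_header = 0;
    for my $line (@lines) {
        if ($line =~ /^>/) {
            if ($prev_was_header) {
                # Continuation of the previous header: drop the leading '>'
                # and append the rest to the header already stored.
                (my $rest = $line) =~ s/^>\s*//;
                chomp $out[-1];
                $out[-1] =~ s/\s+$//;
                $out[-1] .= " $rest";
            } else {
                push @out, $line;
            }
            $prev_was_header = 1;
        } else {
            push @out, $line;
            $prev_was_header = 0;
        }
    }
    return @out;
}

# The record from this thread, with the header split over two '>' lines.
my @raw = (
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;\n",
    ">Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
);
print merge_split_headers(@raw);
```

The cleaned lines could then be written to a temporary file (or an in-memory filehandle) and handed to Bio::SeqIO or Bio::Index::Fasta as usual.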

Once you have that set up, the HOWTO tells you how to set up custom or
secondary namespaces, so you can use a regex to parse out the information
for primary or secondary keys:

http://www.bioperl.org/wiki/HOWTO:Flat_databases#Secondary_or_custom_namespaces

then you could select specific sequences this way (per the HOWTO):

$db->secondary_namespaces("GI");
my $acc_seq = $db->get_Seq_by_id("P84139");
my $gi_seq = $db->get_Seq_by_secondary("GI",443893);

or for multiple sequences (judging from the POD):

my $acc_seqio = $db->get_Stream_by_id(@ids);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Oldham
> Sent: Thursday, June 08, 2006 9:08 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a single large
> file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes (both the header and the sequence) and output this subset into a
> new
> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
> ("1138_at" is the probe set ID in the header listed above); I have these
> 8,175 IDs listed in a separate file.  I *think* that I managed to create
> an
> index of all 200,000 probes in the original fasta file using the following
> script:
> 
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
>  use strict;
>  my $Index_File_Name = shift;
>  my $inx = Bio::Index::Fasta->new(
>      -filename => $Index_File_Name,
>      -write_flag => 1);
>  $inx->make_index(@ARGV);
> 
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!
> 
> Many thanks,
> Mike O.
> 


From cjfields at uiuc.edu  Fri Jun  9 15:49:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:49:51 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <C0AF3661.CD0A%sdavis2@mail.nih.gov>
Message-ID: <002f01c68bfd$e1111e20$15327e82@pyrimidine>

> On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > No; I saw the same thing here.  It's not FASTA in the traditional sense:
> >
> > http://www.bioperl.org/wiki/FASTA_sequence_format
> >
> > though he did get it to build a database successfully.  Well, 'success'
> in
> > the sense that no errors were thrown.  I've learned the absence of error
> > messages does not necessarily mean that everything went as planned; it
> > depends on how much error handling has been added to the module by the
> > submitting author.
> >
> > It's possible that the second annotation line was ignored completely.  I
> > suppose it's also possible that two sequences are entered into the
> database,
> > an empty sequence for the first '>' line and the full sequence for the
> > second.  It's all dependent on how the parser handles this.
> 
> I think that Senthil was pointing out that even though >Antisense looks to
> be on its own line, it isn't; it is simply a continuation of the FASTA
> header.  Judging from the context, that is the only interpretation that
> makes sense.
> 
> Sean

Sorry.  Just checked through another mail client and you're right.  That's
what I get for trusting Mr. Gates (stupid Outlook).  I have seen a few funky
FASTA derivations, so I thought that's what was going on here.  My bad!

My point, though erroneous, was that the fasta format parser may not parse
this data correctly if he did have two description lines, but may not
indicate there are problems by throwing an exception.  I demonstrated that
using Bio::SeqIO as an example (you get empty sequences).  Bio::Index::Fasta
parses the file itself using this loop to index:

	# Main indexing loop
	while (<FASTA>) {
		if (/^>/) {
			# $begin is the position of the first character after the '>'
			my $begin = tell(FASTA) - length( $_ ) + 1;

			foreach my $id (&$id_parser($_)) {
				$self->add_record($id, $i, $begin);
			}
		}
	}

Which simply looks for '>'.  That's fine for the vast majority of sequences.
I thought it would be nice to have something that's a little more rigorous
in verifying the format rather than trusting it implicitly, maybe by using
an eval{} block to make sure the format is FASTA-like and looks like
DNA/RNA/protein.
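
A rough sketch of what such a check could look like, run inside an eval{} so a malformed file is reported rather than silently indexed. The sub name check_fasta and the accepted residue set (letters plus '*' and '-') are assumptions for illustration only; Bio::Index::Fasta itself does nothing like this.

```perl
use strict;
use warnings;

# Hypothetical checker: die with a message on the first record that does not
# look like FASTA, otherwise return the number of records seen.
sub check_fasta {
    my ($fh) = @_;
    my ($header, $seq_len, $count) = (undef, 0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip Unix or DOS line endings
        next if $line =~ /^\s*$/;      # ignore blank lines
        if ($line =~ /^>/) {
            # A new header while the previous record has no sequence yet.
            die "empty sequence before line $. (header: $header)\n"
                if defined $header && $seq_len == 0;
            die "header with no ID at line $.\n" unless $line =~ /^>\S/;
            ($header, $seq_len) = ($line, 0);
            $count++;
        } else {
            die "sequence data before first header at line $.\n"
                unless defined $header;
            die "unexpected characters at line $.\n"
                unless $line =~ /^[A-Za-z*-]+$/;
            $seq_len += length $line;
        }
    }
    die "empty sequence at end of input (header: $header)\n"
        if defined $header && $seq_len == 0;
    return $count;
}

# Two '>' lines in a row, as in the example discussed in this thread.
my $data = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;\n"
         . ">Antisense;\n"
         . "TGGCTCCTGCTGAGGTCCCCTTTCC\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $n = eval { check_fasta($fh) };
print defined $n ? "looks like FASTA: $n record(s)\n" : "rejected: $@";
```

Whether an empty record should be fatal or only a warning is a design choice; here it is fatal so that the eval{} has something to catch.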

Chris




From bernd.web at gmail.com  Fri Jun  9 09:23:21 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 9 Jun 2006 15:23:21 +0200
Subject: [Bioperl-l] SimpleAlign
Message-ID: <716af09c0606090623v37c72bc5r1ddbcb2b8355a4a0@mail.gmail.com>

Hi,

Two queries with respect to SimpleAlign. I am using the following code
based on the POD.

my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
print $out $_ while <$in>;

1) Is it possible to set set_displayname_flat() globally, without calling
$_->set_displayname_flat() on each alignment?

2) My input files have an ID and description line for each seq in the
alignment. When the file is converted I lose the description line. I
know I can get the description of the sequences (e.g.
$aln->get_seq_by_pos(2)->description()).
How could I export the complete fasta defline including the
description? (I realize that general clustal format has a limit on the
number of characters, but still.)

Regards,
Bernd


From oldham at ucla.edu  Fri Jun  9 21:39:45 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 9 Jun 2006 18:39:45 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>

Thanks to everyone for their helpful advice.  I think I am getting closer,
but no cigar quite yet.  The script below runs quickly with no errors--but
the output file is empty.  It seems that the problem must lie somewhere in
the 'while' loop, and I'm sure it's quite obvious to a more experienced
eye--but not to mine!  Any suggestions?  Thanks again for your help.

--Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
		}
	}
exit;


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Friday, June 09, 2006 7:58 AM
To: Michael Oldham; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single large
file


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work:

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID = <idfile>;
chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch = exists($ID{$1})
if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat mybigfile.fa

good luck

--Malcolm Cook



From cjfields at uiuc.edu  Sun Jun 11 00:32:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Jun 2006 23:32:04 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <F4E1042A-CE2D-4E51-B711-BDBB6E052FEB@uiuc.edu>

What happens if you just print $idmatch or $1 (i.e. check to see if
the regex matches anything)?  If nothing is printed then either
the regex isn't working as expected or there is something logically
wrong.  The problem may be that the captured string must match the id
exactly, the id being the key to the %ID hash; if the regex picks up
any extra characters beyond your id key, you will not get
anything.  Looking at Malcolm's regex it should work just fine, but
we only had one example sequence to try here.
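
A small sketch of that check, with one guess built in: if the ID list was saved with DOS line endings, chomp leaves a trailing "\r" on every key and no captured ID will ever match. That is only an assumption about the cause, not something established in this thread. The helper name load_ids and the in-memory test data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lookup hash from ID lines, removing line
# endings and surrounding whitespace so "1138_at\r" and "1138_at" agree.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        $id =~ s/^\s+|\s+$//g;     # also strips a trailing "\r" from DOS files
        $ids{$id} = 1 if length $id;
    }
    return %ids;
}

# In-memory stand-ins for the ID list and the probe file.
my $id_text    = "1138_at\r\n1134_at\r\n";
my $probe_text = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
               . "TGGCTCCTGCTGAGGTCCCCTTTCC\n";

open my $id_fh, '<', \$id_text or die "cannot open ID list: $!";
my %ID = load_ids($id_fh);

open my $probe_fh, '<', \$probe_text or die "cannot open probes: $!";
while (my $line = <$probe_fh>) {
    next unless $line =~ /^>probe:\w+:(\w+):/;
    # Print the capture in brackets so stray whitespace would be visible.
    printf STDERR "captured [%s] -> %s\n", $1,
        exists $ID{$1} ? "in ID list" : "NOT in ID list";
}
```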

If your while loop is set up like this, won't it print only the
matched description lines to the outfile (no sequence) even if there
is a match?  Or is this what you wanted?   If you want the sequence
you should add 'print OUT scalar <PROBES>;' after the 'print OUT;' line,
which copies the single sequence line that follows the header (a bare
<PROBES> in list context would print the rest of the file).

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 04:21:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:21:31 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002e01c68bf7$b594d210$15327e82@pyrimidine>
References: <002e01c68bf7$b594d210$15327e82@pyrimidine>
Message-ID: <448D240B.6040508@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> There's information in the HOWTOs:
> 
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
> 
> http://www.bioperl.org/wiki/HOWTO:OBDA
> 
> Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
> ('fasta' format I/O) and this is what I got as output:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> 
> 
> i.e. an empty sequence, which is what I guessed might happen
[snip]

As you later discovered, that was an Outlook problem. Just to make this 
thread relevant to bioperl, the bioperl solution is:

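use strict;
use warnings;
use Bio::SeqIO;
use Bio::Index::Fasta;

# Arguments: the FASTA file to index, then a file of wanted IDs (one per line).
my $inx = Bio::Index::Fasta->new(-write_flag => 1);
$inx->id_parser(\&get_id);   # index records by probe set ID
$inx->make_index(shift);

my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
my $wanted_ids_file = shift;
open(IDS, $wanted_ids_file) or die "Can't open $wanted_ids_file: $!";
while (<IDS>) {
   chomp;
   my $seq = $inx->fetch($_);
   next unless $seq;         # skip IDs that are not in the index
   $out->write_seq($seq);
}
close(IDS);

# Return the probe set ID (third colon-separated field) from a header line,
# or an empty list if the line does not look like a probe header.
sub get_id {
   my $line = shift;
   return $line =~ /^>probe:\S+?:(\S+?):/ ? $1 : ();
}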

It works for me on the sample sequence given by the OP.


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 04:49:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:49:49 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <448D2AAD.3030601@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Thanks to everyone for their helpful advice.  I think I am getting closer,
> but no cigar quite yet.  The script below runs quickly with no errors--but
> the output file is empty.  It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
> 
> --Mike O.
> 
> 
> #!/usr/bin/perl -w
> 
> use strict;
> 
> my $IDs = 'ID.dat.txt';
> 
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
> 
> my $probes = 'HG_U95Av2_probe_fasta.txt';
> 
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
> 
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
> 
> my @ID = <IDFILE>;
> chomp @ID;
> # Note: This creates a hash with keys=PSIDs and all values=1.
> my %ID = map {($_, 1)} @ID;
> 
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;

Not sure why it would print nothing (are the ids in IDFILE the same case 
as the ids in the fasta file, do they only contain word characters?), 
but even if it did you would only be printing out the fasta headers and 
not the sequences. Doing it the bioperl way gives you more flexibility 
in the future; you may want to do something with the sequences after 
printing them out, in which case do it in bioperl using Seq objects and 
skip the intermediate step of printing them.


From MEC at stowers-institute.org  Mon Jun 12 11:28:41 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:28:41 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F98D@exchkc02.stowers-institute.org>

Michael,

I don't think you can call perl's `print` on just a filehandle as you
are doing.  This is probably your problem.

If you call `select OUT` after opening it, print will print $_ to it.
And every line in the fasta record whose header matches one of the IDs
will get printed, not just the fasta header lines.  Read the code again,
noting that $idmatch is only getting reset when a correctly formatted
fasta header line is matched.
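
A minimal sketch of the loop described above, with the assumptions flagged: the sub name filter_probes, the second probe record (9999_at and its sequence) and the in-memory filehandles are made up for illustration. It keeps the match flag in a variable declared outside the loop, because perlsyn documents `my $x = ... if ...` as undefined behaviour, and it names the output handle explicitly. For what it's worth, perldoc -f print lists `print FILEHANDLE` with no list as printing $_ to that handle, so `print OUT;` and `select OUT; print;` should both be usable.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every record whose probe set ID is in %$wanted
# from $in to $out, header and sequence lines alike. Returns the number of
# records kept.
sub filter_probes {
    my ($in, $out, $wanted) = @_;
    my $inmatch = 0;    # declared outside the loop so it survives between lines
    my $kept    = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            # A new header decides whether this record is wanted.
            $inmatch = exists $wanted->{$1} ? 1 : 0;
            $kept++ if $inmatch;
        }
        print {$out} $line if $inmatch;
    }
    return $kept;
}

# First record is the example from this thread; the second is invented.
my $fasta = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
          . "TGGCTCCTGCTGAGGTCCCCTTTCC\n"
          . ">probe:HG_U95Av2:9999_at:1:2; Interrogation_Position=1; Antisense;\n"
          . "ACGTACGTACGTACGTACGTACGTA\n";
open my $in, '<', \$fasta or die "cannot open probes: $!";
my $kept = filter_probes($in, \*STDOUT, { '1138_at' => 1 });
print STDERR "kept $kept record(s)\n";
```

Header lines that do not match the probe pattern leave the flag unchanged, as in the one-liner.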

--Malcolm


>-----Original Message-----
>From: Chris Fields [mailto:cjfields at uiuc.edu] 
>Sent: Saturday, June 10, 2006 11:32 PM
>To: Michael Oldham
>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single large file
>
>What happens if you just print $idmatch or $1 (i.e. check to see if  
>the regex matches anything)?  If there is nothing printed then either  
>the regex isn't working as expected or there is something logically  
>wrong.  The problem may be that the captured string must match the id  
>exactly, the id being the key to the %ID hash; any extra characters  
>picked up by the regex outside of your id key and you will not get  
>anything.  Looking at Malcolm's regex it should work just fine, but  
>we only had one example sequence to try here.
>
>If your while loop is set up like this won't it only print only the  
>matched description lines to the outfile (no sequence) even if there  
>is a match?  Or is this what you wanted?   If you want the sequence  
>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>
>Chris
>
>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>
>> Thanks to everyone for their helpful advice.  I think I am getting  
>> closer,
>> but no cigar quite yet.  The script below runs quickly with no  
>> errors--but
>> the output file is empty.  It seems that the problem must lie  
>> somewhere in
>> the 'while' loop, and I'm sure it's quite obvious to a more  
>> experienced
>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>
>> --Mike O.
>>
>>
>> #!/usr/bin/perl -w
>>
>> use strict;
>>
>> my $IDs = 'ID.dat.txt';
>>
>> unless (open(IDFILE, $IDs)) {
>> 	print "Could not open file $IDs!\n";
>> 	}
>>
>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>
>> unless (open(PROBES, $probes)) {
>> 	print "Could not open file $probes!\n";
>> 	}
>>
>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>
>> my @ID = <IDFILE>;
>> chomp @ID;
>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
>> keys=PSIDs and
>> all values=1.
>>
>> 	while (<PROBES>) {
>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>> 		if ($idmatch){
>> 			print OUT;
>> 		}
>> 	}
>> exit;
>>
>>
>> -----Original Message-----
>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>> Sent: Friday, June 09, 2006 7:58 AM
>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a  
>> single large
>> file
>>
>>
>>
>> I wouldn't bioperl for this, or create an index.  Perl would do  
>> fine and
>> probably be faster.
>>
>> Assuming your ids are one per line in a file named id.dat 
>looking like
>> this
>>
>> 1138_at
>> 1134_at
>> etc..
>>
>> this should work:
>>
>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>> mybigfile.fa
>>
>> good luck
>>
>> --Malcolm Cook
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Michael Oldham
>>> Sent: Thursday, June 08, 2006 9:08 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>> single large file
>>>
>>> Dear all,
>>>
>>> I am a total Bioperl newbie struggling to accomplish a
>>> conceptually simple
>>> task.  I have a single large fasta file containing about 200,000  
>>> probe
>>> sequences (from an Affymetrix microarray), each of which looks
>>> like this:
>>>
>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>> Antisense;
>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>
>>> What I would like to do is extract from this file a subset of  
>>> ~130,800
>>> probes (both the header and the sequence) and output this
>>> subset into a new
>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>> ("1138_at" is the probe set ID in the header listed above); I
>>> have these
>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>> to create an
>>> index of all 200,000 probes in the original fasta file using
>>> the following
>>> script:
>>>
>>> #!/usr/bin/perl -w
>>>
>>> # script 1: create the index
>>>
>>> use Bio::Index::Fasta;
>>> use strict;
>>> my $Index_File_Name = shift;
>>> my $inx = Bio::Index::Fasta->new(
>>>     -filename => $Index_File_Name,
>>>     -write_flag => 1);
>>> $inx->make_index(@ARGV);
>>>
>>> I'm not sure if this is the most sensible approach, and even
>>> if it is, I'm
>>> not sure what to do next.  Any help would be greatly appreciated!
>>>
>>> Many thanks,
>>> Mike O.
>>>
>>>
>>>
>>>
>>> --
>>> No virus found in this outgoing message.
>>> Checked by AVG Free Edition.
>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date:  
>>> 6/8/2006
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>
>Christopher Fields
>Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign
>
>
>
>


From MEC at stowers-institute.org  Mon Jun 12 11:47:09 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:47:09 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F991@exchkc02.stowers-institute.org>

ooops, in my message 


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu] 
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if  
>>the regex matches anything)?  If there is nothing printed 
>then either  
>>the regex isn't working as expected or there is something logically  
>>wrong.  The problem may be that the captured string must 
>match the id  
>>exactly, the id being the key to the %ID hash; any extra characters  
>>picked up by the regex outside of your id key and you will not get  
>>anything.  Looking at Malcolm's regex it should work just fine, but  
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this won't it only print only the  
>>matched description lines to the outfile (no sequence) even if there  
>>is a match?  Or is this what you wanted?   If you want the sequence  
>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting  
>>> closer,
>>> but no cigar quite yet.  The script below runs quickly with no  
>>> errors--but
>>> the output file is empty.  It seems that the problem must lie  
>>> somewhere in
>>> the 'while' loop, and I'm sure it's quite obvious to a more  
>>> experienced
>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
>>> keys=PSIDs and
>>> all values=1.
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>


From MEC at stowers-institute.org  Mon Jun 12 11:48:02 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:48:02 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>

oops,

s/matches on of/matches one of/
s/nothing that/noting that/ 

--Malcolm




From hubert.prielinger at gmx.at  Mon Jun 12 14:29:19 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 12:29:19 -0600
Subject: [Bioperl-l] How to use gi2taxonid
Message-ID: <448DB27F.6090107@gmx.at>

hi,
I have downloaded the gi2taxonid file to get the taxon ID for a GI number
taken from a report, as recommended here, but I don't know how to use the
file.
Jason wrote in a previous post that you have to make a DB_File out of it
and then tie it to a hash, but I don't know how to do that.
Can anybody give me a hint on how to use it? My final goal is to get
the taxonomy.

thanks
Hubert


From cjfields at uiuc.edu  Mon Jun 12 15:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 14:13:30 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <000f01c68e54$4d155ac0$15327e82@pyrimidine>

Michael, Malcolm et al,

I ran Michael's code (not Malcolm's one-liner), with and w/o adding the file
handle line that I suggested.  My suggestion works b/c I'm calling the file
handle in scalar context, which reads the next line, just like '$foo =
<FILE>' or 'while(<FILE>) {}' advances to the next line (with $/ = "\n")
each time the file handle is called.  You could use:

$_ = <PROBES>;
print OUT;

I just chopped it down to one line.

Without the extra line I suggested I get only the description line (I used
this as a test file based on the original sequence and Michael's description
of the ID):

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;

Which I don't think Michael wants (he mentioned sequence and description, I
think).  

Modifying the loop in Michael's code to:
...

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT <PROBES>; # grabs next line and prints
	}
}

Gets:

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
AGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTATCC
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
TGGATCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTACTGCTGAGGTCCCCTTTCC

Which matches the ID's in the ID file (there are 10 sequences in the probes
file).  

I did notice one odd thing; I tried the above code on Mac OS X and it worked
fine (i.e. printed only the descriptions and sequences for the ID's in the
ID hash).  If I used Windows, I needed to use this version:

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT scalar(<PROBES>);		
	}
}

Otherwise 'print OUT <PROBES>;' prints all the remaining sequences: print
gives its arguments list context, so <PROBES> reads to end of file unless
scalar() forces a single-line read.
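
To see the context issue in isolation, here is a small sketch.  It uses a
made-up two-record file held in memory (not Michael's real data), and the
sub name is just something I picked for illustration:

```perl
use strict;
use warnings;

# In-memory stand-in for the probes file (made-up records).
my $fasta = ">probe:A:1_at:1:1;\nAAAA\n>probe:A:2_at:1:1;\nCCCC\n";

# Copy the first header line, then copy "the next line" either in list
# context (plain print) or in scalar context (print ... scalar(...)).
# Returns everything that was written.
sub copy_after_header {
    my ($force_scalar) = @_;
    open(my $in, '<', \$fasta) or die "Can't open in-memory input: $!";
    my $buf = '';
    open(my $out, '>', \$buf) or die "Can't open in-memory output: $!";

    my $header = <$in>;               # scalar assignment: one line
    print {$out} $header;
    if ($force_scalar) {
        print {$out} scalar(<$in>);   # exactly one more line
    } else {
        print {$out} <$in>;           # list context: all remaining lines
    }
    close $out;
    close $in;
    return $buf;
}

print "list context:\n",   copy_after_header(0);
print "scalar context:\n", copy_after_header(1);
```

The braces around the output handle are only there to keep the parser from
reading '<' as a less-than sign.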

Like I said, I haven't tried Malcolm's one-liner.  It's possible that it
works just as well as what I suggested.  I'm just responding to Michael's
code request.
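
For comparison, Malcolm's point about the flag only changing on header
lines can be written as a small standalone sub.  This is a sketch, not a
tested replacement for Michael's script: the sub name and the sample data
are mine.  The flag is declared outside the loop so it survives from a
header line to the sequence lines that follow (perlsyn also notes that
'my $x = ... if ...' has undefined behaviour, so it is avoided here):

```perl
use strict;
use warnings;

# Copy every FASTA record whose probe set ID is a key of %$wanted from
# $in to $out.  $keep is only updated on header lines, so the sequence
# lines that follow inherit the decision made for their header.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted->{$1};
        }
        print {$out} $line if $keep;
    }
    return;
}

# Tiny demonstration on in-memory data (second record is made up).
my %wanted = ('1138_at' => 1);
my $fasta  = join '',
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:9999_at:1:1; Interrogation_Position=1; Antisense;\n",
    "ACGT\n";

open(my $in, '<', \$fasta) or die "Can't open in-memory input: $!";
my $subset = '';
open(my $out, '>', \$subset) or die "Can't open in-memory output: $!";
filter_fasta($in, $out, \%wanted);
close $out;
close $in;
print $subset;
```

Because nothing is read ahead inside the loop, this also copes with
sequences that span more than one line.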

Chris


> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 10:48 AM
> To: Cook, Malcolm; Chris Fields; Michael Oldham
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
> 
> oops,
> 
> s/matches on of/matches one of/
> s/nothing that/noting that/
> 
> --Malcolm
> 
> 


From hlapp at gmx.net  Mon Jun 12 16:06:23 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 12 Jun 2006 16:06:23 -0400
Subject: [Bioperl-l] How to use gi2taxonid
In-Reply-To: <448DB27F.6090107@gmx.at>
References: <448DB27F.6090107@gmx.at>
Message-ID: <878FB829-AD31-457D-957E-210448D7F6F5@gmx.net>

Thought about typing

	$ perldoc DB_File

at the command line?

Hubert, are you trying to outsource what should be your own work to  
the bioperl list, or what motivates you to waste everybody's time? If  
you google 'how to ask good questions' this (indeed frequently cited,  
also on the bioperl list if you had paid attention) comes up as the  
first link:

http://www.catb.org/~esr/faqs/smart-questions.html

There's nothing I can add, except to read it in full before your next  
posting or you may reach the point fast at which nobody will bother  
to respond to you and do your homework for you.
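
That said, for the archives, what that perldoc describes boils down to
roughly the following.  Treat it as a sketch under assumptions: I am
assuming the dump is plain text with "gi<TAB>taxid" on each line, and the
sub names and file names are placeholders of mine.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT, O_RDONLY
use DB_File;

# Build (or update) an on-disk hash mapping GI number -> taxon ID.
# Assumes one "gi<TAB>taxid" pair per line in $dump_file.
sub build_gi2taxid {
    my ($dump_file, $db_file) = @_;
    tie my %gi2taxid, 'DB_File', $db_file, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $db_file: $!";
    open(my $fh, '<', $dump_file) or die "Cannot open $dump_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;     # skip malformed lines
        $gi2taxid{$gi} = $taxid;
    }
    close $fh;
    untie %gi2taxid;
    return;
}

# Look up one GI number in a previously built database file.
sub lookup_taxid {
    my ($db_file, $gi) = @_;
    tie my %gi2taxid, 'DB_File', $db_file, O_RDONLY, 0644, $DB_HASH
        or die "Cannot tie $db_file: $!";
    my $taxid = $gi2taxid{$gi};
    untie %gi2taxid;
    return $taxid;
}
```

Build the database once, then call lookup_taxid() from whatever script
parses the report; the taxon ID is what you would hand to a taxonomy
lookup afterwards.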

On Jun 12, 2006, at 2:29 PM, Hubert Prielinger wrote:

> hi,
> I have downloaded the gi2taxonid file to get the taxonid for a GI  
> number
> taken from a report as recommended here, but I don't know how to  
> use the
> gi2taxonid file.
> Jason wrote in a previous post that you have to make a DB_File out of
> it, but I don't know how....and finally tie it to a hash....
> Can anybody give me a hint how to use it..... my final goal is to get
> the taxonomy.
>
> thanks
> Hubert
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon Jun 12 16:35:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 15:35:10 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <448D240B.6040508@mrc-dunn.cam.ac.uk>
Message-ID: <001201c68e5f$b34ec8c0$15327e82@pyrimidine>

...
> Chris Fields wrote:
> > There's information in the HOWTOs:
> >
> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
> >
> > http://www.bioperl.org/wiki/HOWTO:OBDA
> >
...
> As you later discovered, that was an Outlook problem. Just to make this
> thread relevant to bioperl, the bioperl solution is:

Agreed (stupid Outlook).  It might be much faster to use non-Bioperl-ish
ways, but it is easier to further manipulate sequences (convert format,
analyze sequences, etc) using Bioperl directly.  I haven't used flat
databases much but it should move very quickly, even in an OO environment.

The one problem with the proposed non-bioperl method is that, if you wanted
100,000 sequences (based on ID's) from a FASTA database file containing
200,000 sequences, all the ID's would need to be stored (1) in an array
(which gulps the data from the ID file) and then (2) mapped into a hash;
that may be a pretty big memory footprint, depending on your system.
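
If the array is the worry, the ID's can go straight into the hash one line
at a time.  A minimal sketch, with a sub name of my own choosing; the
trailing-whitespace strip is an extra precaution against Windows line
endings, not something from the original script:

```perl
use strict;
use warnings;

# Read wanted IDs (one per line) directly into a hash, without holding
# the whole list in a temporary array first.
sub load_ids {
    my ($fh) = @_;
    my %wanted;
    while (my $id = <$fh>) {
        $id =~ s/\s+\z//;       # drop newline, stray \r and trailing blanks
        next if $id eq '';      # ignore empty lines
        $wanted{$id} = 1;
    }
    return \%wanted;
}
```

The hash itself still has to fit in memory, so this only trims the
footprint rather than removing it.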

Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
reads the ID's in one at a time from the file, (2) retrieves the data, then
(3) prints it out.   The advantage of this approach is that the built index
can be used in other bioperl scripts as well w/o having to rebuild it again,
so if you wanted a different set of ID's later on you can access the
database using the prebuilt index.  More can be found in the
Bio::Index::Fasta POD.  

You can also use the ideas and code in the HOWTO (Flat Databases) I
mentioned, which focuses on the Bio::DB::Flat system and OBDA.  The
advantage of these is that you can use Sleepycat's Berkeley Database through
the Perl BerkeleyDB module (more functionality than DB_File) which is faster
than a standard flat database.  In the HOWTO, specifically look under
'Secondary or custom namespaces' for ideas on how to use your ID as a
primary or secondary key.

Chris

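> use strict;
> use warnings;
> use Bio::SeqIO;
> use Bio::Index::Fasta;
> 
> # Build an index of the FASTA file (first argument), keyed by parsed ID.
> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> $inx->id_parser(\&get_id);
> $inx->make_index(shift);
> 
> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> my $wanted_ids_file = shift;
> open(my $ids, '<', $wanted_ids_file) or die "Can't open $wanted_ids_file: $!";
> while (<$ids>) {
>    chomp;
>    my $seq = $inx->fetch($_);
>    $out->write_seq($seq);
> }
> 
> # Extract the ID from a header line of the form ">probe:...:ID:...".
> sub get_id {
>    my $line = shift;
>    $line =~ /^>probe:\S+?:(\S+?):/;
>    $1;
> }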
> 
> It works for me on the sample sequence given by the OP.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From golharam at umdnj.edu  Mon Jun 12 16:23:45 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 12 Jun 2006 16:23:45 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>

I'm trying to install the bioperl-run package and am getting errors from
make test regarding PAML:

t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an
undefined value at t/PAML.t line 85, <GEN2> line 85.
t/PAML....................dubious
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Is this a legitimate error or am I missing something?

Ryan


From MEC at stowers-institute.org  Mon Jun 12 17:15:35 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 16:15:35 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>

Yeah, good points...

... my recommendation of the one-liner was motivated by there being a small
number of IDs and no other applications needing to index the entire
fasta database.


--Malcolm [At which point he bowed out of this fray]

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>Sent: Monday, June 12, 2006 3:35 PM
>To: 'Sendu Bala'; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>...
>> Chris Fields wrote:
>> > There's information in the HOWTOs:
>> >
>> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
>> >
>> > http://www.bioperl.org/wiki/HOWTO:OBDA
>> >


From cjfields at uiuc.edu  Mon Jun 12 17:20:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 16:20:55 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>
Message-ID: <001601c68e66$17b760a0$15327e82@pyrimidine>

Sorry Malcolm.  I didn't want to imply that your way or the bioperl way was
best, just point out advantages/disadvantages.  

Oops, didn't point out the possible Bioperl disadvantage (too many objects
generated = slow slow slow).  

Chris

> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 4:16 PM
> To: Chris Fields; Sendu Bala; bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
> 
> Yeah, good points...
> 
> ... my recommendation of the one-liner was motivated based on a small
> number of IDs and no other applications needing to index the entire
> fasta database.
> 
> 
> --Malcolm [At which point he bowed out of this fray]
> 


From roy at colibase.bham.ac.uk  Mon Jun 12 11:46:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 12 Jun 2006 16:46:49 +0100
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <200606090935.12758.heikki@sanbi.ac.za>
References: <448850CE.1040105@colibase.bham.ac.uk>
	<200606090935.12758.heikki@sanbi.ac.za>
Message-ID: <448D8C69.4030005@colibase.bham.ac.uk>

Hi Heikki.

> Two questions come to mind:
> 
> 1. Can you parse your joint location using bioperl without errors?
Seems to work fine as far as I can tell (no errors, and to_FTstring 
reproduces the location as expected).

> 2. Is there a practical advantage in including a location which has no 
> relevance to the sequence in hand?
I think it would be misleading to imply that a location was complete 
when it is only a part of the originally annotated feature. According to 
the FT definition, the other possibility would be to include the missing 
parts of the feature as remote locations; I guess that may be more 
satisfactory.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From colin.erdman at du.edu  Mon Jun 12 15:52:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Mon, 12 Jun 2006 13:52:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
Message-ID: <1150141965.2992.17.camel@localhost.localdomain>

Hello all,

I am doing a project relating to some forensic analysis of mitochondrial
DNA. 

I would like to write a script that will take a reference sequence (in
this case the Anderson sequence, the standard mitochondrial sequence
against which sample sequences are compared) and compare it to an
unknown sequence.

I have been using this script:

use strict;
use warnings;
use Bio::SearchIO;

# Run bl2seq and read its report through a pipe.
open(my $fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |")
    || die $!;

my $parser = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

if ( my $result = $parser->next_result ) {
    if ( my $hit = $result->next_hit ) {
        if ( my $hsp = $hit->next_hsp ) {
            # Query positions that do not match the hit.
            my @qmismatches = $hsp->seq_inds('query', 'nomatch');
            # Aligned strings; these include any gap characters.
            my $seq_string  = $hsp->query_string;
            my $seq_string1 = $hsp->hit_string;
            for my $base (@qmismatches) {
                # Note: this indexes the gapped strings by sequence
                # position, so any gap shifts the bases printed here.
                print "base $base of the hit sequence is a mismatch: ";
                print substr($seq_string, $base - 1, 1);
                print "->";
                print substr($seq_string1, $base - 1, 1);
                print "\n";
            }
        }
    }
}


The problem is that some mitochondrial sequences from individuals have
insertions, deletions, etc. that cause them to be offset from the
reference sequence, which in turn offsets the numbering system.

To provide an example:

>Anderson Reference Sequence|HV2
ATTTGGT...
1234567

>Sample|HV2....
ATTTG|C|GT
12345,5.1,67

The |C| denotes an insertion, and traditionally in the forensics community
this would be called position 5.1C, but the program reads it as position 6.

So basically I need to figure out how to modify a perl script in order to recognize 
that 5.1C is an insertion and not position 6; position 6 is actually 
the G to the right of it, followed by position 7-T.
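
One way to get that numbering, sketched under assumptions: walk the two
gapped alignment strings (such as those returned by query_string and
hit_string in the script above) column by column, advancing the position
counter only when the reference has a base. The sub name
forensic_differences, the "del" label and the substitution format are
invented for illustration; the reference start would presumably come from
the HSP (e.g. $hsp->start('query')) rather than defaulting to 1.

```perl
use strict;
use warnings;

# Hypothetical helper: walk two gapped alignment strings of equal length
# (reference first) and report differences using reference numbering.
# Insertions relative to the reference are labelled N.1, N.2, ... where N
# is the last reference position seen.
sub forensic_differences {
    my ($ref_aln, $sample_aln, $ref_start) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "alignment strings differ in length\n"
        unless length($ref_aln) == length($sample_aln);

    my $pos = $ref_start - 1;   # last reference position consumed
    my $ins = 0;                # length of the current insertion run
    my @calls;
    for my $i (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln,    $i, 1);
        my $s = uc substr($sample_aln, $i, 1);
        if ($r eq '-') {
            # Base present only in the sample: an insertion.
            $ins++;
            push @calls, "$pos.$ins$s";
            next;
        }
        $pos++;
        $ins = 0;
        if ($s eq '-') {
            push @calls, "${pos}del";   # deletion label is an assumption
        }
        elsif ($r ne $s) {
            push @calls, "$r$pos$s";    # substitution, reference base first
        }
    }
    return @calls;
}

# The example above, with the insertion written as a gap in the reference.
my @calls = forensic_differences('ATTTG-GT', 'ATTTGCGT');
print "$_\n" for @calls;
```

For the example this prints 5.1C, i.e. an inserted C after reference
position 5, and the following G and T keep positions 6 and 7.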

Any ideas and suggestions would be greatly helpful, I know this could be very tricky,
or very easy - I just have come to the point where the idea flow has stopped and would 
love to gather some outside input.

Thanks
Colin Erdman
colin.erdman at du.edu
Undergraduate Research Associate
Institute For Forensic Genetic
University of Denver 


From jason at bioperl.org  Tue Jun 13 10:19:04 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 10:19:04 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
Message-ID: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>

The latest version of YN00 (3.15) doesn't work with the current code,  
as the output has changed substantially: Yang now provides  
several different methods' simple Ka and Ks calculations.  Downgrade  
to PAML 3.14 or roll up your sleeves and figure out what is breaking,  
which is the regexp at about line 363 that detects when to start  
parsing the Pairwise data, as well as the function  
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the  
software packages, so I am hopeful that other developers who use our  
software to do molecular evolutionary studies will get involved to  
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and an getting errors  
> from
> make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix"  
> on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason at bioperl.org  Tue Jun 13 11:45:27 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 11:45:27 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
	<B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <F802F582-28E4-4761-873C-2A49A60B3593@bioperl.org>

And just to say - codeml 3.15 parsing does work - yn00 parsing just  
hasn't been updated.   I agree that it is bad the test is failing but  
it is dependent on the version that is installed and we should put  
some sort of detect version-skip test code in there so it doesn't  
cause the tests to fail.  Just need more hands on deck tracking these  
sort of things....
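
A rough sketch of what such version-skip code might look like, using the
SKIP block from Test::More. Everything specific here is an assumption: the
helper name yn00_parse_supported is invented, the version string is a
placeholder (how to obtain it from the installed yn00 is not shown), and
the cutoff of 3.14 is taken from this thread only.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Assumption: the installed PAML version is already known as a string.
# This value is a placeholder, not something read from yn00.
my $paml_version = '3.15';

# Hypothetical helper: true when the yn00 parser is believed to handle the
# given version (3.14 and earlier, going by this thread).
sub yn00_parse_supported {
    my ($version) = @_;
    my ($major, $minor) = $version =~ /^(\d+)\.(\d+)/
        or return 0;
    return $major < 3 || ($major == 3 && $minor <= 14) ? 1 : 0;
}

ok(1, 'codeml tests would run here');

SKIP: {
    skip "yn00 $paml_version output is not parsed yet", 1
        unless yn00_parse_supported($paml_version);
    ok(1, 'yn00 tests would run here');
}
```

With this shape an unsupported version shows up as a skipped test in the
report rather than as a failure.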

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Tue Jun 13 12:04:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 12:04:46 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001001c68f03$17429070$e6028a0a@GOLHARMOBILE1>

I'll take a look at it and see what I can do.  While I'm at it,
bioperl-run tests a module called Coil, but I don't have that installed.
The documentation doesn't specify where I can get this application.
Does anyone know where Coil comes from?




From Kevin.M.Brown at asu.edu  Tue Jun 13 13:42:40 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 13 Jun 2006 10:42:40 -0700
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>

I've been tasked to write a "small" application for the lab I work at
that basically starts with an NCBI file for an organism and goes through
a number of steps to distill out the unique protein coding sequences and
then designs oligos for the building of the genes.  One of the steps is
comparing the overlap region of the oligos to all the others designed to
try and prevent mismatches in the build that might truncate a gene or
splice in another gene into it during the build step.  I tried to do
this within perl with just a looped string comparison regex, but by my
calculations comparing each half of an oligo with all the other oligos
for this organism results in well over 8 BILLION comparisons needed.
The system was still crunching at it 3 days later with no sign of
nearing completion.

So, my thought was to utilize something like blastall from within the
script to find other oligos with similar sequence, but that means I need
to dump out the designed oligos, create the db with formatdb, do
the blast, and finally analyze the result file to see which oligos need
to be redesigned to prevent a mismatch.
I'm just trying to figure out how to do it all without leaving the
script, but as yet I haven't noticed a way to create a db from within perl
using bioperl.

Any thoughts on directions I should look?
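
One possible direction, sketched under assumptions: stay inside the script
and shell out to formatdb and blastall with system(). The helper names and
file names below are invented for illustration, and the flags are those of
the legacy NCBI toolkit (formatdb -i/-p, blastall -p/-d/-i/-o/-m). BioPerl's
Bio::Tools::Run::StandAloneBlast may also be worth a look for the blastall
step.

```perl
use strict;
use warnings;

# Hypothetical helpers that build the legacy NCBI BLAST command lines.
# File names are placeholders.
sub formatdb_command {
    my ($fasta) = @_;
    # -i input FASTA, -p F marks it as nucleotide
    return ('formatdb', '-i', $fasta, '-p', 'F');
}

sub blastall_command {
    my ($db, $query, $report) = @_;
    # -m 8 requests tabular output, which is easy to split on tabs
    return ('blastall', '-p', 'blastn', '-d', $db,
            '-i', $query, '-o', $report, '-m', '8');
}

# Write the designed oligos (id => sequence) out as FASTA.
sub write_fasta {
    my ($path, $oligos) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    print {$fh} ">$_\n$oligos->{$_}\n" for sort keys %$oligos;
    close $fh or die "Can't close $path: $!";
    return;
}

# Run one external command, dying if it fails.
sub run {
    my @cmd = @_;
    system(@cmd) == 0 or die "@cmd failed: $?";
    return;
}

# Typical sequence of calls (commented out so the sketch loads without BLAST):
# write_fasta('oligos.fa', \%oligos);
# run(formatdb_command('oligos.fa'));
# run(blastall_command('oligos.fa', 'oligos.fa', 'oligos.blast'));
```

The tabular report can then be read line by line and split on tabs to find
oligo pairs that hit each other.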


From aaron.j.mackey at gsk.com  Tue Jun 13 08:19:11 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 13 Jun 2006 08:19:11 -0400
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <1150141965.2992.17.camel@localhost.localdomain>
Message-ID: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>

See Bio::LocatableSeq

-Aaron



From colin.erdman at du.edu  Tue Jun 13 11:12:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 09:12:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150211566.7034.1.camel@localhost.localdomain>

I could see how this will help... but I am not sure how to implement it
in my situation, I am not very familiar with the Bio::Range or
Bio::Location modules...

Thanks very much,
Colin E.



From colin.erdman at du.edu  Tue Jun 13 12:05:30 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 10:05:30 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150214730.12044.2.camel@localhost.localdomain>

I actually have found EMBOSS DiffSeq to work quite well for detecting
the insertions and SNPs in the "sample sequence" as compared to the
"reference sequence". 

If I get this all figured out and integrated I will post a method, I
imagine this would prove useful to others as well.

Thanks all,
Colin



From golharam at umdnj.edu  Tue Jun 13 14:59:59 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:59:59 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <002301c68f1b$917b8c80$e6028a0a@GOLHARMOBILE1>

Nevermind - don't check it in yet.  There are still some other problems
not being picked up by the test suite.  I'll work on that and add to the
test suite.  Jason, I'll send you everything once I have it complete.


-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu] 
Sent: Tuesday, June 13, 2006 2:34 PM
To: 'Jason Stajich'
Cc: 'bioperl-l at bioperl.org'
Subject: RE: [Bioperl-l] Test errors in bioperl-run


It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan

-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org] 
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


The latest version of YN00 (3.15) doesn't work with the current code
as the output has changed substantially now that Yang provides
several different methods' simple Ka and Ks calculations.  Downgrade
to PAML 3.14 or roll up your sleeves and figure out what is breaking
-- which is the regexp in about line 363 that detects when to start
parsing for the Pairwise data as well as the function
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the
software packages so I am hopeful that other developers who use our
software and do molecular evolutionary studies will get involved to
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix" on 
> an undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Jonathan_Epstein at nih.gov  Tue Jun 13 14:21:00 2006
From: Jonathan_Epstein at nih.gov (Jonathan_Epstein at nih.gov)
Date: Tue, 13 Jun 2006 14:21:00 -0400
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <0J0T001LE9O5M6@lswsmta04.nmcc.sprintspectrum.com>

Sounds like a job for MUMmer (from Steven Salzberg's group).

Jonathan Epstein 


-----Original Message-----

From:  "Kevin Brown" <Kevin.M.Brown at asu.edu>
Subj:  [Bioperl-l] Blast or blat against custom db?
Date:  Tue Jun 13, 2006 2:17 pm
Size:  1K
To:  <bioperl-l at lists.open-bio.org>

I've been tasked to write a "small" application for the lab I work at
that basically starts with an NCBI file for an organism and goes through
a number of steps to distill out the unique protein coding sequences and
then designs oligos for the building of the genes.  One of the steps is
comparing the overlap region of the oligos to all the others designed to
try and prevent mismatches in the build that might truncate a gene or
splice in another gene into it during the build step.  I tried to do
this within perl with just a looped string comparison regex, but by my
calculations comparing each half of an oligo with all the other oligos
for this organism results in well over 8 BILLION comparisons needed.
The system was still crunching at it 3 days later with no sign of
nearing completion.

So, my thought was to utilize something like blastall from within the
script to find other oligos of similar match, but it means that I need
to dump out the oligos designed and create the db with formatdb, then do
the blast, and finally analyze the result file to see what needs to be
changed in the oligos to prevent a mismatch, redesigning any matches.
I'm just trying to figure out how to do it all without leaving the
script, but as yet I haven't noticed a way to create a db from within
perl using bioperl.

Any thoughts on directions I should look?


--- message truncated ---


From golharam at umdnj.edu  Tue Jun 13 14:34:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:34:00 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>

It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan

-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org] 
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


The latest version of YN00 (3.15) doesn't work with the current code
as the output has changed substantially now that Yang provides
several different methods' simple Ka and Ks calculations.  Downgrade
to PAML 3.14 or roll up your sleeves and figure out what is breaking
-- which is the regexp in about line 363 that detects when to start
parsing for the Pairwise data as well as the function
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the
software packages so I am hopeful that other developers who use our
software and do molecular evolutionary studies will get involved to
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from
> make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAML.pm
Type: application/octet-stream
Size: 43262 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060613/566881b4/attachment-0002.obj>

From cjfields at uiuc.edu  Tue Jun 13 21:41:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:41:45 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000601c68f53$b1e4b090$15327e82@pyrimidine>

I committed it.  Passes PAML.t for bioperl-live.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, June 13, 2006 1:34 PM
> To: 'Jason Stajich'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
> I've attached it to this message.  It passes all the PAML tests from
> bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
> (or someone) check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially now that Yang provides
> several different methods' simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers who use our
> software and do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
> > I'm trying to install the bioperl-run package and am getting errors
> > from
> > make test regarding PAML:
> >
> > t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> > on an
> > undefined value at t/PAML.t line 85, <GEN2> line 85.
> > t/PAML....................dubious
> >         Test returned status 2 (wstat 512, 0x200)
> >         after all the subtests completed successfully
> >
> > Is this a legitimate error or am I missing something?
> >
> > Ryan
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Tue Jun 13 21:42:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:42:25 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000701c68f53$c9addcb0$15327e82@pyrimidine>

Sorry, Brian beat me to it.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, June 13, 2006 1:34 PM
> To: 'Jason Stajich'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
> I've attached it to this message.  It passes all the PAML tests from
> bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
> (or someone) check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially now that Yang provides
> several different methods' simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers who use our
> software and do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
> > I'm trying to install the bioperl-run package and am getting errors
> > from
> > make test regarding PAML:
> >
> > t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> > on an
> > undefined value at t/PAML.t line 85, <GEN2> line 85.
> > t/PAML....................dubious
> >         Test returned status 2 (wstat 512, 0x200)
> >         after all the subtests completed successfully
> >
> > Is this a legitimate error or am I missing something?
> >
> > Ryan
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From osborne1 at optonline.net  Tue Jun 13 21:38:09 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 13 Jun 2006 21:38:09 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <C0B4E0C1.8D74%osborne1@optonline.net>

Checked in.


On 6/13/06 2:34 PM, "Ryan Golhar" <golharam at umdnj.edu> wrote:

> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
> I've attached it to this message.  It passes all the PAML tests from
> bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
> (or someone) check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially now that Yang provides
> several different methods' simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers who use our
> software and do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
>> I'm trying to install the bioperl-run package and am getting errors
>> from
>> make test regarding PAML:
>> 
>> t/PAML....................ok 2/18Can't call method "get_MLmatrix"
>> on an
>> undefined value at t/PAML.t line 85, <GEN2> line 85.
>> t/PAML....................dubious
>>         Test returned status 2 (wstat 512, 0x200)
>>         after all the subtests completed successfully
>> 
>> Is this a legitimate error or am I missing something?
>> 
>> Ryan
>> 
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From golharam at umdnj.edu  Tue Jun 13 21:55:49 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 21:55:49 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <000601c68f53$b1e4b090$15327e82@pyrimidine>
Message-ID: <000101c68f55$a9fa8ec0$2f01a8c0@GOLHARMOBILE1>

Okay, that's fine.  It does pass the bioperl-live tests.  But when I ran
the bp_pairwise_kaks script, it didn't work with 3.15, so it looks like
the current test suite is not exhaustive.

When I looked into the code further, I saw that codeml 3.15 generates
some files slightly differently than 3.14, which needs to be accounted for.
I'll work on that and post it here...shouldn't be too long.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Tuesday, June 13, 2006 9:42 PM
To: golharam at umdnj.edu; 'Jason Stajich'
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


I committed it.  Passes PAML.t for bioperl-live.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, June 13, 2006 1:34 PM
> To: 'Jason Stajich'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> It looks like the output contains two new sections at the bottom and 
> the comment sections have been changed slightly.  I've modified 
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from 
> YN00. I've attached it to this message.  It passes all the PAML tests
> from bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.
> Can you (or someone) check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially now that Yang provides
> several different methods' simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers who use our
> software and do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week 
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
> > I'm trying to install the bioperl-run package and am getting errors
> > from make test regarding PAML:
> >
> > t/PAML....................ok 2/18Can't call method "get_MLmatrix" on
> > an undefined value at t/PAML.t line 85, <GEN2> line 85.
> > t/PAML....................dubious
> >         Test returned status 2 (wstat 512, 0x200)
> >         after all the subtests completed successfully
> >
> > Is this a legitimate error or am I missing something?
> >
> > Ryan
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12



From ULNJUJERYDIX at spammotel.com  Tue Jun 13 21:10:04 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 14 Jun 2006 09:10:04 +0800
Subject: [Bioperl-l] SimpleAlign /Bio::AlignIO; POD code doesn't work for me
Message-ID: <5b6410e0606131810k495d8f55mc6dc73f0cd5a6df5@mail.gmail.com>

>
> Hi,
>
> Two queries with respect to SimpleAlign. I am using the following code
> based on the POD.
>
> my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
> my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
> print $out $_ while <$in>;
>
> 1) Is it possible to apply set_displayname_flat() globally, without
> calling $_->set_displayname_flat() per alignment?
>
> 2) My input files have an ID and description line for each seq in the
> alignment. When the file is converted I lose the description line. I
> know I can get the description of the sequences (e.g.
> $aln->get_seq_by_pos(2)->description()).
> How could I export the complete fasta defline including the
> description (I realize that general clustal format has a limit on the
> number of characters, but still).
>
> Regards,
> Bernd
> _______________________________________________
>
I might be totally wrong here, but my understanding of the FASTA format
is that the first word (i.e. up to the first space) is the only true name
of the seq, so anything after the first word is discarded. Joining the
words with underscores works for me.
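
The ID/description split described above can be sketched in plain Perl. This only illustrates the convention (the first whitespace-delimited word is the ID, the rest is the description); the sub name split_defline is made up for the example, and real parsers may handle edge cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header line into (id, description).
sub split_defline {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;          # drop the line ending, if any
    $line =~ s/^>//;               # drop the leading '>'
    my ($id, $desc) = split /\s+/, $line, 2;
    return ($id, defined $desc ? $desc : '');
}

my ($id, $desc) = split_defline(">seq1 Homo sapiens example\n");
print "id=$id desc=$desc\n";       # id=seq1 desc=Homo sapiens example

# Joining the words with underscores keeps everything in the ID.
my ($id2, $desc2) = split_defline(">seq1_Homo_sapiens_example\n");
print "id=$id2 desc=[$desc2]\n";   # id=seq1_Homo_sapiens_example desc=[]
```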

On a side note, does your 3rd line work?
It doesn't on my 1.5rc1.
I had to add the bold line which was missing in the POD doc.
I don't think it was the use strict pragma.
    open MYIN,"<$file" or die "Can't open input alignment";
    open MYOUT, ">$file2" or die "can't write to output";
    my $in  = Bio::AlignIO->newFh(-fh     => \*MYIN,
                               -format => 'fasta');
    my $out = Bio::AlignIO->newFh(-fh     =>  \*MYOUT,
                               -format => 'clustalw');
    print $out $_ while <$in>;

Cheers
kevin


From sb at mrc-dunn.cam.ac.uk  Wed Jun 14 03:49:10 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 14 Jun 2006 08:49:10 +0100
Subject: [Bioperl-l] Blast or blat against custom db?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
Message-ID: <448FBF76.1090505@mrc-dunn.cam.ac.uk>

Kevin Brown wrote:
[snip]
> So, my thought was to utilize something like blastall from within the
> script to find other oligos of similar match, but it means that I need
> to dump out the oligos designed, create the db with formatdb. [snip]
> I'm just trying to figure out how to do it all without leaving the
> script, but as yet haven't noticed a way to create a db from within perl
> using bioperl?
> 
> Any thoughts on directions I should look?

AFAIK there's no bioperl interface onto formatdb, but the way to do it 
is make a fasta file (perhaps using bioperl) with all the oligos (what 
you want to become the db), then use a perl system call (or similar) to 
run formatdb. Still in the same script you'd then run and analyse the 
blast with bioperl calls (presumably starting with StandAloneBlast - 
http://bioperl.org/wiki/HOWTO:Beginners#BLAST if you need it).

Just be sure to carefully craft your blast parameters so they're 
suitable for oligo-sized matches, and check whether the 3' bases of the 
hits are identical.
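
The workflow described above, sketched with core Perl only. The oligo names and sequences, the file names, and the particular blastall settings (word size 7, filtering off, tabular output) are illustrative assumptions, not recommendations from this thread. The external programs are run only if they are found on PATH, and parsing the report with bioperl is left out.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical oligos; in practice these come from the design step.
my %oligos = (
    oligo_1 => 'ATGGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGCTA',
    oligo_2 => 'TTGACCGATCGATCGTAGCTAGGCTAGCTAACGGATCCAT',
);

# Write all oligos to one FASTA file; this file becomes the database.
sub write_fasta {
    my ($path, $seqs) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    print {$fh} ">$_\n$seqs->{$_}\n" for sort keys %$seqs;
    close($fh) or die "Can't close $path: $!";
    return $path;
}

# Look for an executable on PATH (returns its full path or undef).
sub find_program {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $full = File::Spec->catfile($dir, $name);
        return $full if -x $full && !-d $full;
    }
    return;
}

my $fasta = write_fasta('oligos.fa', \%oligos);

# formatdb: -i input file, -p F for nucleotide sequences.
my @formatdb = ('formatdb', '-i', $fasta, '-p', 'F');
# blastall: blastn, small word size for short sequences, low-complexity
# filtering off, tabular output (-m 8) written to oligos.blast.
my @blastall = ('blastall', '-p', 'blastn', '-d', $fasta, '-i', $fasta,
                '-W', '7', '-F', 'F', '-m', '8', '-o', 'oligos.blast');

if (find_program('formatdb') && find_program('blastall')) {
    system(@formatdb) == 0 or die "formatdb failed: $?";
    system(@blastall) == 0 or die "blastall failed: $?";
    print "BLAST report written to oligos.blast\n";
}
else {
    print "formatdb/blastall not found; would run:\n",
          "  @formatdb\n  @blastall\n";
}
```

Passing the commands to system() as lists avoids shell quoting problems with file names.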


From MEC at stowers-institute.org  Wed Jun 14 09:47:59 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 14 Jun 2006 08:47:59 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>

 
Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

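# Stop here if an input file is missing rather than carrying on silently.
open(IDFILE, '<', $IDs) or die "Could not open file $IDs: $!";

my $probes = 'HG_U95Av2_probe_fasta.txt';

open(PROBES, '<', $probes) or die "Could not open file $probes: $!";

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;   # a bare print now writes $_ to OUT

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; # Note: This creates a hash with keys=PSIDs and all values=1.
my $idmatch;  # declared outside the loop so it persists across the lines of a record
	while (<PROBES>) {
		# Only header lines update $idmatch; sequence lines keep the last header's result.
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}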
exit;
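
One possible explanation for only the last ID matching, offered as a guess since the ID file itself isn't shown: if ID_dat.txt has Windows (CRLF) line endings, chomp on a Unix-style perl leaves a trailing \r on every ID except a final one that has no line ending, so only that ID equals what the regex captures. A minimal sketch of loading the IDs defensively, using in-memory data in place of the real files:

```perl
use strict;
use warnings;

# Stand-in for ID_dat.txt with CRLF line endings and no final newline.
my $id_data = "542_at\r\n31799_at";
open(my $idfh, '<', \$id_data) or die "Can't open in-memory ID list: $!";

my %ID;
while (my $id = <$idfh>) {
    $id =~ s/\s+\z//;            # strips \n, \r and trailing spaces
    next unless length $id;      # skip blank lines
    $ID{$id} = 1;
}
close $idfh;

# Stand-in for the probe FASTA file.
my $probe_data = join '',
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;\n",
    "GTTCATCACAAATCTATTGTGCTTG\n";
open(my $probes, '<', \$probe_data) or die "Can't open in-memory probes: $!";

my $subset = '';
my $idmatch;
while (<$probes>) {
    # Header lines set the flag; the sequence line that follows inherits it.
    $idmatch = exists $ID{$1} if /^>probe:\w+:(\w+):/;
    $subset .= $_ if $idmatch;
}
print $subset;
```

With the whitespace stripped, both the 542_at and 31799_at records are kept and the 1138_at record is skipped.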


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu] 
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; 
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
>my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opening it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this won't it print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting
>>>> closer, but no cigar quite yet.  The script below runs quickly with
>>>> no errors--but the output file is empty.  It seems that the problem
>>>> must lie somewhere in the 'while' loop, and I'm sure it's quite
>>>> obvious to a more experienced eye--but not to mine!  Any
>>>> suggestions?  Thanks again for your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>>> fine and probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a
>>>>> conceptually simple task.  I have a single large fasta file
>>>>> containing about 200,000 probe sequences (from an Affymetrix
>>>>> microarray), each of which looks like this:
>>>>>
>>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>>> Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of
>>>>> ~130,800 probes (both the header and the sequence) and output this
>>>>> subset into a new fasta file.  These 130,800 probes correspond to
>>>>> 8,175 probe set IDs ("1138_at" is the probe set ID in the header
>>>>> listed above); I have these 8,175 IDs listed in a separate file.
>>>>> I *think* that I managed to create an index of all 200,000 probes
>>>>> in the original fasta file using the following script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even
>>>>> if it is, I'm
>>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>


From oldham at ucla.edu  Tue Jun 13 22:03:04 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Tue, 13 Jun 2006 19:03:04 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDOEOLCJAA.oldham@ucla.edu>

Dear Malcolm, Chris, et al,

Thanks to everyone for your helpful suggestions.  When I run the code
below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
output file is still blank.  If I replace this list with a single ID
("542_at"), it works:

>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
GCGCAGCAGCGAGAATTTCGACGAG
>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
GAATTTCGACGAGCTGCTGAAGGCA
>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
CGACGAGCTGCTGAAGGCACTGGGT
........etc.

If I try a list of two IDs ("542_at" and "31799_at"), only the last one
is present in the output:

>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;
GTTCATCACAAATCTATTGTGCTTG
>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
Antisense;
GTCCACTAAATGTAGTAACGAAATG
>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
Antisense;
TCCACTAAATGTAGTAACGAAATGT
........etc.

The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
ID is present in the output file).  At this point I have no idea why
this is happening, and I am not sure how to interpret Malcolm's comment:

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

Any ideas?  Thanks again................!

Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
# Note: this creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
			print OUT scalar(<PROBES>);
		}
	}
exit;
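
One explanation that fits the "only the last ID matches" symptom, though nothing in the thread confirms it, is that the ID file has Windows (CRLF) line endings. On a Unix-style perl, chomp removes only the "\n", so every key except the last keeps a trailing "\r" and never equals the captured ID. The following is a minimal sketch under that assumption. The helper names load_ids and filter_probes are made up for illustration, and the data is a small in-memory sample rather than the real files. It also avoids the `my $x = ... if ...` construct, whose behaviour perlsyn describes as undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read probe set IDs from a filehandle, one per line, tolerating CRLF endings.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;        # strips "\n", "\r\n" and trailing blanks
        next unless length $line;
        $ids{$line} = 1;
    }
    return \%ids;
}

# Copy every FASTA record whose probe set ID is in %$ids from $in to $out.
sub filter_probes {
    my ($in, $out, $ids) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # Re-evaluate the decision on every header line.
            $keep = ($line =~ /^>probe:\w+:(\w+):/ && exists $ids->{$1}) ? 1 : 0;
        }
        print {$out} $line if $keep;
    }
    return;
}

# In-memory sample data; the first ID deliberately ends in "\r\n".
my $id_text    = "542_at\r\n31799_at";
my $fasta_text = join '',
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;\n",
    "GTTCATCACAAATCTATTGTGCTTG\n";

open my $id_fh, '<', \$id_text    or die "Cannot read IDs: $!";
open my $in_fh, '<', \$fasta_text or die "Cannot read FASTA: $!";
my $subset = '';
open my $out_fh, '>', \$subset    or die "Cannot open output: $!";

filter_probes($in_fh, $out_fh, load_ids($id_fh));
close $out_fh;
print $subset;
```

With the sample data this keeps the 542_at and 31799_at records and drops 1138_at. If the real ID file turns out to have plain "\n" endings, the cause lies elsewhere and this sketch only tidies the loop.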


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Monday, June 12, 2006 8:48 AM
To: Cook, Malcolm; Chris Fields; Michael Oldham
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


oops,

s/matches on of/matches one of/
s/nothing that/noting that/

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>the regex matches anything)?  If there is nothing printed
>then either
>>the regex isn't working as expected or there is something logically
>>wrong.  The problem may be that the captured string must
>match the id
>>exactly, the id being the key to the %ID hash; any extra characters
>>picked up by the regex outside of your id key and you will not get
>>anything.  Looking at Malcolm's regex it should work just fine, but
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this won't it only print only the
>>matched description lines to the outfile (no sequence) even if there
>>is a match?  Or is this what you wanted?   If you want the sequence
>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting
>>> closer,
>>> but no cigar quite yet.  The script below runs quickly with no
>>> errors--but
>>> the output file is empty.  It seems that the problem must lie
>>> somewhere in
>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>> experienced
>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> # Note: this creates a hash with keys=PSIDs and all values=1.
>>> my %ID = map {($_, 1)} @ID;
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>> -----Original Message-----
>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>> Sent: Friday, June 09, 2006 7:58 AM
>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>> single large
>>> file
>>>
>>>
>>>
>>> I wouldn't bioperl for this, or create an index.  Perl would do
>>> fine and
>>> probably be faster.
>>>
>>> Assuming your ids are one per line in a file named id.dat
>>looking like
>>> this
>>>
>>> 1138_at
>>> 1134_at
>>> etc..
>>>
>>> this should work:
>>>
>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>> mybigfile.fa
>>>
>>> good luck
>>>
>>> --Malcolm Cook
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>> Michael Oldham
>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>> Dear all,
>>>>
>>>> I am a total Bioperl newbie struggling to accomplish a
>>>> conceptually simple
>>>> task.  I have a single large fasta file containing about 200,000
>>>> probe
>>>> sequences (from an Affymetrix microarray), each of which looks
>>>> like this:
>>>>
>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>> Antisense;
>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>
>>>> What I would like to do is extract from this file a subset of
>>>> ~130,800
>>>> probes (both the header and the sequence) and output this
>>>> subset into a new
>>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>> have these
>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>> to create an
>>>> index of all 200,000 probes in the original fasta file using
>>>> the following
>>>> script:
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> # script 1: create the index
>>>>
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> my $Index_File_Name = shift;
>>>> my $inx = Bio::Index::Fasta->new(
>>>>     -filename => $Index_File_Name,
>>>>     -write_flag => 1);
>>>> $inx->make_index(@ARGV);
>>>>
>>>> I'm not sure if this is the most sensible approach, and even
>>>> if it is, I'm
>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>
>>>> Many thanks,
>>>> Mike O.
>>>>
>>>>
>>>>
>>>>
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>


From s_maheshwari84 at rediffmail.com  Thu Jun 15 07:42:24 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 Jun 2006 11:42:24 -0000
Subject: [Bioperl-l] simple problem plz look
Message-ID: <20060615114224.21669.qmail@webmail31.rediffmail.com>

I am using this statement:

my $data[0][0] = 'P_p';

What is wrong with this? I am getting a syntax error.


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From rkulasekaran at accelrys.com  Thu Jun 15 08:06:30 2006
From: rkulasekaran at accelrys.com (rkulasekaran at accelrys.com)
Date: Thu, 15 Jun 2006 17:36:30 +0530
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <OF88050CF5.C0508A24-ON6525718E.00425D40-6525718E.00428384@accelrys.com>

Hi,

Can you declare the array (my @data) before assigning to the index?

I guess that will work fine.

- Raja


"saurabh maheshwari" <s_maheshwari84 at rediffmail.com> 
Sent by: bioperl-l-bounces at lists.open-bio.org
15/06/2006 17:12
Please respond to
saurabh maheshwari <s_maheshwari84 at rediffmail.com>


To
bioperl-l at lists.open-bio.org
cc

Subject
[Bioperl-l] simple problem plz look


I m using this statement :

my $data[0][0] = 'P_p';

what is wrong i this as i am getting syntax error>


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 08:52:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 13:52:53 +0100
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
References: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <44915825.8040902@mrc-dunn.cam.ac.uk>

saurabh maheshwari wrote:
> I m using this statement :
> 
> my $data[0][0] = 'P_p';
> 
> what is wrong i this as i am getting syntax error>

I don't think general Perl problems are appropriate for this list.
Try subscribing to the beginners mailing list via http://learn.perl.org/

But in any case, say:
my @data;
$data[0][0] = 'P_p';
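
The reason the two-line form works is that `my` declares whole variables (a scalar, an array or a hash), not individual elements. Once @data exists, assigning to a nested element creates the inner array reference automatically (autovivification). A short sketch, with the second assignment added purely for illustration:

```perl
use strict;
use warnings;

my @data;                 # declare the array itself
$data[0][0] = 'P_p';      # autovivifies $data[0] as an array reference
$data[1][2] = 'Q_q';      # hypothetical second row; $data[1][0] and [1] stay undef

print scalar(@data), "\n";      # number of rows
print ref($data[0]), "\n";      # the inner element is an ARRAY reference
print $data[0][0], "\n";
```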


From cjfields at uiuc.edu  Thu Jun 15 11:18:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 10:18:32 -0500
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <002001c6908e$f8b11b30$15327e82@pyrimidine>

And exactly how is this applicable to BioPerl?

Start here:

http://learn.perl.org/

My guess: you need to declare 'my @data;' first.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of saurabh maheshwari
> Sent: Thursday, June 15, 2006 6:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] simple problem plz look
> 
> I m using this statement :
> 
> my $data[0][0] = 'P_p';
> 
> what is wrong i this as i am getting syntax error>
> 
> 
> with Regards
> SAURABH MAHESHWARI
> M.Sc. (BIOINFORMATICS)
> JAMIA MILLIA ISLAMIA
> NEW DELHI


From sjmiller at email.arizona.edu  Thu Jun 15 13:42:52 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 10:42:52 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
Message-ID: <44919C1C.1060901@email.arizona.edu>

We are unable to parse BLAST 2.2.14 results from the NCBI website using 
SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
bioperl-live, but when users download either plain text or HTML blast 
outputs from the NCBI page, SearchIO cannot parse them.  This used to 
work prior to BLAST 2.2.14.  Should I try installing the entire 
bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
that makes any difference.)

Thanks,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597


From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 15:00:38 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 20:00:38 +0100
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
	<44919C1C.1060901@email.arizona.edu>
Message-ID: <4491AE56.6090505@mrc-dunn.cam.ac.uk>

Susan J. Miller wrote:
> We are unable to parse BLAST 2.2.14 results from the NCBI website using 
> SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
> bioperl-live, but when users download either plain text or HTML blast 
> outputs from the NCBI page, SearchIO cannot parse them.  This used to 
> work prior to BLAST 2.2.14.  Should I try installing the entire 
> bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
> that makes any difference.)

Parsing saved results from the website works fine here. Please be more 
specific in what you mean by 'unable to parse'. What error messages do 
you get? What exact code did you use to get those errors? Exactly what 
input data did you use? Exactly how did you generate that data?

Cheers,
Sendu.


From cjfields at uiuc.edu  Thu Jun 15 17:06:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 16:06:13 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
Message-ID: <002701c690bf$8b732410$15327e82@pyrimidine>

Bio::SearchIO can't handle HTML output directly; you have to junk the tags
first, and we can't really guarantee anymore that will work either (I
haven't tried it).  The FAQ tells you how:

http://www.bioperl.org/wiki/FAQ

I would avoid HTML parsing altogether.  The only sure-fire method that will
always work, according to NCBI, is XML output, and that's parsable using
Bio::SearchIO::blastxml.  You can also try tabular format, which
Bio::SearchIO::blasttable can parse as well.

However, like Sendu, I get BLASTP 2.2.14 output (saved from NCBI directly)
to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work
as well using BLASTP (and that's still set up to parse text output using
SearchIO I believe).  Could you give us an example of the type of BLAST you
were running, the sequence you used, and the error you had?  It could be
program-specific output that may be causing the problems.  The last time
text parsing broke it was changes specifically to only BLASTN/TBLASTX output
or something along those lines.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Susan J. Miller
> Sent: Thursday, June 15, 2006 12:43 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
> 
> We are unable to parse BLAST 2.2.14 results from the NCBI website using
> SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in
> bioperl-live, but when users download either plain text or HTML blast
> outputs from the NCBI page, SearchIO cannot parse them.  This used to
> work prior to BLAST 2.2.14.  Should I try installing the entire
> bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if
> that makes any difference.)
> 
> Thanks,
> -susan
> 
> Susan J. Miller
> Biotechnology Computing Facility
> Arizona Research Laboratories
> Bio West 228
> University of Arizona
> Tucson, AZ  85721
> (520) 626-2597


From sjmiller at email.arizona.edu  Thu Jun 15 17:43:59 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 14:43:59 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <002701c690bf$8b732410$15327e82@pyrimidine>
References: <002701c690bf$8b732410$15327e82@pyrimidine>
Message-ID: <4491D49F.4030208@email.arizona.edu>

Chris Fields wrote:
> 
> However, like Sendu, I get BLASTP 2.2.14 output (saved from NCBI directly)
> to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work
> as well using BLASTP (and that's still set up to parse text output using
> SearchIO I believe).  Could you give us an example of the type of BLAST you
> were running, the sequence you used, and the error you had?  It could be
> program-specific output that may be causing the problems.  The last time
> text parsing broke it was changes specifically to only BLASTN/TBLASTX output
> or something along those lines.

Hi Chris and Sendu,

Thanks for your replies.  I am using blastp from the NCBI BLAST page, 
with this input sequence:

MSRLLWRKVAGATVGPGPVPAPGRWVSSSVPASDPSDGQRRRQQQQQQQQQQQQQPQQPQVLSSEGGQLR
HNPLDIQMLSRGLHEQIFGQGGEMPGEAAVRRSVEHLQKHGLWGQPAVPLPDVELRLPPLYGDNLDQHFR
LLAQKQSLPYLEAANLLLQAQLPPKPPAWAWAEGWTRYGPEGEAVPVAIPEERALVFDVEVCLAEGTCPT
LAVAISPSAWYSWCSQRLVEERYSWTSQLSPADLIPLEVPTGASSPTQRDWQEQLVVGHNVSFDRAHIRE
QYLIQGSRMRFLDTMSMHMAISGLSSFQRSLWIAAKQGKHKVQPPTKQGQKSQRKARRGPAISSWDWLDI

I have tried saving HTML (with and without the graphical overview), 
plain text, and XML.  I am parsing with this script:

#!/usr/local/bin/perl -w

use Bio::SearchIO;

while ($fil = shift(@ARGV)) {

   $srchio = new Bio::SearchIO ('-format' => 'blast', '-file' => $fil);
   while ($result = $srchio->next_result) {

         $db = $result->database_name;
         $alg = $result->algorithm;
         print "DB $db\n ALG $alg\n";

         $qid = $result->query_name;
         print "QRY $qid\n";

         while ($hit = $result->next_hit) {

           $hitnam = $hit->name;
           print "\t$hitnam\n";

           $nhsp = 0;
           while ($hit->next_hsp) {
                 $nhsp++;
           }
           print "\tHSPS: $nhsp\n";
         } # end next_hit
   }
}

Interestingly, the results are different (but never correct) for the 
different types of output I've tried.  For xml, the script runs but 
produces no output, for plain text the script hangs with no output, and 
for html, I get these errors:


-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|27502689|gb|AAH42571.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 308.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 308.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|21779923|gb|AAM77583.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 333.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 333.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|1644239|dbj|BAA12223.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 358.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 358.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Positives = 270/273 (98%), Gaps = 0/273 (0%) 
Query 78
STACK: Error::throw
STACK: Bio::Root::Root::throw 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/blast.pm:1172
STACK: ./srchio.pl:8


At this point I should probably try installing all of bioperl-live, or 
at least get IteratedSearchResultEventBuilder.pm - or would you 
recommend something else?  Let me know if you need more info.

Thanks again,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597


From cjfields at uiuc.edu  Thu Jun 15 19:03:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 18:03:37 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <4491D49F.4030208@email.arizona.edu>
Message-ID: <002b01c690cf$efa05510$15327e82@pyrimidine>

...

> Hi Chris and Sendu,
> 
> Thanks for your replies.  I am using blastp from the NCBI BLAST page,
> with this input sequence:

...

> I have tried saving HTML (with and without the graphical overview),
> plain text, and XML.  I am parsing with this script:


> #!/usr/local/bin/perl -w
> 
> use Bio::SearchIO;
> ...
> }

I got this script to work.  I used your sequence and retrieved BLASTP text
output from NCBI BLASTP 2.2.14, then saved it from the web browser, and just
copied it to three separate files.  Using those files as input, they all
parse fine, with output like this:

DB All non-redundant GenBank CDStranslations+PDB+SwissProt+PIR+PRF excluding
environmental samples
 ALG BLASTP
QRY
        gi|27502689|gb|AAH42571.1|
        HSPS: 1
        gi|21779923|gb|AAM77583.1|
        HSPS: 1
...

> Interestingly, the results are different (but never correct) for the
> different types of output I've tried.  For xml, the script runs but
> produces no output, for plain text the script hangs with no output, and
> for html, I get these errors:

What's interesting is that HTML did anything at all.  You MUST strip out the
HTML tags as per the FAQ, which I pointed out before:

http://www.bioperl.org/wiki/FAQ

See the question : Does Bio::SearchIO parse the HTML output that BLAST
creates using the -T option?

Again, I would NOT attempt parsing HTML.  The only reason we have a FAQ
question about it is b/c it popped up on the list many many times in the
past (i.e. it is a FAQ) and someone found out that HTML::Strip works.  We
will never adequately support it beyond suggesting stripping the tags out.
NCBI changes their HTML output more often than their text output.

If you tried parsing XML with the format set to 'blast' you'll get nothing
(the blast text parser looks for text output using regexes, so it just
bypasses all the XML tags).  You must set:

-format => 'blastxml' 

You'll also need to install XML::SAX, and I would suggest installing
XML::SAX::ExpatXS and the Expat XML parser for your system to speed things
up.
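
Since the same script was run against text, HTML and XML files, one way to avoid a format mismatch is to look at the first non-blank line of each file before choosing the -format value. The sketch below is only an illustration: guess_searchio_format is a made-up helper, and the patterns it checks are assumptions about what saved NCBI files look like, not something BioPerl provides. The Bio::SearchIO part runs only when file names are given on the command line.

```perl
use strict;
use warnings;

# Return 'blastxml' for XML, undef for HTML (strip the tags first, per the
# FAQ), and 'blast' for anything else. Heuristic only.
sub guess_searchio_format {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;                        # skip blank lines
        return 'blastxml' if $line =~ /^\s*<\?xml/;
        return undef      if $line =~ /^\s*<(?:!DOCTYPE\s+html|html)\b/i;
        return 'blast';
    }
    return 'blast';                                      # empty file: default
}

if (@ARGV) {
    require Bio::SearchIO;
    for my $file (@ARGV) {
        my $format = guess_searchio_format($file);
        unless (defined $format) {
            warn "$file looks like HTML; strip the tags before parsing\n";
            next;
        }
        my $in = Bio::SearchIO->new(-format => $format, -file => $file);
        while (my $result = $in->next_result) {
            print "$file ($format): ", $result->query_name, "\n";
        }
    }
}
```

Parsing with 'blastxml' still needs XML::SAX installed, as noted above.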

The 'hanging' you mention using text parsing sounds like the old bug where
it got caught in an infinite loop.  I don't have this problem.  It could be
a couple of things:

1) You have an old version of bioperl and updated Bio::SearchIO, but you
haven't updated Bio::SearchIO::blast. That's the plugin module where the
error was (not Bio::SearchIO).  Try updating either that or install the
entire distribution from scratch.

2) You have two versions of Bioperl installed (an old one and bioperl-live)
and perl is using the old version of bioperl (and the old version of
SearchIO::blast).  Make sure you only have one version installed and that it
is bioperl-live.
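
To check which copy of a module perl actually loads, %INC can be inspected after a require. The error messages earlier in this thread point into a site_perl/5.6.0 tree, so this seems worth verifying. A minimal sketch follows; module_location is a made-up helper name, and File::Spec is used as the default only so the script runs without BioPerl installed.

```perl
use strict;
use warnings;

# Return the file a module is loaded from, or undef if it cannot be loaded.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Module names come from the command line; fall back to a core module.
for my $module (@ARGV ? @ARGV : ('File::Spec')) {
    my $path = module_location($module);
    printf "%-30s %s\n", $module, defined $path ? $path : '(not found)';
}
```

Running it with Bio::SearchIO and Bio::SearchIO::blast as arguments should show whether both come from the same installation.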

> At this point I should probably try installing all of bioperl-live, or
> at least get IteratedSearchResultEventBuilder.pm - or would you
> recommend something else?  Let me know if you need more info.

If you have the entire distribution installed, you should have ISREB anyway.
ISREB (IteratedSearchResultEventBuilder) has nothing to do with the problems
here, though.

Chris

> Thanks again,
> -susan


From cain at cshl.edu  Thu Jun 15 11:25:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 15 Jun 2006 11:25:54 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
Message-ID: <1150385154.2622.152.camel@localhost.localdomain>

Hi all,

When running make test on a fairly new system, I got the following
failure:

t/Compatible.................No Set::Scalar. Unable to test Bio::Tree::Compatible
Can't locate Set/Scalar.pm in @INC
....
BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Tree/Compatible.pm line 138.
Compilation failed in require at t/Compatible.t line 42.
BEGIN failed--compilation aborted at t/Compatible.t line 42.
t/Compatible.................dubious                                         
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Set::Scalar is mentioned in Makefile.PL as an optional package (but not
required) and isn't mentioned in the INSTALL doc anywhere.  It looks
like the author of the test (t/Compatible.t) is trying to skip this test
if Set::Scalar isn't found, but the 'dubious' result gets marked
ultimately as a failure.

What is the right thing to do here?
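
For reference, one common way to keep an optional dependency from turning into a failure is to load it inside an eval and skip the affected tests when it is missing. The sketch below shows that general pattern with Test::More. It is not the contents of t/Compatible.t and not necessarily how the problem was fixed in CVS; module_available is a made-up helper.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# True if the named module can be loaded, false otherwise (never dies).
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

SKIP: {
    # Skip the one planned test instead of dying when the module is absent.
    skip 'Set::Scalar not installed', 1 unless module_available('Set::Scalar');
    ok( Set::Scalar->can('new'), 'Set::Scalar loaded' );
}
```

The harness then reports the test as skipped rather than 'dubious', because the script exits normally either way.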

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/8cb53ee4/attachment-0002.bin>

From hlapp at gmx.net  Fri Jun 16 00:42:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 16 Jun 2006 00:42:25 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
In-Reply-To: <1150385154.2622.152.camel@localhost.localdomain>
References: <1150385154.2622.152.camel@localhost.localdomain>
Message-ID: <D4E96C47-977E-474C-B093-82CDE775F6C1@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Should be fixed on the main trunk. -hilmar

On Jun 15, 2006, at 11:25 AM, Scott Cain wrote:

> Hi all,
>
> When running make test on a fairly new system, I got the following
> failure:
>
> t/Compatible.................No Set::Scalar. Unable to test  
> Bio::Tree::Compatible
> Can't locate Set/Scalar.pm in @INC
> ....
> BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl- 
> live/blib/lib/Bio/Tree/Compatible.pm line 138.
> Compilation failed in require at t/Compatible.t line 42.
> BEGIN failed--compilation aborted at t/Compatible.t line 42.
> t/Compatible.................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Set::Scalar is mentioned in Makefile.PL as an optional package (but  
> not
> required) and isn't mentioned in the INSTALL doc anywhere.  It looks
> like the author of the test (t/Compatible.t) is trying to skip this  
> test
> if Set::Scalar isn't found, but the 'dubious' result gets marked
> ultimately as a failure.
>
> What is the right thing to do here?
>
> Thanks,
> Scott
>
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFEkja5uV6N2JxL7qsRAjqCAJ9RTgPntJ+dmGHeiovS5FeG3QvZagCeMzmw
sKkizbLUYAsyJqVw/2SplcQ=
=ehd6
-----END PGP SIGNATURE-----


From rmb32 at cornell.edu  Thu Jun 15 21:37:03 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 15 Jun 2006 18:37:03 -0700
Subject: [Bioperl-l] reading and writing GFF3
Message-ID: <44920B3F.90405@cornell.edu>

There is stuff in bioperl for reading and writing GFF3.  There's 
Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which 
is the 'best' one to use?

Neither of these is working very well for me.

My proximate use case is reading in a RepeatMasker report with 
Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then 
writing those out to a GFF3 file.

Bio::Tools::GFF will take these things and write out something that 
closely resembles GFF3, but with Target attributes that don't seem to 
comply with Lincoln's GFF3 spec, since its coordinates are join()ed with 
commas instead of spaces.  I'm attaching a little script that 
illustrates this.

Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the 
features contained in them, throwing 'only Bio::SeqFeature::Annotated 
objects are writeable'.  This seems a bit silly, since one of the whole 
points of Bioperl is using polymorphism to make it easy to connect 
things together.  I've attached a little script to illustrate this one too.

So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF 
deprecated?  Why does Bio::FeatureIO::gff only accept 
Bio::SeqFeature::Annotated objects?

Thanks in advance.

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_featureio_gff_test.pl
Type: application/x-perl
Size: 1455 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_tools_gff_test.pl
Type: application/x-perl
Size: 1436 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0005.bin>

From cain at cshl.edu  Fri Jun 16 10:18:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 10:18:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <44920B3F.90405@cornell.edu>
References: <44920B3F.90405@cornell.edu>
Message-ID: <1150467493.2622.209.camel@localhost.localdomain>

Hi Rob,

I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
but that is actually a good thing.  The tighter constraints result in a
better, more consistent file format.

The reason only BSF::Annotated features are writable is that there needs
to be tight control on the 'type' of the feature, to ensure that the
type is part of the Sequence Ontology.  It also makes it much easier to
properly write out the attributes in the ninth column, particularly the
ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
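
To make the idea concrete, here is a rough sketch of the kind of checking that becomes possible once the type and the attribute tags are constrained. It is plain Perl over a hash, not the actual BFIO::gff or BSF::Annotated code. The term list is a tiny hand-picked subset of the Sequence Ontology, the reserved-tag list is taken from the GFF3 spec as I read it, and check_feature() is a hypothetical name.

```perl
use strict;
use warnings;

# A small illustrative subset of Sequence Ontology terms (the real ontology is far larger).
my %SO_TERM = map { $_ => 1 } qw(gene mRNA exon CDS repeat_region nucleotide_motif);

# Tags the GFF3 spec defines; other tags starting with an uppercase letter are reserved.
my %RESERVED_TAG = map { $_ => 1 }
    qw(ID Name Alias Parent Target Gap Derives_from Note Dbxref Ontology_term);

# Hypothetical checker: return a list of problems for a feature given as a plain hash.
sub check_feature {
    my (%feat) = @_;
    my @problems;
    push @problems, "type '$feat{type}' is not in the illustrative SO term list"
        unless $SO_TERM{ $feat{type} };
    for my $tag ( sort keys %{ $feat{attributes} || {} } ) {
        push @problems, "tag '$tag' starts with an uppercase letter but is not defined by the spec"
            if $tag =~ /^[A-Z]/ && !$RESERVED_TAG{$tag};
    }
    return @problems;
}

my @problems = check_feature(
    type       => 'my_repeat_thing',
    attributes => { Target => 'Contig151 325 832', Foo => 'bar' },
);
print "$_\n" for @problems;
```

A writer that accepts arbitrary feature objects has no single place to hang checks like these, which is roughly why the stricter class is required.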

BTG is still usable, but the GFF3 it puts out is actually more
'GFF3-like'; that is, it looks like GFF3, but because there are no
constraints on the type and the terms that are used in the ninth column,
you have to be very careful using it to produce GFF3, by making sure
that your feature objects conform to the standard before BTG tries to
write them out.  (Of course, one way to do that would be to convert your
feature objects to BSF::Annotated objects, but then you could use
BFIO::gff :-)

[Long pause while scott goes and monkeys with Bio::Tools::GFF]

OK, I just committed a fix to Bio::Tools::GFF, which now produces valid
GFF3 for your sample data.  Conveniently, since 'nucleotide_motif' is a
SO term, this is completely valid.  (I even fixed the escaping of the
stray '=' in 'hind_R=2046'.)  The output I get is this:

##gff-version 3
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
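
For reference, the Target values in the output above can be reproduced with a few lines of plain Perl. This is only a sketch of the formatting rule (space-separated ID, start and end, with special characters in the ID percent-encoded), not the code that was committed to Bio::Tools::GFF. The function names gff3_escape() and gff3_target() are made up for this example.

```perl
use strict;
use warnings;

# Percent-encode characters that have special meaning in column 9:
# control characters (including tab, newline, CR), '%', ';', '=', '&' and ','.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\x00-\x1f\x7f%;=&,])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build a Target attribute of the form "Target=<id> <start> <end> [<strand>]".
sub gff3_target {
    my ( $id, $start, $end, $strand ) = @_;
    my $escaped_id = gff3_escape($id);
    $escaped_id =~ s/ /%20/g;    # a literal space in the ID would be read as a separator
    my $value = join ' ', $escaped_id, $start, $end;
    $value .= " $strand" if defined $strand;
    return "Target=$value";
}

print gff3_target( 'Contig151',   325, 832 ), "\n";   # Target=Contig151 325 832
print gff3_target( 'hind_R=2046', 59,  120 ), "\n";   # Target=hind_R%3D2046 59 120
```

Joining the three fields with commas instead of spaces is exactly the non-compliance described in the original report.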

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From rmb32 at cornell.edu  Fri Jun 16 14:36:22 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 11:36:22 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
References: <44920B3F.90405@cornell.edu>
	<1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <4492FA26.6030909@cornell.edu>

Thanks for the reply, Scott.  It's good that the BSF::Annotated features 
constrain the type to terms in the SO.  I sort of came to the "BTG is only 
gff3-/like/" conclusion myself as I poked around in the two modules in 
question, so I'd much rather use BFIO::gff.  So I guess the question now 
is (and this will probably be a pretty common use case): how does one 
take an "old" Bio::SeqFeature::Generic or similar object and turn it 
into a Bio::SeqFeature::Annotated?


Rob


-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 16 15:12:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Jun 2006 14:12:28 -0500
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>

Scott, 

Looks like Robert also submitted a bug report related to this.
Could you check into it (pretty-please)?  I'm still GFF3-illiterate.

http://bugzilla.open-bio.org/show_bug.cgi?id=2025

Chris



From rmb32 at cornell.edu  Fri Jun 16 15:30:23 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:30:23 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <449306CF.1030301@cornell.edu>

Woops, I should have said something about that.  I submitted it before I 
saw that Scott had already done the escaping in CVS.

Chris Fields wrote:
> Scott, 
>
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
>
> Chris

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Fri Jun 16 15:34:16 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:34:16 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150486453.4412.30.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
Message-ID: <449307B8.5040802@cornell.edu>

So about that converting ye olde feature objects into 
Bio::SeqFeature::Annotated objects.  How do I do it?


Scott Cain wrote:
> That's OK--You added a few items that should be escaped that weren't, so
> I added those too.
>
> Thanks,
> Scott

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 15:28:52 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:28:52 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <1150486133.4412.25.camel@localhost.localdomain>

I tweaked the patch and applied it, and closed the bug.

Thanks for pointing it out--I doubt I would have noticed it in the
bioperl-guts mailing, which I generally don't look too closely at :-o

Scott


On Fri, 2006-06-16 at 14:12 -0500, Chris Fields wrote:
> Scott, 
> 
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> 
> Chris
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 15:34:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:34:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449306CF.1030301@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
Message-ID: <1150486453.4412.30.camel@localhost.localdomain>

That's OK--You added a few items that should be escaped that weren't, so
I added those too.
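
For anyone following along, here is a minimal sketch of the kind of
escaping being discussed. It is not the code committed to
Bio::Tools::GFF; the sub names gff3_escape and gff3_target are
hypothetical, and the set of escaped characters is an assumption taken
from a reading of the GFF3 spec (reserved column-9 characters, '%'
itself, and control characters).

```perl
use strict;
use warnings;

# Percent-encode characters that have a reserved meaning in GFF3 column 9.
# Assumed set: ';', '=', '&', ',', '%', DEL and other control characters.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\x00-\x1f\x7f])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build a Target attribute: "target_id start end [strand]", separated by
# spaces rather than commas.
sub gff3_target {
    my ($id, $start, $end, $strand) = @_;
    my $escaped_id = gff3_escape($id);
    $escaped_id =~ s/ /%20/g;    # a space inside the ID would split the value
    my @parts = ($escaped_id, $start, $end);
    push @parts, $strand if defined $strand;
    return 'Target=' . join(' ', @parts);
}

print gff3_target('hind_R=2046', 59, 120), "\n";
```

With the sample data from this thread, this prints the same
Target=hind_R%3D2046 59 120 value shown in the quoted output below.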

Thanks,
Scott


On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> Woops, I should have said something about that.  I submitted it before
> I saw that Scott had already done the escaping in CVS.
> 
> Chris Fields wrote: 
> > Scott, 
> > 
> > Looks like Robert also submitted a bug report related to this as well.
> > Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> > 
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> > 
> > Chris
> > 
> >   
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Scott Cain
> > > Sent: Friday, June 16, 2006 9:18 AM
> > > To: Robert Buels
> > > Cc: bioperl-l at bioperl.org
> > > Subject: Re: [Bioperl-l] reading and writing GFF3
> > > 
> > > Hi Rob,
> > > 
> > > I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
> > > but that is actually a good thing.  The tighter constraints results in a
> > > better, more consistent file format.
> > > 
> > > The reason only BSF::Annotated features are writable is that there needs
> > > to be tight control on the 'type' of the feature, to insure that the
> > > type is part of the Sequence Ontology.  It also makes it much easier to
> > > properly write out the attributes in the ninth column, particularly the
> > > ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
> > > 
> > > BTG is still usable, but the GFF3 it puts out is actually more
> > > 'GFF3-like'; that is, it looks like GFF3, but because there are no
> > > constraints on the type and the terms that are used in the ninth column,
> > > you have to be very careful using it to produce GFF3, by making sure
> > > that your feature objects conform to the standard before BTG tries to
> > > write them out.  (Of course, one way to do that would be to convert your
> > > feature objects to BSF::Annotated objects, but then you could use
> > > BFIO::gff :-)
> > > 
> > > [Long pause while scott goes and monkeys with Bio::Tools::GFF]
> > > 
> > > OK, I just committed a fix Bio::Tools::GFF which produces valid GFF3 for
> > > your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
> > > this is completely valid.  (I even fixed the escaping the of the stray
> > > '=' in 'hind_R=2046'.)  The output I get is this:
> > > 
> > > ##gff-version 3
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556
> > > 918     -       .       Target=Contig151 325 832
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736
> > > 488     -       .       Target=Contig386 1 124
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105
> > > 1718    +       .       Target=Contig358 1 311
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036
> > > 312     -       .       Target=hind_R%3D2046 59 120
> > > 
> > > Scott
> > > 
> > > 
> > > 
> > > On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
> > >     
> > > > There is stuff in bioperl for reading and writing GFF3.  There's
> > > > Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which
> > > > is the 'best' one to use?
> > > > 
> > > > Neither of these is working very well for me.
> > > > 
> > > > My proximate use case is reading in a RepeatMasker report with
> > > > Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
> > > > writing those out to a GFF3 file.
> > > > 
> > > > Bio::Tools::GFF will take these things and write out something that
> > > > closely resembles GFF3, but with Target attributes that don't seem to
> > > > comply with Lincoln's GFF3 spec, since its coordinates are join()ed with
> > > > commas instead of spaces.  I'm attaching a little script that
> > > > illustrates this.
> > > > 
> > > > Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
> > > > features contained in them, throwing 'only Bio::SeqFeature::Annotated
> > > > objects are writeable'.  This seems a bit silly, since one of the whole
> > > > points of Bioperl is using polymorphism to make it easy to connect
> > > > things together.  I've attached a little script to illustrate this one
> > > > too.
> > > > 
> > > > So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF
> > > > deprecated?  Why does Bio::FeatureIO::gff only accept
> > > > Bio::SeqFeature::Annotated objects?
> > > > 
> > > > Thanks in advance.
> > > > 
> > > > Rob
> > > > 
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >       
> > > --
> > > ------------------------------------------------------------------------
> > > Scott Cain, Ph. D.                                         cain at cshl.edu
> > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > Cold Spring Harbor Laboratory
> > >     
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >   
> 
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 15:55:31 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:55:31 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449307B8.5040802@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
Message-ID: <1150487731.4412.35.camel@localhost.localdomain>

Um, yeah, good question.  The reason I didn't answer you when you wrote
before is that I was hoping for divine inspiration for an answer (or for
somebody else to answer, which would have been really great :-)

The short answer (and easy one for me to type) is that you will probably
need an ad hoc method to do it, which is the same thing I do when I need
to convert gff2 to gff3, to make sure the things I need mapped get
mapped the 'right' way (that is, the way I want them to go).  I don't
have any sample code that does this, but if you want to start working up
an ad hoc method, I will certainly try to help you as much as I can.

Scott


On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> So about that converting ye olde feature objects into 
> Bio::SeqFeature::Annotated objects.  How do I do it?
> 
> 
> Scott Cain wrote:
> > That's OK--You added a few items that should be escaped that weren't, so
> > I added those too.
> >
> > Thanks,
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >   
> >> Woops, I should have said something about that.  I submitted it before
> >> I saw that Scott had already done the escaping in CVS.
> >>
> >> Chris Fields wrote: 
> >>     
> >>> Scott, 
> >>>
> >>> Looks like Robert also submitted a bug report related to this as well

From rmb32 at cornell.edu  Fri Jun 16 16:31:08 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 13:31:08 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150487731.4412.35.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
Message-ID: <4493150C.1080909@cornell.edu>

Rather than cobble together some ad-hoc solution, I would be interested 
in working on a good solution to this problem, because it seems like 
it's just going to get more common as more people start wanting to write 
GFF3.  What about some code in whatever customarily makes these objects 
(probably BSF::Annotated's new() method?) that could take another type 
of Feature object and attempt to shoehorn its data into a new 
BSF::Annotated?  If it failed (because the type isn't in SO or 
whatever), it could throw() some informative error message.

Then, people could write straightforward code something like:

while(my $oldstylefeature = $features_in->next_feature) {
    $oldstylefeature->primary_tag('something_that_is_in_so');
    $oldstylefeature->something_else('some other something that needs to be changed for compliance');
    my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
    $gff3_out->write_feature($newfeature);
}

Does that sound like a good idea?  I'd be more than willing to implement 
this, since I'm going to need to do this sort of thing with many more 
things than just RepeatMasker.

Rob

Scott Cain wrote:
> Um, yeah, good question.  The reason I didn't answer you when you wrote
> before is that I was hoping for divine inspiration for an answer (or for
> somebody else to answer, which would have been really great :-)
>
> The short answer (and easy one for me to type) is that you will probably
> need an ad hoc method to do it, which is the same thing I do when I need
> to convert gff2 to gff3, to make sure the things I need mapped get
> mapped the 'right' way (that is, the way I want them to go).  I don't
> have any sample code that does this, but if you want to start working up
> an ad hoc method, I will certainly try to help you as much as I can.
>
> Scott
>
>
> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>   
>> So about that converting ye olde feature objects into 
>> Bio::SeqFeature::Annotated objects.  How do I do it?

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Sat Jun 17 06:36:59 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 03:36:59 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>	
	<1150486453.4412.30.camel@localhost.localdomain>	
	<449307B8.5040802@cornell.edu>	
	<1150487731.4412.35.camel@localhost.localdomain>	
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <4493DB4B.4020509@cornell.edu>

Yep.  I'm almost finished with the first draft of a function that does 
this.  I'll polish it up over the weekend then on Monday I'll submit a 
bugzilla bug and patch with it so you can take a look.

Rob

Scott Cain wrote:
> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like use suggested below.  You could allow passing in args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you know if
> brilliance strikes me.
>
> Scott
>
>
> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>   
>> Rather than cobble together some ad-hoc solution, I would be interested 
>> in working on a good solution to this problem, because it seems like 
>> it's just going to get more common as more people start wanting to write 
>> GFF3.  What about some code in whatever customarily makes these objects 
>> (probably BSF::Annotated's new() method?) that could take another type 
>> of Feature object and attempt to shoehorn its data into a new 
>> BSF::Annotated?  If it failed (because the type isn't in SO or 
>> whatever), it could throw() some informative error message.
>>
>> Then, people could write straightforward code something like:
>>
>> while(my $oldstylefeature = $features_in->next_feature) {
>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>     $oldstylefeature->something_else('some other something that needs to 
>> be changed for compliance');
>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>     $gff3_out->write_feature($newfeature);
>> }
>>
>> Does that sound like a good idea?  I'd be more than willing to implement 
>> this, since I'm going to need to do this sort of thing with many more 
>> things than just RepeatMasker.
>>
>> Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 23:56:44 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 23:56:44 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <4493150C.1080909@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
Message-ID: <1150516605.2600.9.camel@localhost.localdomain>

Rob,

I came to the same conclusion as well; I wrote my response as I was
heading out the door and while I was running errands, I realized the
right thing to do is to write a Bio::SeqFeature::Annotated method called
new_from_object, whose usage would be:

  my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);

where you would give it a Bio::SeqFeatureI compliant object and try to
create a BSFA like you suggested below.  You could allow passing in args
to control how different things are handled, like mapping non-SO types
to SO types.  I'll think about this over the weekend and let you know if
brilliance strikes me.
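
A rough sketch of that idea, assuming nothing about the real
Bio::SeqFeature::Annotated internals: the sub typed_feature_from, the
-type_map argument, the Demo::Feature class, and the tiny %KNOWN_SO_TERMS
table are all hypothetical stand-ins. The point is only the shape of the
conversion: map the type if a mapping was given, and die with an
informative message if the result is still not an SO term.

```perl
use strict;
use warnings;

# Stand-in for a Sequence Ontology lookup (assumption: a real version
# would consult the ontology instead of a hard-coded list).
my %KNOWN_SO_TERMS = map { $_ => 1 } qw(nucleotide_motif repeat_region gene);

# Hypothetical converter: accepts any object with primary_tag, start, end
# and strand accessors, plus an optional -type_map for non-SO types.
sub typed_feature_from {
    my ($feature, %args) = @_;
    my $type_map = $args{-type_map} || {};
    my $type     = $feature->primary_tag;
    $type = $type_map->{$type} if exists $type_map->{$type};
    die "feature type '$type' is not a known SO term; "
      . "pass a -type_map entry for it\n"
        unless $KNOWN_SO_TERMS{$type};
    return {
        type   => $type,
        start  => $feature->start,
        end    => $feature->end,
        strand => $feature->strand,
    };
}

# Minimal old-style feature, for demonstration only.
package Demo::Feature;
sub new         { my ($class, %f) = @_; return bless {%f}, $class }
sub primary_tag { $_[0]{primary_tag} }
sub start       { $_[0]{start} }
sub end         { $_[0]{end} }
sub strand      { $_[0]{strand} }

package main;

my $old = Demo::Feature->new(
    primary_tag => 'repeat', start => 2095, end => 2556, strand => -1);
my $new = typed_feature_from($old, -type_map => { repeat => 'repeat_region' });
print "$new->{type} $new->{start}..$new->{end}\n";
```

Dying early on an unmapped type keeps the failure close to the feature
that caused it, rather than surfacing later as an invalid GFF3 line.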

Scott


On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> Rather than cobble together some ad-hoc solution, I would be interested 
> in working on a good solution to this problem, because it seems like 
> it's just going to get more common as more people start wanting to write 
> GFF3.  What about some code in whatever customarily makes these objects 
> (probably BSF::Annotated's new() method?) that could take another type 
> of Feature object and attempt to shoehorn its data into a new 
> BSF::Annotated?  If it failed (because the type isn't in SO or 
> whatever), it could throw() some informative error message.
> 
> Then, people could write straightforward code something like:
> 
> while(my $oldstylefeature = $features_in->next_feature) {
>     $oldstylefeature->primary_tag('something_that_is_in_so');
>     $oldstylefeature->something_else('some other something that needs to 
> be changed for compliance');
>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>     $gff3_out->write_feature($newfeature);
> }
> 
> Does that sound like a good idea?  I'd be more than willing to implement 
> this, since I'm going to need to do this sort of thing with many more 
> things than just RepeatMasker.
> 
> Rob
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Sat Jun 17 12:20:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 17 Jun 2006 12:20:08 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You don't need a new method for this. Instead, support a -feature  
argument.

	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);

This should work for any instance of Bio::SeqFeatureI. If it is a  
B::SF::Annotated already it is obviously just a deep copy (if copy is  
desired - could be another parameter). Otherwise more will be involved.

Alternatively, and possibly better, is to write a specialized  
SeqFeatureI factory (that would implement  
Bio::Factory::ObjectFactoryI) and then delegate this job to it:

	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
		-type_ontology => $sequence_ontology,
		-source_ontology => $feature_source_ontology,
		-unflatten => 1);
	my $bsfa = $feat_factory->create_object({-feature => $feature});

This is preferable because it separates business logic that isn't  
necessarily related into defined units. I.e., the logic necessary to  
convert an ordinary feature into a strongly typed one is different  
from how to represent a strongly typed feature. IMHO anyway ...
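
To make the separation concrete, here is a small self-contained sketch.
The package names Sketch::TypedFeatureFactory and Sketch::TypedFeature
are hypothetical, the "ontology" is assumed to be a plain list of allowed
type names, and the input feature is a plain hash rather than a real
Bio::SeqFeatureI object. It only illustrates where the conversion logic
would live.

```perl
use strict;
use warnings;

# Hypothetical factory: owns the conversion rules, separate from the
# class that represents a typed feature.
package Sketch::TypedFeatureFactory;

sub new {
    my ($class, %args) = @_;
    # Assumption: the type ontology is just an array ref of allowed names.
    my %allowed = map { $_ => 1 } @{ $args{-type_ontology} || [] };
    return bless { allowed => \%allowed }, $class;
}

sub create_object {
    my ($self, $params) = @_;
    my $feature = $params->{-feature}
        or die "create_object needs a -feature argument\n";
    my $type = $feature->{primary_tag};
    die "type '$type' is not in the type ontology\n"
        unless $self->{allowed}{$type};
    return Sketch::TypedFeature->new(
        type  => $type,
        start => $feature->{start},
        end   => $feature->{end},
    );
}

# The typed feature only holds data; it knows nothing about conversion.
package Sketch::TypedFeature;

sub new  { my ($class, %f) = @_; return bless {%f}, $class }
sub type { $_[0]{type} }

package main;

my $factory = Sketch::TypedFeatureFactory->new(
    -type_ontology => ['nucleotide_motif']);
my $typed = $factory->create_object({
    -feature => { primary_tag => 'nucleotide_motif', start => 2787, end => 3105 },
});
print $typed->type, "\n";
```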

Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
started as the result of a discussion thread earlier this (or last?)  
year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
though not in concept.

Maybe we need to get together again and thrash out a strategy; or hold
a BOF at the GMOD meeting? I feel this needs a core group of people
who care to hash out a strategy that will also solve the backwards
compatibility problem with the current Bio::SeqFeatureI
state-of-limbo, and that will allow us to implement the decisions with
a few people in a concentrated effort. This will then also remove the
only really large stumbling block towards a 1.6 release.

Maybe we should think about a little pre-GMOD hackathon to clear up
this mess? Scott, you'll be there a day early? I'll be already back
and Jason I believe will still be in town, although he may have other
commitments already. Nonetheless, it shouldn't really take much more
than dedicated time, a whiteboard, and a few people who care to
thrash this out and then do it.

Thoughts?

	-hilmar

On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:

> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method  
> called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
> ($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like use suggested below.  You could allow passing in  
> args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you  
> know if
> brilliance strikes me.
>
> Scott
>
>
> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>> Rather than cobble together some ad-hoc solution, I would be  
>> interested
>> in working on a good solution to this problem, because it seems like
>> it's just going to get more common as more people start wanting to  
>> write
>> GFF3.  What about some code in whatever customarily makes these  
>> objects
>> (probably BSF::Annotated's new() method?) that could take another  
>> type
>> of Feature object and attempt to shoehorn its data into a new
>> BSF::Annotated?  If it failed (because the type isn't in SO or
>> whatever), it could throw() some informative error message.
>>
>> Then, people could write straightforward code something like:
>>
>> while(my $oldstylefeature = $features_in->next_feature) {
>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>     $oldstylefeature->something_else('some other something that  
>> needs to
>> be changed for compliance');
>>     my $newfeature = Bio::SeqFeature::Annotated->new 
>> ($oldstylefeature);
>>     $gff3_out->write_feature($newfeature);
>> }
>>
>> Does that sound like a good idea?  I'd be more than willing to  
>> implement
>> this, since I'm going to need to do this sort of thing with many more
>> things than just RepeatMasker.
>>
>> Rob

- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
ImoAXD/jrbF0gXzSr2CY4tQ=
=XfDq
-----END PGP SIGNATURE-----


From rmb32 at cornell.edu  Sat Jun 17 14:36:28 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 11:36:28 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <44944BAC.7000302@cornell.edu>

I'd love to help more with this, since with the new tomato genome coming 
in my job is going to be working more and more with annotations, but I'm 
not a core person and I can't go to the meeting in NC.  In the interests 
of getting my job done right now, I've implemented a -feature argument 
to Bio::SeqFeature::Annotated's constructor, which calls a method 
from_feature() that I added.  If you guys want it, it's attached to bug 2026.

 From the perspective of a casual bioperl user, anything you guys can do 
to make the handling of features and annotations less fragmented and 
more robust would be wonderful.  I'd be happy to help with 
implementation if one of you grizzled veterans would give me marching 
orders. :-)

Rob

Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> You don't need a new method for this. Instead, support a -feature 
> argument.
>
>     my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>
> This should work for any instance of Bio::SeqFeatureI. If it is a 
> B::SF::Annotated already it is obviously just a deep copy (if copy is 
> desired - could be another parameter). Otherwise more will be involved.
>
> Alternatively, and possibly better, is to write a specialized 
> SeqFeatureI factory (that would implement 
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>
>     my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>         -type_ontology => $sequence_ontology,
>         -source_ontology => $feature_source_ontology,
>         -unflatten => 1);
>     my $bsfa = $feat_factory->create_object({-feature => $feature});
>
> This is preferable because it separates business logic that isn't 
> necessarily related into defined units. I.e., the logic necessary to 
> convert an ordinary feature into a strongly typed one is different 
> from how to represent a strongly typed feature. IMHO anyway ...
>
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan 
> started as the result of a discussion thread earlier this (or last?) 
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted, 
> though not in concept.
>
> Maybe we need to get together again and thrash out a strategy; or a 
> BOF at the GMOD meeting? I feel this does need a core group of people 
> who care, hash out a strategy that will also solve the backwards 
> compatibility problem with the current Bio::SeqFeatureI 
> state-of-limbo, and allow us to implement the decisions with a few 
> people in a concentrated effort. This will then also remove the only 
> real large stumbling block towards a 1.6 release.
>
> Maybe we should think about a little pre-GMOD hackathon to clear up 
> this mess? Scott, you'll be there a day early? I'll be already back 
> and Jason I believe will still be in town, although he may have other 
> commitments already. Nonetheless, it shouldn't really take that much 
> but rather dedicated time, a whiteboard, and a few people who care 
> thrashing this out and then do it.
>
> Thoughts?
>
>     -hilmar
>
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>
>> Rob,
>>
>> I came to the same conclusion as well; I wrote my response as I was
>> heading out the door and while I was running errands, I realized the
>> right thing to do is to write a Bio::SeqFeature::Annotated method called
>> new_from_object, whose usage would be:
>>
>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>>
>> where you would give it a Bio::SeqFeatureI compliant object and try to
>> create a BSFA like you suggested below.  You could allow passing in args
>> to control how different things are handled, like mapping non-SO types
>> to SO types.  I'll think about this over the weekend and let you know if
>> brilliance strikes me.
>>
>> Scott
>>
>>
>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>> Rather than cobble together some ad-hoc solution, I would be interested
>>> in working on a good solution to this problem, because it seems like
>>> it's just going to get more common as more people start wanting to 
>>> write
>>> GFF3.  What about some code in whatever customarily makes these objects
>>> (probably BSF::Annotated's new() method?) that could take another type
>>> of Feature object and attempt to shoehorn its data into a new
>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>> whatever), it could throw() some informative error message.
>>>
>>> Then, people could write straightforward code something like:
>>>
>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>     $oldstylefeature->something_else(
>>>         'some other something that needs to be changed for compliance');
>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>     $gff3_out->write_feature($newfeature);
>>> }
>>>
>>> Does that sound like a good idea?  I'd be more than willing to 
>>> implement
>>> this, since I'm going to need to do this sort of thing with many more
>>> things than just RepeatMasker.
>>>
>>> Rob
>>>
>>> Scott Cain wrote:
>>>> Um, yeah, good question.  The reason I didn't answer you when you 
>>>> wrote
>>>> before is that I was hoping for divine inspiration for an answer 
>>>> (or for
>>>> somebody else to answer, which would have been really great :-)
>>>>
>>>> The short answer (and easy one for me to type) is that you will 
>>>> probably
>>>> need an ad hoc method to do it, which is the same thing I do when I 
>>>> need
>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>> have any sample code that does this, but if you want to start 
>>>> working up
>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>
>>>>> So about that converting ye olde feature objects into
>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> That's OK--You added a few items that should be escaped that 
>>>>>> weren't, so
>>>>>> I added those too.
>>>>>>
>>>>>> Thanks,
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> Woops, I should have said something about that.  I submitted it 
>>>>>>> before
>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>
>>>>>>> Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Scott,
>>>>>>>>
>>>>>>>> Looks like Robert also submitted a bug report related to this 
>>>>>>>> as well=
>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> -------------------------------------------------------------------------- 
>>
>> Scott Cain, Ph. D.                                         cain at cshl.edu
>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>> Cold Spring Harbor Laboratory
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
>
> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
> ImoAXD/jrbF0gXzSr2CY4tQ=
> =XfDq
> -----END PGP SIGNATURE-----

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Sat Jun 17 16:21:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 17 Jun 2006 15:21:37 -0500
Subject: [Bioperl-l] OT : Re:  reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1D0C8412-3705-47EF-9AAA-1DD0B09AD6B5@uiuc.edu>


On Jun 17, 2006, at 11:20 AM, Hilmar Lapp wrote:
>
> Maybe we need to get together again and thrash out a strategy; or a
> BOF at the GMOD meeting? I feel this does need a core group of people
> who care, hash out a strategy that will also solve the backwards
> compatibility problem with the current Bio::SeqFeatureI state-of-
> limbo, and allow us to implement the decisions with a few people in a
> concentrated effort. This will then also remove the only real large
> stumbling block towards a 1.6 release.

That would be fantastic!

A bit OT, but if plans are afoot for a 1.6 release maybe the 'core  
group' that meets at NC could start drawing up a list of ideas/plans  
towards that release, even if it is still a ways off.  A roadmap of  
sorts so the community knows where to put forth the majority of their  
effort and focus.

Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Mon Jun 19 06:16:57 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 19 Jun 2006 12:16:57 +0200
Subject: [Bioperl-l] doc.bioperl
Message-ID: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>

Hi,

I just noted that it can happen that the pages at doc.bioperl.org
state "No synopsis" whereas there is one in the PM file (use perldoc
or the CVS).
An example:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Fasta.html
No synopsis, No description, but

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup

shows both.

So, if you're looking for documentation don't forget to do e.g.
"perldoc Bio::DB::Fasta"

regards,
bernd


From cjfields at uiuc.edu  Mon Jun 19 10:38:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 09:38:01 -0500
Subject: [Bioperl-l] doc.bioperl
In-Reply-To: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>
Message-ID: <001501c693ad$f7689790$15327e82@pyrimidine>

This has been reported as a bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1926

Jason mentions in the bug report that the POD may contain something that
messes with the way PDOC deals with code so should be rewritten.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Monday, June 19, 2006 5:17 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] doc.bioperl
> 
> Hi,
> 
> I just noted that it can happen that the pages at doc.bioperl.org
> state "No synopsis" whereas there is one in the PM file (use perldoc
> or the CVS).
> An example:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-
> live/Bio/DB/Fasta.html
> No synopsis, No description, but
> 
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-
> live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup
> 
> shows both.
> 
> So, if you're looking for documentation don't forget to do e.g.
> "perldoc Bio::DB::Fasta"
> 
> regards,
> bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dmessina at wustl.edu  Mon Jun 19 10:59:23 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 19 Jun 2006 09:59:23 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <83485BEB-2457-4FD6-90B8-353228868C3A@wustl.edu>

Hi,

Just curious if any other BioPerlers will be at the YAPC conference  
in Chicago next week (http://yapcchicago.org/). Some of us from the  
WashU GSC will be there, and it might be fun to meet some other  
BioPerl people over lunch or something. If there's enough interest, I  
will organize.

By the way, if you're unfamiliar with the conference and are  
interested in attending, I think registration is still open. The fee  
is low ($100).

Dave


-- 
Dave Messina
Informatics Analyst
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1415


From ClarkeW at AGR.GC.CA  Mon Jun 19 18:34:37 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 19 Jun 2006 18:34:37 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>

Hi,

I am getting the following warning and then exception:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ACTG*] which does not look
healthy

NOTE: ACTG* represents a sequence of those 4 characters (a valid DNA
sequence)

when extracting display name and sequence from a MySQL database. My code
is as follows:

my $sql = "select Clone_Name,Sequence from tbl_bgene";
my $sth = $dbh->prepare($sql);
$sth->execute();
while (my $hash = $sth->fetchrow_hashref()) {
    # print("Name: ".$hash->{'Clone_Name'}."\n");
    my $seq = Bio::Seq->new( -display_id => $hash->{'Clone_Name'},
                             -seq        => $hash->{'Sequence'} );
    $handle->write_seq($seq);
    # print("Sequence: ".$hash->{'Sequence'}."\n");
}

For some reason it is failing on a particular sequence, which is a valid
DNA sequence. If anyone has any ideas on why this is I would appreciate
it.

Thanks, Wayne


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 19:30:19 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 09:30:19 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <4497338B.3030609@infotech.monash.edu.au>

> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 1
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.

Usually a '*' indicates a STOP codon in a protein sequence.
I don't think it is valid in a DNA sequence?

So my guess is that BioPerl is auto-detecting it as Protein sequence,
as A,C,T,G are all valid amino acids, and * is a stop codon.

So I think BioPerl is doing the right thing.

If you want to force it, try adding a " -alphabet=>'protein' " parameter to the 
Bio::Seq constructor.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From taerwin at gmail.com  Mon Jun 19 21:38:14 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 11:38:14 +1000
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>

Hi all,

Does anyone have a runnable for cap3? There seems to be some discussion
about one in the mailing archives (
http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
find any code.


Regards,

Tim


From osborne1 at optonline.net  Mon Jun 19 22:23:43 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 19 Jun 2006 22:23:43 -0400
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
Message-ID: <C0BCD46F.8EA5%osborne1@optonline.net>

Tim,

The code seems to be here, not clear if there's an executable:

http://seq.cs.iastate.edu/download.html


Brian O.


On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:

> Hi all,
> 
> Does anyone have a runnable for cap3? There seems to be some discussion
> about one in the mailing archives (
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
> find any code.
> 
> 
> 
> Regards,
> 
> Tim
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon Jun 19 23:23:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 22:23:26 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <000701c69418$e53b9110$15327e82@pyrimidine>

You really haven't given us much to work with more than "this doesn't work."
We need the following information, otherwise we can't do anything.  

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be: 

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq is
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};
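
For what it's worth, the context effect itself is easy to reproduce without Bioperl. The snippet below is only a sketch of plain Perl semantics, not the code in PrimarySeq; the $allowed character class, the helper names, and the test value with a stray trailing newline are assumptions made for illustration. A global match with a capture group returns the captured strings in list context, but only a true/false value (printed as 1) when it is forced into scalar context, for example by string concatenation.

```perl
use strict;
use warnings;

# Assumed set of acceptable characters (letters plus gap/stop symbols).
my $allowed = 'A-Za-z\-\.\*\?=~';

# List context: returns every run of characters outside the allowed set.
sub offending_runs {
    my ($seq) = @_;
    my @runs = $seq =~ /([^$allowed]+)/g;
    return @runs;
}

# Scalar context: concatenation reduces the same match to 1 (or '').
sub message_in_scalar_context {
    my ($seq) = @_;
    return "mismatch is " . ($seq =~ /([^$allowed]+)/g);
}

# Hypothetical value: looks like ACTG* when printed, but ends in a newline.
my $seq = "ACTG*\n";

print message_in_scalar_context($seq), "\n";

# Show the first character of each offending run as a hex code so that
# invisible characters become visible.
print "offending: ",
    join(',', map { sprintf('0x%02X', ord($_)) } offending_runs($seq)), "\n";
```

If something like this is what happens, "mismatch is 1" says nothing about which character is at fault, so dumping the raw column value (Data::Dumper with $Data::Dumper::Useqq = 1 shows control characters) is still the quickest way to see what is really in the failing row.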

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception:
>
> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 1
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
>
> NOTE: ACTG* represents a sequence of those 4 characters (a valid DNA
> sequence)
>
> when extracting display name and sequence from a MySQL database. My code
> is as follows:
>
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> my $sth = $dbh->prepare($sql);
> $sth->execute();
> while (my $hash = $sth->fetchrow_hashref()) {
>     # print("Name: ".$hash->{'Clone_Name'}."\n");
>     my $seq = Bio::Seq->new( -display_id => $hash->{'Clone_Name'},
>                              -seq        => $hash->{'Sequence'} );
>     $handle->write_seq($seq);
>     # print("Sequence: ".$hash->{'Sequence'}."\n");
> }
>
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
>
> Thanks, Wayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From taerwin at gmail.com  Mon Jun 19 23:05:13 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 13:05:13 +1000
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <C0BCD46F.8EA5%osborne1@optonline.net>
References: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
	<C0BCD46F.8EA5%osborne1@optonline.net>
Message-ID: <c7d2b5330606192005o63ed5d6i608d6b2076399932@mail.gmail.com>

Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3

Regards,

Tim

On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
>
> Tim,
>
> The code seems to be here, not clear if there's an executable:
>
> http://seq.cs.iastate.edu/download.html
>
>
> Brian O.
>
>
> On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
>
> > Hi all,
> >
> > Does anyone have a runnable for cap3? There seems to be some discussion
> > about one in the mailing archives (
> > http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> cannot
> > find any code.
> >
> >
> >
> > Regards,
> >
> > Tim
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 23:07:12 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 13:07:12 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <4497338B.3030609@infotech.monash.edu.au>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
	<4497338B.3030609@infotech.monash.edu.au>
Message-ID: <44976660.7030107@infotech.monash.edu.au>

> If you want to force it, try adding a " -alphabet=>'protein' " parameter to the 
> Bio::Seq constructor.

That should be -alphabet => 'dna'.
D'oh!
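
In other words, something along these lines. This is a minimal sketch: the display id and sequence are made-up values, and the Bioperl call is skipped if Bio::Seq is not installed. Whether setting the alphabet explicitly actually makes the exception in this thread go away has not been established.

```perl
use strict;
use warnings;

# Made-up example values; only the -alphabet argument is the point here.
my %args = (
    -display_id => 'clone_1',
    -seq        => 'ACTG',
    -alphabet   => 'dna',    # state the alphabet instead of letting it be guessed
);

if (eval { require Bio::Seq; 1 }) {
    my $seq = Bio::Seq->new(%args);
    print $seq->display_id, " is ", $seq->alphabet, "\n";
}
else {
    print "Bio::Seq is not installed; nothing constructed\n";
}
```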

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From Marc.Logghe at DEVGEN.com  Tue Jun 20 03:13:22 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 20 Jun 2006 09:13:22 +0200
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6D3D60B@ANTARESIA.be.devgen.com>

It is about 3 years old and I have not tested it with the current bioperl
release.
Feel free to play with it.
Cheers,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tim Erwin
> Sent: Tuesday, June 20, 2006 5:05 AM
> To: Brian Osborne
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] cap3 runnable?
> 
> Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3
> 
> Regards,
> 
> Tim
> 
> On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
> >
> > Tim,
> >
> > The code seems to be here, not clear if there's an executable:
> >
> > http://seq.cs.iastate.edu/download.html
> >
> >
> > Brian O.
> >
> >
> > On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Does anyone have a runnable for cap3? There seems to be some 
> > > discussion about one in the mailing archives (
> > > 
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> > cannot
> > > find any code.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Tim
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cap3.pm
Type: application/octet-stream
Size: 3374 bytes
Desc: Cap3.pm
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/0976a7d9/attachment-0002.obj>

From G.Tzotzos at unido.org  Tue Jun 20 05:18:48 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 11:18:48 +0200
Subject: [Bioperl-l] Error message
Message-ID: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and run the  
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ 
swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ 
WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D

Wagramerstrasse 5
A-1400 Vienna
Austria

Email: g.tzotzos at unido.org


From G.Tzotzos at unido.org  Tue Jun 20 07:36:18 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 13:36:18 +0200
Subject: [Bioperl-l] Error message
Message-ID: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and run the  
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ 
swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ 
WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D
Vienna, Austria


From s-merchant at northwestern.edu  Tue Jun 20 10:41:33 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 20 Jun 2006 09:41:33 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <002701c69477$9ffa7c10$c2987ca5@pc13>

Hey Dave,
  I am doing a talk on dictyBase at the YAPC . I think it would be great to
meet for lunch. 

Cheers,
Sohel Merchant.

dictyBase
Northwestern University,
Chicago

>
>Just curious if any other BioPerlers will be at the YAPC conference in
>Chicago next week (http://yapcchicago.org/). Some of us from the WashU
>GSC will be there, and it might be fun to meet some other BioPerl
>people over lunch or something. If there's enough interest, I will
>organize.
>
>By the way, if you're unfamiliar with the conference and are interested
>in attending, I think registration is still open. The fee is low
>($100).
>
>Dave


From cain at cshl.edu  Tue Jun 20 12:03:26 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 20 Jun 2006 12:03:26 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1150819406.2585.27.camel@localhost.localdomain>

Hi Hilmar,

Of course you are right--I was under the influence of a perl module that
I work with that does something similar, but both of your solutions are
better.

I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
look this week.

As for next week, I plan on spending the day at NESCent on Wednesday
(though I haven't told Todd or Jeff that I am arriving early yet) just
to make sure all the details are in place.  I imagine I'll have a fair
amount of free time to hash this stuff out.  Anyone else who is in town
(that is, in Durham, NC, USA) is welcome to come draw on a white board
too. :-)

Scott


On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> You don't need a new method for this. Instead, support a -feature  
> argument.
> 
> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
> 
> This should work for any instance of Bio::SeqFeatureI. If it is a  
> B::SF::Annotated already it is obviously just a deep copy (if copy is  
> desired - could be another parameter). Otherwise more will be involved.
> 
> Alternatively, and possibly better, is to write a specialized  
> SeqFeatureI factory (that would implement  
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
> 
> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
> 		-type_ontology => $sequence_ontology,
> 		-source_ontology => $feature_source_ontology,
> 		-unflatten => 1);
> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
> 
> This is preferable because it separates business logic that isn't  
> necessarily related into defined units. I.e., the logic necessary to  
> convert an ordinary feature into a strongly typed one is different  
> from how to represent a strongly typed feature. IMHO anyway ...
> 
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
> started as the result of a discussion thread earlier this (or last?)  
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
> though not in concept.
> 
> Maybe we need to get together again and thrash out a strategy; or a  
> BOF at the GMOD meeting? I feel this does need a core group of people  
> who care, hash out a strategy that will also solve the backwards  
> compatibility problem with the current Bio::SeqFeatureI state-of- 
> limbo, and allow us to implement the decisions with a few people in a  
> concentrated effort. This will then also remove the only real large  
> stumbling block towards a 1.6 release.
> 
> Maybe we should think about a little pre-GMOD hackathon to clear up  
> this mess? Scott, you'll be there a day early? I'll be already back  
> and Jason I believe will still be in town, although he may have other  
> commitments already. Nonetheless, it shouldn't really take that much  
> but rather dedicated time, a whiteboard, and a few people who care  
> thrashing this out and then do it.
> 
> Thoughts?
> 
> 	-hilmar
> 
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
> 
> > Rob,
> >
> > I came to the same conclusion as well; I wrote my response as I was
> > heading out the door and while I was running errands, I realized the
> > right thing to do is to write a Bio::SeqFeature::Annotated method  
> > called
> > new_from_object, whose usage would be:
> >
> >   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
> > ($my_BSFI, %args);
> >
> > where you would give it a Bio::SeqFeatureI compliant object and try to
> > create a BSFA like you suggested below.  You could allow passing in  
> > args
> > to control how different things are handled, like mapping non-SO types
> > to SO types.  I'll think about this over the weekend and let you  
> > know if
> > brilliance strikes me.
> >
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> >> Rather than cobble together some ad-hoc solution, I would be  
> >> interested
> >> in working on a good solution to this problem, because it seems like
> >> it's just going to get more common as more people start wanting to  
> >> write
> >> GFF3.  What about some code in whatever customarily makes these  
> >> objects
> >> (probably BSF::Annotated's new() method?) that could take another  
> >> type
> >> of Feature object and attempt to shoehorn its data into a new
> >> BSF::Annotated?  If it failed (because the type isn't in SO or
> >> whatever), it could throw() some informative error message.
> >>
> >> Then, people could write straightforward code something like:
> >>
> >> while(my $oldstylefeature = $features_in->next_feature) {
> >>     $oldstylefeature->primary_tag('something_that_is_in_so');
> >>     $oldstylefeature->something_else('some other something that  
> >> needs to
> >> be changed for compliance');
> >>     my $newfeature = Bio::SeqFeature::Annotated->new 
> >> ($oldstylefeature);
> >>     $gff3_out->write_feature($newfeature);
> >> }
> >>
> >> Does that sound like a good idea?  I'd be more than willing to  
> >> implement
> >> this, since I'm going to need to do this sort of thing with many more
> >> things than just RepeatMasker.
> >>
> >> Rob
> >>
> >> Scott Cain wrote:
> >>> Um, yeah, good question.  The reason I didn't answer you when you  
> >>> wrote
> >>> before is that I was hoping for divine inspiration for an answer  
> >>> (or for
> >>> somebody else to answer, which would have been really great :-)
> >>>
> >>> The short answer (and easy one for me to type) is that you will  
> >>> probably
> >>> need an ad hoc method to do it, which is the same thing I do when  
> >>> I need
> >>> to convert gff2 to gff3, to make sure the things I need mapped get
> >>> mapped the 'right' way (that is, the way I want them to go).  I  
> >>> don't
> >>> have any sample code that does this, but if you want to start  
> >>> working up
> >>> an ad hoc method, I will certainly try to help you as much as I can.
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> >>>
> >>>> So about that converting ye olde feature objects into
> >>>> Bio::SeqFeature::Annotated objects.  How do I do it?
> >>>>
> >>>>
> >>>> Scott Cain wrote:
> >>>>
> >>>>> That's OK--You added a few items that should be escaped that  
> >>>>> weren't, so
> >>>>> I added those too.
> >>>>>
> >>>>> Thanks,
> >>>>> Scott
> >>>>>
> >>>>>
> >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >>>>>
> >>>>>
> >>>>>> Woops, I should have said something about that.  I submitted  
> >>>>>> it before
> >>>>>> I saw that Scott had already done the escaping in CVS.
> >>>>>>
> >>>>>> Chris Fields wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Scott,
> >>>>>>>
> >>>>>>> Looks like Robert also submitted a bug report related to this  
> >>>>>>> as well=
> >>>>>>> ---------------------------------------------------------------- 
> >>>>>>> --------
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > -- 
> > ---------------------------------------------------------------------- 
> > --
> > Scott Cain, Ph. D.                                          
> > cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
> 
> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
> ImoAXD/jrbF0gXzSr2CY4tQ=
> =XfDq
> -----END PGP SIGNATURE-----
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From osborne1 at optonline.net  Tue Jun 20 12:13:51 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 12:13:51 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>
Message-ID: <C0BD96FF.8EC3%osborne1@optonline.net>

George,

The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's
some other problem that may be specific to SwissProt. Can you retrieve from
GenBank? E.g.:

my $seq_object = get_sequence('genbank', 2);

Brian O.


On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:

> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
> 
> use Bio::Perl;
> use strict;
> use warnings;
> 
> my $seq_object = get_sequence('swissprot', "P09651");
> 
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
> 
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
> 
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
> 
> Thanks
> 
> George
> 
> 
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
> 
> 
> 
> George T. Tzotzos Ph.D
> Vienna, Austria
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From G.Tzotzos at unido.org  Tue Jun 20 12:21:32 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 18:21:32 +0200
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
References: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <76750E11-3BD6-42EB-832D-3A12BC6B4BEE@unido.org>

Brian

Neither <swiss> nor <swissprot> works. However, your suggestion does
work fine. So does Chandan's.  Many thanks to both.

Cheers

George


On 20 Jun 2006, at 18:13, Brian Osborne wrote:

> George,
>
> The docs I'm reading say to use 'swiss', not 'swissprot' but I  
> think there's
> some other problem that may be specific to SwissProt. Can you  
> retrieve from
> GenBank? E.g.:
>
> my $seq_object = get_sequence('genbank', 2);
>
> Brian O.
>
>
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>
>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>> following script to test the installation:
>>
>> use Bio::Perl;
>> use strict;
>> use warnings;
>>
>> my $seq_object = get_sequence('swissprot', "P09651");
>>
>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>
>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>> get the message below.
>>
>> Any help on the nature of the problem and how to overcome it would be
>> greatly appreciated.
>>
>> Thanks
>>
>> George
>>
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
>> swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
>> WebDBSeqI.pm:153
>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>> STACK toplevel tut2.pl:5
>>
>>
>>
>> George T. Tzotzos Ph.D
>> Vienna, Austria
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From ClarkeW at AGR.GC.CA  Tue Jun 20 12:57:34 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 12:57:34 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>


The Bioperl version is 1.4, the OS is Redhat linux fedora core 4, the
trace is 
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq
/usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
STACK: Bio::PrimarySeq::new
/usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
STACK: /home/wayne/bin/mast_fasta.pl:59

And the full script is attached. 

However I would like to clarify that the actual sequence is not ACTG*,
this was a notation to represent that I had checked it to be sure that
it was a valid DNA sequence but due to confidentiality I cannot disclose
the actual sequence. I know this makes it more difficult and that I
perhaps should have been clearer about this originally. The $handle is a
Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
name  

'Clone_Name' => 'sJ1485'
        };
then the error message. I hope this is more helpful than my last
message.

Thanks, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with more than "this doesn't
work."
We need the following information, otherwise we can't do anything.  

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq, which
is where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but it looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MySQL database. My code
> is as follows:
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> my $sth = $dbh->prepare($sql);
> $sth->execute();
> while (my $hash = $sth->fetchrow_hashref()) {
>     # print("Name: ".$hash->{'Clone_Name'}."\n");
>     my $seq = Bio::Seq->new(-display_id => $hash->{'Clone_Name'},
>                             -seq        => $hash->{'Sequence'});
>     $handle->write_seq($seq);
>     # print("Sequence: ".$hash->{'Sequence'}."\n");
> }
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mast_fasta.pl
Type: application/octet-stream
Size: 1998 bytes
Desc: mast_fasta.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/53770697/attachment-0002.obj>

From cjfields at uiuc.edu  Tue Jun 20 13:16:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:16:32 -0500
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <000c01c6948d$46e992d0$15327e82@pyrimidine>

Brian,

Looks like EBI switched the url parameter for swissprot 'swall' to
'UniProtKB'.  I committed a change to Bio::DB::SwissProt in CVS which fixes
this and solves the issue.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Tuesday, June 20, 2006 11:14 AM
> To: George Tzotzos; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Error message
> 
> George,
> 
> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
> there's
> some other problem that may be specific to SwissProt. Can you retrieve
> from
> GenBank? E.g.:
> 
> my $seq_object = get_sequence('genbank', 2);
> 
> Brian O.
> 
> 
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
> 
> > I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> > following script to test the installation:
> >
> > use Bio::Perl;
> > use strict;
> > use warnings;
> >
> > my $seq_object = get_sequence('swissprot', "P09651");
> >
> > write_sequence(">roa1.fasta", 'fasta', $seq_object);
> >
> > I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> > get the message below.
> >
> > Any help on the nature of the problem and how to overcome it would be
> > greatly appreciated.
> >
> > Thanks
> >
> > George
> >
> >
> > ------------- EXCEPTION  -------------
> > MSG: swissprot stream with no ID. Not swissprot in my book
> > STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> > swiss.pm:179
> > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> > WebDBSeqI.pm:153
> > STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> > STACK toplevel tut2.pl:5
> >
> >
> >
> > George T. Tzotzos Ph.D
> > Vienna, Austria
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From chandan.kr.singh at gmail.com  Tue Jun 20 10:46:01 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Tue, 20 Jun 2006 20:16:01 +0530
Subject: [Bioperl-l] Error message
In-Reply-To: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
References: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
Message-ID: <2d4f320606200746ja53cebs73923c510b535c44@mail.gmail.com>

Hi,
It seems the 'swall' servertype on EBI no longer exists. Maybe this has
already been reported and debugged. I hope somebody can shed light on it.

As for George, if you are in a hurry you can use the Bio::DB::SwissProt module
directly. Here is typical code to do this:

use strict;
use warnings;
use Bio::DB::SwissProt;
use Bio::Perl;

# Query the ExPASy server directly.
my $db = Bio::DB::SwissProt->new(-servertype   => 'expasy',
                                 -hostlocation => 'us');
my $seq = $db->get_Seq_by_id('ROA1_HUMAN');

# Write the retrieved sequence to roa.sp in FASTA format.
write_sequence(">roa.sp", 'fasta', $seq);


See the module for any help .

cheers
Chandan


On 6/20/06, George Tzotzos <G.Tzotzos at unido.org> wrote:
>
> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
>
> use Bio::Perl;
> use strict;
> use warnings;
>
> my $seq_object = get_sequence('swissprot', "P09651");
>
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
>
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
>
> Thanks
>
> George
>
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
>
>
>
>
>
> George T. Tzotzos Ph.D
>
> Wagramerstrasse 5
> A-1400 Vienna
> Austria
>
> Email: g.tzotzos at unido.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From osborne1 at optonline.net  Tue Jun 20 13:33:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 13:33:07 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <000c01c6948d$46e992d0$15327e82@pyrimidine>
Message-ID: <C0BDA993.8ED3%osborne1@optonline.net>

Chris,

You beat me to it!

Brian O.


On 6/20/06 1:16 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Brian,
> 
> Looks like EBI switched the url parameter for swissprot 'swall' to
> 'UniProtKB'.  I committed a change to Bio::DB::SwissProt in CVS which fixes
> this and solves the issue.
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Tuesday, June 20, 2006 11:14 AM
>> To: George Tzotzos; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Error message
>> 
>> George,
>> 
>> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
>> there's
>> some other problem that may be specific to SwissProt. Can you retrieve
>> from
>> GenBank? E.g.:
>> 
>> my $seq_object = get_sequence('genbank', 2);
>> 
>> Brian O.
>> 
>> 
>> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>> 
>>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>>> following script to test the installation:
>>> 
>>> use Bio::Perl;
>>> use strict;
>>> use warnings;
>>> 
>>> my $seq_object = get_sequence('swissprot', "P09651");
>>> 
>>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>> 
>>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>>> get the message below.
>>> 
>>> Any help on the nature of the problem and how to overcome it would be
>>> greatly appreciated.
>>> 
>>> Thanks
>>> 
>>> George
>>> 
>>> 
>>> ------------- EXCEPTION  -------------
>>> MSG: swissprot stream with no ID. Not swissprot in my book
>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
>>> swiss.pm:179
>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
>>> WebDBSeqI.pm:153
>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>>> STACK toplevel tut2.pl:5
>>> 
>>> 
>>> 
>>> George T. Tzotzos Ph.D
>>> Vienna, Austria
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From ClarkeW at AGR.GC.CA  Tue Jun 20 13:44:42 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 13:44:42 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A66@onncrxms4.agr.gc.ca>

Hi all, 

It seems that there is a newline character which is causing the problem,
this wasn't obvious at first due to the size of my shell window but that
is what is giving the mismatch error. Thanks to Chris and Torsten for
the help and for pointing me in the direction of validate_seq which was
helpful in finding the problem.
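
For anyone who hits the same thing, here is a minimal sketch of how a stray
newline can be made visible and stripped before the string reaches Bio::Seq.
It uses only core Perl. The helper name clean_sequence is hypothetical, and
the sample row is made up rather than taken from my database.

```perl
use strict;
use warnings;
use Data::Dumper;

# Print strings double-quoted so control characters show up as \n, \t, etc.
$Data::Dumper::Useqq = 1;

# Hypothetical helper: remove all whitespace, including embedded newlines.
sub clean_sequence {
    my ($raw) = @_;
    (my $clean = $raw) =~ s/\s+//g;
    return $clean;
}

# Made-up row standing in for what fetchrow_hashref() returns.
my $row = { 'Clone_Name' => 'test', 'Sequence' => "ACTG\nACTG" };

# With Useqq set, the embedded newline is printed as an escaped \n.
print Dumper($row);

# The cleaned string is what would be passed as -seq.
my $clean = clean_sequence($row->{'Sequence'});
print "$clean\n";
```

Stripping all whitespace is an assumption about what is safe for my data;
if whitespace were meaningful in a column, it would be better to report the
offending row instead of silently fixing it.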

Cheers, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with more than "this doesn't
work."
We need the following information, otherwise we can't do anything.  

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq, which
is where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but it looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MySQL database. My code
> is as follows:
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> my $sth = $dbh->prepare($sql);
> $sth->execute();
> while (my $hash = $sth->fetchrow_hashref()) {
>     # print("Name: ".$hash->{'Clone_Name'}."\n");
>     my $seq = Bio::Seq->new(-display_id => $hash->{'Clone_Name'},
>                             -seq        => $hash->{'Sequence'});
>     $handle->write_seq($seq);
>     # print("Sequence: ".$hash->{'Sequence'}."\n");
> }
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Tue Jun 20 13:55:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:55:28 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>
Message-ID: <000e01c69492$b74e0ec0$15327e82@pyrimidine>

> -----Original Message-----
> From: Clarke, Wayne [mailto:ClarkeW at AGR.GC.CA]
> Sent: Tuesday, June 20, 2006 11:58 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> 
> The Bioperl version is 1.4, the OS is Redhat linux fedora core 4, the
> trace is
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
> STACK: /home/wayne/bin/mast_fasta.pl:59
> 
> And the full script is attached.

Have you tried a newer version of Bioperl to see if it fixed the issue?  v.
1.5.1 has been out for a bit now and it's pretty stable.

> However I would like to clarify that the actual sequence is not ACTG*,
> this was a notation to represent that I had checked it to be sure that
> it was a valid DNA sequence but due to confidentiality I cannot disclose
> the actual sequence. I know this makes it more difficult and that I
> perhaps should have been clearer about this originally. 

That's not a problem.  We run into that here a bit.  Example data is fine.

> The $handle is a
> Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
> name
> 
> 'Clone_Name' => 'sJ1485'
>         };
> then the error message. I hope this is more helpful than my last
> message.
> 
> Thanks, Wayne

Make sure you aren't using bioperl-specific methods when you run
Data::Dumper on your hash or the script crashes.

Okay, I was able to reproduce your error using PrimarySeq from v. 1.4 (BTW,
the error message changes if you use a newer version of Bioperl but it is
still there).  See if you can follow me here...

I used this script:
-------------------------
use Bio::Seq;
use Bio::SeqIO;
use Data::Dumper;

my $hash = {'Clone'     => 'test',
            'Sequence'  => 'ACTG*'};

my $seqout = Bio::SeqIO->new (-format   => 'fasta',
                              -fh       => \*STDOUT);

print Dumper($hash);

my $seq = Bio::Seq->new(-seq            => $hash->{'Sequence'},
                        -display_id     => $hash->{'Clone'});

$seqout->write_seq($seq);
-------------------------

And everything works fine, with this output:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => 'ACTG*'
        };
>test
ACTG*

Changing the anonymous hash to this causes the crash and error.

my $hash = {'Clone'     => 'test',
            'Sequence'  => ['ACTG*']};

Gets this:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => [
                          'ACTG*'
                        ]
        };

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ARRAY(0x2354b0)] which does not look
healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\core/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:268
STACK: Bio::PrimarySeq::new C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new C:\Perl\src\bioperl\core/Bio/Seq.pm:497
STACK: C:\Perl\Scripts\seq-test\test.pl:17
-----------------------------------------------------------

It could be that the sequence data is stored in another complex data type
(object, hash) that's causing the problem.  Looks like you retrieve your
hash from another method ('my $hash = $sth->fetchrow_hashref()'); you might
want to check that method to make sure you're getting the right kind of data
into your hash.
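
A quick way to check for this, sketched with core Perl only: ref() returns an
empty string for a plain scalar and 'ARRAY', 'HASH', etc. for references. The
helper name describe_value is hypothetical, and the two hashes are the same
test data I used above, not your real rows.

```perl
use strict;
use warnings;

# Hypothetical helper: report what kind of value is stored under a key.
sub describe_value {
    my ($value) = @_;
    return 'undef'        unless defined $value;
    return 'plain scalar' unless ref $value;
    return ref($value) . ' reference';
}

my $good = { 'Clone' => 'test', 'Sequence' => 'ACTG*' };
my $bad  = { 'Clone' => 'test', 'Sequence' => ['ACTG*'] };

# Only a plain scalar is suitable as the -seq argument.
for my $hash ($good, $bad) {
    print "Sequence holds: ", describe_value($hash->{'Sequence'}), "\n";
}
```

Running the same check inside your while loop, just before the Bio::Seq
constructor, should show which row (if any) carries something other than a
plain string.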
 
Chris


From rmb32 at cornell.edu  Tue Jun 20 14:09:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 20 Jun 2006 12:09:38 -0600
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150819406.2585.27.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
Message-ID: <449839E2.5080402@cornell.edu>

Getting to know this code a little better, I notice a couple of little 
things: 

1.) my patch attached to bug 2026 draws unnecessary distinctions between 
feature types that use tags, and those that use annotations, since all 
features are now Bio::AnnotatableI's and the *_tags_* methods are 
implemented in AnnotatableI in terms of annotation objects now.  You 
guys should probably just ignore it, since from the sound of it you're 
going to be changing all of this around anyway.  Wish I could be there 
to help and learn more.

2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar 
accessors to use when translating Bio::Annotation::* objects to and from 
scalar tags.  Seems to me, this would be much better accomplished by 
using polymorphism of some sort, probably adding a multipurpose as_tag() 
accessor in Bio::AnnotationI and the objects that implement it, then 
using that in Bio::AnnotatableI instead of %tag2text.  Does this make 
sense, or am I misinterpreting something here?  Reason I've noticed this 
is because I've been wrestling with how to translate  
Bio::Annotation::Target objects to and from scalar tag values, since a 
Target is being represented as an ordered list of 3 or 4 scalar tags in 
old things that were designed to interoperate with gff2, and I can't 
figure out a nice way to do it using the rather inflexible %tag2text 
mechanism.
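
To make the idea concrete, here is a rough sketch in plain Perl. None of these
packages exist in BioPerl: the My::Annotation::* classes and the as_tag()
method are hypothetical names, used only to contrast a per-class lookup table
with a polymorphic method. The space-separated Target string is likewise an
assumption, not the format any existing code uses.

```perl
use strict;
use warnings;

# Hypothetical base class: every annotation knows how to render itself as a tag.
package My::Annotation::Base;
sub new { my ($class, %args) = @_; return bless {%args}, $class; }
sub as_tag { my ($self) = @_; return $self->{value}; }

# Hypothetical Target-like annotation: flattens several fields into one string,
# skipping any field that was not set.
package My::Annotation::Target;
our @ISA = ('My::Annotation::Base');
sub as_tag {
    my ($self) = @_;
    return join ' ', grep { defined } @{$self}{qw(target_id start end strand)};
}

package main;

my @annotations = (
    My::Annotation::Base->new(value => 'some note'),
    My::Annotation::Target->new(target_id => 'EST23', start => 1, end => 21),
);

# The caller needs no lookup table; each object picks its own rendering.
print $_->as_tag, "\n" for @annotations;
```

The point of the sketch is only that a multi-valued annotation can decide for
itself how to collapse into a scalar, which a fixed table of accessor names
cannot express.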

Sorry to be a pain, just wanted to get that in there before you guys 
start your jam session in Durham.

Rob

Scott Cain wrote:
> Hi Hilmar,
>
> Of course you are right--I was under the influence of a perl module that
> I work with that does something similar, but both of your solutions are
> better.
>
> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
> look this week.
>
> As for next week, I plan on spending the day at NESCent on Wednesday
> (though I haven't told Todd or Jeff that I am arriving early yet) just
> to make sure all the details are in place.  I imagine I'll have a fair
> amount of free time to hash this stuff out.  Anyone else who is in town
> (that is, in Durham, NC, USA) is welcome to come draw on a white board
> too. :-)
>
> Scott
>
>
> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>   
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> You don't need a new method for this. Instead, support a -feature  
>> argument.
>>
>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>
>> This should work for any instance of Bio::SeqFeatureI. If it is a  
>> B::SF::Annotated already it is obviously just a deep copy (if copy is  
>> desired - could be another parameter). Otherwise more will be involved.
>>
>> Alternatively, and possibly better, is to write a specialized  
>> SeqFeatureI factory (that would implement  
>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>
>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>> 		-type_ontology => $sequence_ontology,
>> 		-source_ontology => $feature_source_ontology,
>> 		-unflatten => 1);
>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>
>> This is preferable because it separates business logic that isn't  
>> necessarily related into defined units. I.e., the logic necessary to  
>> convert an ordinary feature into a strongly typed one is different  
>> from how to represent a strongly typed feature. IMHO anyway ...
>>
>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
>> started as the result of a discussion thread earlier this (or last?)  
>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
>> though not in concept.
>>
>> Maybe we need to get together again and thrash out a strategy; or a  
>> BOF at the GMOD meeting? I feel this does need a core group of people  
>> who care, hash out a strategy that will also solve the backwards  
>> compatibility problem with the current Bio::SeqFeatureI state-of- 
>> limbo, and allow us to implement the decisions with a few people in a  
>> concentrated effort. This will then also remove the only real large  
>> stumbling block towards a 1.6 release.
>>
>> Maybe we should think about a little pre-GMOD hackathon to clear up  
>> this mess? Scott, you'll be there a day early? I'll be already back  
>> and Jason I believe will still be in town, although he may have other  
>> commitments already. Nonetheless, it shouldn't really take that much  
>> but rather dedicated time, a whiteboard, and a few people who care  
>> thrashing this out and then do it.
>>
>> Thoughts?
>>
>> 	-hilmar
>>
>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>
>>     
>>> Rob,
>>>
>>> I came to the same conclusion as well; I wrote my response as I was
>>> heading out the door and while I was running errands, I realized the
>>> right thing to do is to write a Bio::SeqFeature::Annotated method  
>>> called
>>> new_from_object, whose usage would be:
>>>
>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
>>> ($my_BSFI, %args);
>>>
>>> where you would give it a Bio::SeqFeatureI compliant object and try to
>>> create a BSFA like you suggested below.  You could allow passing in  
>>> args
>>> to control how different things are handled, like mapping non-SO types
>>> to SO types.  I'll think about this over the weekend and let you  
>>> know if
>>> brilliance strikes me.
>>>
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>       
>>>> Rather than cobble together some ad-hoc solution, I would be interested
>>>> in working on a good solution to this problem, because it seems like
>>>> it's just going to get more common as more people start wanting to
>>>> write GFF3.  What about some code in whatever customarily makes these
>>>> objects (probably BSF::Annotated's new() method?) that could take
>>>> another type of Feature object and attempt to shoehorn its data into a
>>>> new BSF::Annotated?  If it failed (because the type isn't in SO or
>>>> whatever), it could throw() some informative error message.
>>>>
>>>> Then, people could write straightforward code something like:
>>>>
>>>> while (my $oldstylefeature = $features_in->next_feature) {
>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>     $oldstylefeature->something_else(
>>>>         'some other something that needs to be changed for compliance');
>>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>>     $gff3_out->write_feature($newfeature);
>>>> }
>>>>
>>>> Does that sound like a good idea?  I'd be more than willing to
>>>> implement this, since I'm going to need to do this sort of thing with
>>>> many more things than just RepeatMasker.
>>>>
>>>> Rob
>>>>
>>>> Scott Cain wrote:
>>>>         
>>>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>>>> before is that I was hoping for divine inspiration for an answer (or for
>>>>> somebody else to answer, which would have been really great :-)
>>>>>
>>>>> The short answer (and easy one for me to type) is that you will probably
>>>>> need an ad hoc method to do it, which is the same thing I do when I need
>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>>> have any sample code that does this, but if you want to start working up
>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>
>>>>>           
>>>>>> So about that converting ye olde feature objects into
>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>
>>>>>>
>>>>>> Scott Cain wrote:
>>>>>>
>>>>>>             
>>>>>>> That's OK--You added a few items that should be escaped that  
>>>>>>> weren't, so
>>>>>>> I added those too.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Scott
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Woops, I should have said something about that.  I submitted  
>>>>>>>> it before
>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Scott,
>>>>>>>>>
>>>>>>>>> Looks like Robert also submitted a bug report related to this  
>>>>>>>>> as well=
>>>>>>>>> ---------------------------------------------------------------- 
>>>>>>>>> --------
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>                   
>>> -- 
>>> ---------------------------------------------------------------------- 
>>> --
>>> Scott Cain, Ph. D.                                          
>>> cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>> - --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (Darwin)
>>
>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>> ImoAXD/jrbF0gXzSr2CY4tQ=
>> =XfDq
>> -----END PGP SIGNATURE-----

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From hlapp at gmx.net  Tue Jun 20 14:24:45 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 20 Jun 2006 14:24:45 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449839E2.5080402@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
	<449839E2.5080402@cornell.edu>
Message-ID: <A3627468-CCA4-41FD-8C09-F5E1BFCE67D0@gmx.net>

Yes, this is the sore problem area. AnnotatableI used to have only a
single method (annotation()); the *_tag_* methods are new since 1.5
(and truly a developer release feature - don't rely on them staying).

Likewise, tag2text is an utterly ugly artifact (after all, this
is an interface) rooted in the above addition. If we can't manage to
remove it I'll remove my name from that module ;)
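
For illustration, here is a minimal sketch of the polymorphic alternative
Rob describes below, in which each annotation class flattens itself to
scalar tag values through its own as_tag() method rather than through a
central %tag2text lookup. The My::Annotation::* packages, their
constructors, and the tag_values() helper are hypothetical stand-ins so
that the example runs without BioPerl; none of this is existing
Bio::AnnotationI API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a simple scalar-valued annotation class.
package My::Annotation::SimpleValue;
sub new { my ($class, %a) = @_; return bless { value => $a{value} }, $class }
# Each class decides for itself how to flatten into scalar tag values.
sub as_tag { my $self = shift; return ($self->{value}) }

# Hypothetical stand-in for a Target-like annotation class.
package My::Annotation::Target;
sub new { my ($class, %a) = @_; return bless {%a}, $class }
# A Target flattens to 3 scalars, or 4 when a strand is set.
sub as_tag {
    my $self = shift;
    my @vals = @{$self}{qw(target_id start end)};
    push @vals, $self->{strand} if defined $self->{strand};
    return @vals;
}

package main;

# The caller needs no per-class lookup table: it simply asks each object.
sub tag_values {
    my @annotations = @_;
    return map { [ $_->as_tag ] } @annotations;
}

my @rows = tag_values(
    My::Annotation::SimpleValue->new(value => 'repeat_region'),
    My::Annotation::Target->new(
        target_id => 'EST1', start => 10, end => 250, strand => '+'),
);
print join(' ', @$_), "\n" for @rows;
```

The point of the design is that adding a new annotation class means
adding one as_tag() method in that class, not another entry in a shared
hash inside the interface.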

	-hilmar

On Jun 20, 2006, at 2:09 PM, Robert Buels wrote:

> Getting to know this code a little better, I notice a couple of little
> things:
>
> 1.) my patch attached to bug 2026 draws unnecessary distinctions between
> feature types that use tags, and those that use annotations, since all
> features are now Bio::AnnotatableI's and the *_tags_* methods are
> implemented in AnnotatableI in terms of annotation objects now.  You
> guys should probably just ignore it, since from the sound of it you're
> going to be changing all of this around anyway.  Wish I could be there
> to help and learn more.
>
> 2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar
> accessors to use when translating Bio::Annotation::* objects to and from
> scalar tags.  Seems to me, this would be much better accomplished by
> using polymorphism of some sort, probably adding a multipurpose as_tag()
> accessor in Bio::AnnotationI and the objects that implement it, then
> using that in Bio::AnnotatableI instead of %tag2text.  Does this make
> sense, or am I misinterpreting something here?  Reason I've noticed this
> is because I've been wrestling with how to translate
> Bio::Annotation::Target objects to and from scalar tag values, since a
> Target is being represented as an ordered list of 3 or 4 scalar tags in
> old things that were designed to interoperate with gff2, and I can't
> figure out a nice way to do it using the rather inflexible %tag2text
> mechanism.
>
> Sorry to be a pain, just wanted to get that in there before you guys
> start your jam session in Durham.
>
> Rob
>
> Scott Cain wrote:
>> Hi Hilmar,
>>
>> Of course you are right--I was under the influence of a perl module that
>> I work with that does something similar, but both of your solutions are
>> better.
>>
>> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
>> look this week.
>>
>> As for next week, I plan on spending the day at NESCent on Wednesday
>> (though I haven't told Todd or Jeff that I am arriving early yet) just
>> to make sure all the details are in place.  I imagine I'll have a fair
>> amount of free time to hash this stuff out.  Anyone else who is in town
>> (that is, in Durham, NC, USA) is welcome to come draw on a white board
>> too. :-)
>>
>> Scott
>>
>>
>> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> You don't need a new method for this. Instead, support a -feature
>>> argument.
>>>
>>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>>
>>> This should work for any instance of Bio::SeqFeatureI. If it is a
>>> B::SF::Annotated already it is obviously just a deep copy (if copy is
>>> desired - could be another parameter). Otherwise more will be involved.
>>>
>>> Alternatively, and possibly better, is to write a specialized
>>> SeqFeatureI factory (that would implement
>>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>>
>>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>>> 		-type_ontology => $sequence_ontology,
>>> 		-source_ontology => $feature_source_ontology,
>>> 		-unflatten => 1);
>>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>>
>>> This is preferable because it separates business logic that isn't
>>> necessarily related into defined units. I.e., the logic necessary to
>>> convert an ordinary feature into a strongly typed one is different
>>> from how to represent a strongly typed feature. IMHO anyway ...
>>>
>>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan
>>> started as the result of a discussion thread earlier this (or last?)
>>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,
>>> though not in concept.
>>>
>>> Maybe we need to get together again and thrash out a strategy; or a
>>> BOF at the GMOD meeting? I feel this does need a core group of people
>>> who care, hash out a strategy that will also solve the backwards
>>> compatibility problem with the current Bio::SeqFeatureI state-of-limbo,
>>> and allow us to implement the decisions with a few people in a
>>> concentrated effort. This will then also remove the only real large
>>> stumbling block towards a 1.6 release.
>>>
>>> Maybe we should think about a little pre-GMOD hackathon to clear up
>>> this mess? Scott, you'll be there a day early? I'll be already back
>>> and Jason I believe will still be in town, although he may have other
>>> commitments already. Nonetheless, it shouldn't really take that much
>>> but rather dedicated time, a whiteboard, and a few people who care
>>> thrashing this out and then do it.
>>>
>>> Thoughts?
>>>
>>> 	-hilmar
>>>
>>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>>
>>>
>>>> Rob,
>>>>
>>>> I came to the same conclusion as well; I wrote my response as I was
>>>> heading out the door and while I was running errands, I realized the
>>>> right thing to do is to write a Bio::SeqFeature::Annotated method
>>>> called new_from_object, whose usage would be:
>>>>
>>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>>>>
>>>> where you would give it a Bio::SeqFeatureI compliant object and try to
>>>> create a BSFA like you suggested below.  You could allow passing in
>>>> args to control how different things are handled, like mapping non-SO
>>>> types to SO types.  I'll think about this over the weekend and let you
>>>> know if brilliance strikes me.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>>
>>>>> Rather than cobble together some ad-hoc solution, I would be interested
>>>>> in working on a good solution to this problem, because it seems like
>>>>> it's just going to get more common as more people start wanting to
>>>>> write GFF3.  What about some code in whatever customarily makes these
>>>>> objects (probably BSF::Annotated's new() method?) that could take
>>>>> another type of Feature object and attempt to shoehorn its data into a
>>>>> new BSF::Annotated?  If it failed (because the type isn't in SO or
>>>>> whatever), it could throw() some informative error message.
>>>>>
>>>>> Then, people could write straightforward code something like:
>>>>>
>>>>> while (my $oldstylefeature = $features_in->next_feature) {
>>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>>     $oldstylefeature->something_else(
>>>>>         'some other something that needs to be changed for compliance');
>>>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>>>     $gff3_out->write_feature($newfeature);
>>>>> }
>>>>>
>>>>> Does that sound like a good idea?  I'd be more than willing to
>>>>> implement this, since I'm going to need to do this sort of thing with
>>>>> many more things than just RepeatMasker.
>>>>>
>>>>> Rob
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>>>>> before is that I was hoping for divine inspiration for an answer (or for
>>>>>> somebody else to answer, which would have been really great :-)
>>>>>>
>>>>>> The short answer (and easy one for me to type) is that you will probably
>>>>>> need an ad hoc method to do it, which is the same thing I do when I need
>>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>>>> have any sample code that does this, but if you want to start working up
>>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> So about that converting ye olde feature objects into
>>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>>
>>>>>>>
>>>>>>> Scott Cain wrote:
>>>>>>>
>>>>>>>
>>>>>>>> That's OK--You added a few items that should be escaped that
>>>>>>>> weren't, so
>>>>>>>> I added those too.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Woops, I should have said something about that.  I submitted
>>>>>>>>> it before
>>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>>
>>>>>>>>> Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Looks like Robert also submitted a bug report related to this
>>>>>>>>>> as well=
>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>> ---
>>>>>>>>>> --------
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>> -- 
>>>> ------------------------------------------------------------------- 
>>>> ---
>>>> --
>>>> Scott Cain, Ph. D.
>>>> cain at cshl.edu
>>>> GMOD Coordinator (http://www.gmod.org/)
>>>> 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>> - --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (Darwin)
>>>
>>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>>> ImoAXD/jrbF0gXzSr2CY4tQ=
>>> =XfDq
>>> -----END PGP SIGNATURE-----
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 20 16:22:45 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Jun 2006 21:22:45 +0100
Subject: [Bioperl-l] Bio::Map changes
Message-ID: <44985915.8010607@sendu.me.uk>

Some initial changes have been made to some modules in Bio::Map to allow 
Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
(see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)

Further changes are needed in some remaining Bio::Map modules for this 
addition to be complete (a number of Bio::Map related tests in the test 
suite currently fail), notably Bio::Map::Cyto* since they had 
implemented their own Range-related features.

I propose bringing all Bio::Map into line so it behaves with and makes 
good use of the RangeI nature of Position. Beyond this initial change I 
want to add relative positioning and more, but I'll describe that in a 
future post to this thread.

Can anyone see any issues with ranged positions (it's done in a backward 
compatible way)? Do any developers want to maintain control of a 
Bio::Map module or shall I just dive in?


From cjfields at uiuc.edu  Tue Jun 20 23:50:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 22:50:55 -0500
Subject: [Bioperl-l] EUtilities interface
Message-ID: <002301c694e5$e5f3a750$15327e82@pyrimidine>

I'm working on a new eutilities interface which I hope to commit by late
summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
generic web database interface, which I call Bio::DB::WebDBI, and the
EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
NCBI for any information available via Entrez Utilities (i.e. taxonomy,
pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
info like Bio::DB::WebDBSeqI.  

My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
Does anyone think this will be an issue?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From bix at sendu.me.uk  Wed Jun 21 04:20:37 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 09:20:37 +0100
Subject: [Bioperl-l] Bio::RangeI intersection proposal
Message-ID: <44990155.6050501@sendu.me.uk>

Bio::Map::PositionI (in bioperl-live) needs intersections of a list of 
ranges. It inherits from Bio::RangeI but unlike RangeI's union, 
intersection does not take a list. PositionI currently calls 
intersection repeatedly to handle a list.
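
The repeated-call approach can be sketched roughly as follows. This is a
toy illustration only: ranges are plain [start, end] array refs rather
than Bio::RangeI objects, strand is ignored, and intersect_pair() and
intersect_list() are made-up names, not existing methods.

```perl
use strict;
use warnings;

# Pairwise intersection of two [start, end] ranges.
# Returns a new range, or undef if the two do not overlap.
sub intersect_pair {
    my ($r1, $r2) = @_;
    my $start = $r1->[0] > $r2->[0] ? $r1->[0] : $r2->[0];
    my $end   = $r1->[1] < $r2->[1] ? $r1->[1] : $r2->[1];
    return $start <= $end ? [ $start, $end ] : undef;
}

# List intersection built by calling the pairwise version repeatedly.
# Stops early as soon as the running intersection becomes empty.
sub intersect_list {
    my @ranges = @_;
    my $common = shift @ranges;
    for my $r (@ranges) {
        return undef unless defined $common;
        $common = intersect_pair($common, $r);
    }
    return $common;
}

my $common = intersect_list([ 1, 100 ], [ 50, 150 ], [ 60, 80 ]);
print defined $common ? "@$common\n" : "no overlap\n";
```

A native list-aware intersection would fold this loop into the method
itself, so callers would pass the whole list in one call.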

If there is no particular reason for this limitation, I propose making 
RangeI intersection handle lists natively. This won't do any harm to 
existing code at the time of the change, but it's possible that someone 
has written a module that implements RangeI but overrides intersection 
(without making it accept a list), so that future code written that 
expects a RangeI to handle lists will break when getting a RangeI from 
that module.

So the question is, has anyone overridden intersection in RangeI? Is the 
small risk of possible breakage compensated by the benefit of 
intersections of a list of ranges (which is surely useful in lots of 
situations, not just for PositionI)?

I'm tempted to go ahead with this unless there are objections.


From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 03:19:12 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 09:19:12 +0200
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <002701c69477$9ffa7c10$c2987ca5@pc13>
References: <002701c69477$9ffa7c10$c2987ca5@pc13>
Message-ID: <4498F2F0.7010203@biomax.com>

Sohel Merchant wrote:

> 
>>Just curious if any other BioPerlers will be at the YAPC conference in 
> 
> 
>>Chicago next week ( <http://yapcchicago.org/> http://yapcchicago.org/).

Not in Chicago, but yesterday I got the OK from Biomax management to go 
to YAPC::Europe, http://www.birmingham2006.com/. So at the end of 
August I'll be in Birmingham. Yeah!

Is anybody interested in writing parsers for Perl 6 there?

CU, Bernhard


-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************


From cjfields at uiuc.edu  Wed Jun 21 11:08:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:08:28 -0500
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <4498F2F0.7010203@biomax.com>
Message-ID: <000301c69544$8d537710$15327e82@pyrimidine>

Speaking of Perl6, there was interest here at one point in getting a
bioperl-experimental going, which at this point in the game should involve
Perl6.  If there were enough interest in it we could probably get it set up
via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
experimental modules in some way to prevent confusion (bioperl6-live???),
though I'm not up to speed Perl6-wise so I'm not sure about namespace
collisions and so on.

bioperl-experimental would be, like the name implies, a sort of testing
ground for ideas (good and bad).  It seemed like it was going to take off a
few years ago but it lost steam, I guess.

As for your parsers, would you build them from the ground up (i.e. from
Bio::Root::Root on up)?

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernhard Schmalhofer
> Sent: Wednesday, June 21, 2006 2:19 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Sohel Merchant
> Subject: Re: [Bioperl-l] YAPC anyone?
> 
> Sohel Merchant wrote:
> 
> >
> >>Just curious if any other BioPerlers will be at the YAPC conference in
> >
> >
> >>Chicago next week ( <http://yapcchicago.org/> http://yapcchicago.org/).
> 
> Not in Chicago, but yesterday I got the OK from Biomax management to go
> to YAPC::Europe, http://www.birmingham2006.com/. So at the end of
> August I'll be in Birmingham. Yeah!
> 
> Is anybody interested in writing parsers for Perl 6 there?
> 
> CU, Bernhard
> 
> 
> 
> --
> **************************************************
> Dipl.-Physiker Bernhard Schmalhofer
> Senior Developer
> Biomax Informatics AG
> Lochhamer Str. 11
> 82152 Martinsried, Germany
> Tel: +49 89 895574-839
> Fax: +49 89 895574-825
> eMail: Bernhard.Schmalhofer at biomax.com
> Website: www.biomax.com
> **************************************************


From cjfields at uiuc.edu  Wed Jun 21 11:16:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:16:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <44990155.6050501@sendu.me.uk>
Message-ID: <000401c69545$a4a3ad30$15327e82@pyrimidine>

I personally have no objections as long as it doesn't break the API.  Don't know
how the senior guys feel (Jason, Brian, Heikki, Hilmar...); I'm not a user
of Bio::Map modules myself.

Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, June 21, 2006 3:21 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::RangeI intersection proposal
> 
> Bio::Map::PositionI (in bioperl-live) needs intersections of a list of
> ranges. It inherits from Bio::RangeI but unlike RangeI's union,
> intersection does not take a list. PositionI currently calls
> intersection repeatedly to handle a list.
> 
> If there is no particular reason for this limitation, I propose making
> RangeI intersection handle lists natively. This won't do any harm to
> existing code at the time of the change, but it's possible that someone
> has written a module that implements RangeI but overrides intersection
> (without making it accept a list), so that future code written that
> expects a RangeI to handle lists will break when getting a RangeI from
> that module.
> 
> So the question is, has anyone overridden intersection in RangeI? Is the
> small risk of possible breakage compensated by the benefit of
> intersections of a list of ranges (which is surely useful in lots of
> situations, not just for PositionI)?
> 
> I'm tempted to go ahead with this unless there are objections.


From hlapp at gmx.net  Wed Jun 21 11:24:47 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:24:47 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000401c69545$a4a3ad30$15327e82@pyrimidine>
References: <000401c69545$a4a3ad30$15327e82@pyrimidine>
Message-ID: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>


On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:

> Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Actually, it doesn't go by age but by the amount of hair you still  
have. ;)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 11:28:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:28:58 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>
Message-ID: <000501c69547$6a9f28b0$15327e82@pyrimidine>

Then I'm really a senior guy...

; {

Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, June 21, 2006 10:25 AM
> To: Chris Fields
> Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal
> 
> 
> On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:
> 
> > Actually, sounds weird to have me say "senior guys"; I'm 35 years old!
> 
> Actually, it doesn't go by age but by the amount of hair you still
> have. ;)
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From hlapp at gmx.net  Wed Jun 21 11:53:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:53:08 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000501c69547$6a9f28b0$15327e82@pyrimidine>
References: <000501c69547$6a9f28b0$15327e82@pyrimidine>
Message-ID: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>

We could run a Mr Seniority competition at BOSC with the attendees  
judging who got the weirdest looking hair loss. You'd take the  
challenge? The judging panel would need to be gender-mixed though.

On Jun 21, 2006, at 11:28 AM, Chris Fields wrote:

> Then I'm really a senior guy...
>
> ; {
>
> Chris
>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Wednesday, June 21, 2006 10:25 AM
>> To: Chris Fields
>> Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal
>>
>>
>> On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:
>>
>>> Actually, sounds weird to have me say "senior guys"; I'm 35 years  
>>> old!
>>
>> Actually, it doesn't go by age but by the amount of hair you still
>> have. ;)
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 12:08:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 11:08:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>
Message-ID: <000301c6954c$e89c7a60$15327e82@pyrimidine>

I'd love to be at BOSC but I can't go (finishing up my postdoc this year,
which is probably the primary cause of my hair loss).  Would the judges
accept a recent picture?

Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, June 21, 2006 10:53 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal
> 
> We could run a Mr Seniority competition at BOSC with the attendees
> judging who got the weirdest looking hair loss. You'd take the
> challenge? The judging panel would need to be gender-mixed though.
> 
> On Jun 21, 2006, at 11:28 AM, Chris Fields wrote:
> 
> > Then I'm really a senior guy...
> >
> > ; {
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Wednesday, June 21, 2006 10:25 AM
> >> To: Chris Fields
> >> Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal
> >>
> >>
> >> On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:
> >>
> >>> Actually, sounds weird to have me say "senior guys"; I'm 35 years
> >>> old!
> >>
> >> Actually, it doesn't go by age but by the amount of hair you still
> >> have. ;)
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 12:25:50 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 18:25:50 +0200
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <000301c69544$8d537710$15327e82@pyrimidine>
References: <000301c69544$8d537710$15327e82@pyrimidine>
Message-ID: <4499730E.8090800@biomax.com>

Chris Fields wrote:
> Speaking of Perl6, there was interest here at one point in getting a
> bioperl-experimental going, which at this point in the game should involve
> Perl6.  If there were enough interest in it we could probably get it set up
> via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
> experimental modules in some way to prevent confusion (bioperl6-live???),
> though I'm not up to speed Perl6-wise so I'm not sure about namespace
> collisions and so on.

As far as I understood it, the plan is to have a very smooth migration 
path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When 
new stuff is coming along, or when refactoring is done, you drop in

   use v6;

or

   use v6-pugs;

and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm 
or Audrey Tang's presentation at the Nordic Perl Workshop: 
http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
So I would argue against having a completely separate Perl6 experimental
repository.

> bioperl-experimental would be, like the name implies, a sort of testing
> ground for ideas (good and bad).  It seemed like it was going to take off a
> few years ago but it lost steam, I guess.
> 
> As for your parsers, would you build them from the ground up (i.e. from
> Bio::Root::Root on up)?

I'm just a casual Bio::Perl user and never hacked on any internals. So I 
don't know whether the current Bio::Perl framework is a good fit.

The idea that is floating in my mind is to make a showcase of Perl 6 
parsing, by tackling the various sequences and alignment formats.
So this would involve shopping around for the cleanest parser 
implementations and porting that to Perl6.

Which repository to use is more a question of social engineering.
Are there more Pugs/Perl6 hackers interested in cool biological hacking,
or biologists aching to try out Perl6?

Regards,
   Bernhard Schmalhofer

-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************


From cjfields at uiuc.edu  Wed Jun 21 14:01:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 13:01:02 -0500
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <4499730E.8090800@biomax.com>
Message-ID: <000b01c6955c$ad0e6750$15327e82@pyrimidine>

> Chris Fields wrote:
> > Speaking of Perl6, there was interest here at one point in getting a
> > bioperl-experimental going, which at this point in the game should involve
> > Perl6.  If there were enough interest in it we could probably get it set up
> > via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
> > experimental modules in some way to prevent confusion (bioperl6-live???),
> > though I'm not up to speed Perl6-wise so I'm not sure about namespace
> > collisions and so on.
> 
> As far as I understood it, the plan is to have a very smooth migration
> path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When
> new stuff is coming along, or when refactoring is done, you drop in
> 
>    use v6;
> 
> or
> 
>    use v6-pugs;
> 
> and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm
> or Audrey Tang's presentation at the Nordic Perl Workshop:
> http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
> So I would argue against having a completely separate Perl6 experimental
> repository.

Makes sense.  I know Pugs is the Perl6 implementation in Haskell but I also
know eventually Parrot will be taking over as the compiler (hopefully).
Perl6 is pretty exciting since it's built to support OOP from the ground up,
unlike the bolted-on OOP for Perl5, and has several other features that make
it very useful (the new way regexes are handled).  I just haven't had time
to play around with it seriously enough.  I may try using Pugs a bit more,
though.

So, as long as Perl5-Perl6 work together a separate repository wouldn't be
necessary.  

> > bioperl-experimental would be, like the name implies, a sort of testing
> > ground for ideas (good and bad).  It seemed like it was going to take off a
> > few years ago but it lost steam, I guess.
> >
> > As for your parsers, would you build them from the ground up (i.e. from
> > Bio::Root::Root on up)?
>
> I'm just a casual Bio::Perl user and never hacked on any internals. So I
> don't know whether the current Bio::Perl framework is a good fit.
> 
> The idea that is floating in my mind is to make a showcase of Perl 6
> parsing, by tackling the various sequences and alignment formats.
> So this would involve shopping around for the cleanest parser
> implementations and porting that to Perl6.
> 
> Which repository to use is more a question of social engineering.
> Are there more Pugs/Perl6 hackers interested in cool biological hacking,
> or biologists aching to try out Perl6?

I suppose the best way is initially to use a non-bioperl approach using
Perl6, then try working the parsers in using 'use v6-pugs;'.  Bioperl is
heavily object-oriented so the code would probably need to be refactored
from the bottom up (or top down, depending on your view) to fit Perl6.
Having a perl5->perl6 translator helps, though.  And, again, having Perl5
and Perl6 work together helps as well.

Chris

> Regards,
>    Bernhard Schmalhofer
> 
> --
> **************************************************
> Dipl.-Physiker Bernhard Schmalhofer
> Senior Developer
> Biomax Informatics AG
> Lochhamer Str. 11
> 82152 Martinsried, Germany
> Tel: +49 89 895574-839
> Fax: +49 89 895574-825
> eMail: Bernhard.Schmalhofer at biomax.com
> Website: www.biomax.com
> **************************************************


From dwaner at scitegic.com  Wed Jun 21 14:14:00 2006
From: dwaner at scitegic.com (dwaner at scitegic.com)
Date: Wed, 21 Jun 2006 11:14:00 -0700
Subject: [Bioperl-l] EMBL release 87 format changes.
Message-ID: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>

With release 87 of EMBL (June 19th, 2006), there have been some minor 
changes to the flat file record format. In particular, the SV (sequence 
version) tag has been moved from its own line to a field in the ID line. 
See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.

Is someone already working on updating the SeqIO::embl parser, or should I 
volunteer?

- David


From bix at sendu.me.uk  Wed Jun 21 14:23:28 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 19:23:28 +0100
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <002301c694e5$e5f3a750$15327e82@pyrimidine>
References: <002301c694e5$e5f3a750$15327e82@pyrimidine>
Message-ID: <44998EA0.1010406@sendu.me.uk>

Chris Fields wrote:
> I'm working on a new eutilities interface which I hope to commit by late
> summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> generic web database interface, which I call Bio::DB::WebDBI, and the
> EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
> NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
> info like Bio::DB::WebDBSeqI.  
> 
> My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
> Does anyone think this will be an issue?

Well, I don't. Sounds good to me. What's the intended relationship 
between WebDBI and EUtilitiesI? Would your work end up in the removal of 
direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just 
convert the code that gets the XML to a one line statement or so?


From cjfields at uiuc.edu  Wed Jun 21 15:00:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 14:00:02 -0500
Subject: [Bioperl-l] EMBL release 87 format changes.
In-Reply-To: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>
Message-ID: <000c01c69564$e68b39b0$15327e82@pyrimidine>

That would be great!  Post a patch/fix via bugzilla:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

and we can add it and test it out.  Or if you have CVS access you can do it
yourself.  Not sure who's taking care of SeqIO::embl at the moment....

Added bit : you'll need to update both next_seq and write_seq.  next_seq
should probably handle both old and new EMBL format and write_seq should
only write new format (unless someone else disagrees???)
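
To make the "read both, write new" idea concrete, here is a minimal sketch in
Python. It is not Bioperl code and not the SeqIO::embl implementation. The two
line layouts follow my reading of the release 87 notes; the function name
parse_embl_id and the returned dictionary keys are made up for illustration.

```python
import re

# New-style (release 87 onwards) ID line, with SV as a field, e.g.
#   ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.
NEW_ID = re.compile(
    r"^ID\s+(?P<acc>\S+);\s+SV\s+(?P<sv>\d+);\s+(?P<topology>\w+);\s+"
    r"(?P<molecule>[^;]+);\s+(?P<dataclass>\w+);\s+(?P<division>\w+);\s+"
    r"(?P<length>\d+)\s+BP\.\s*$"
)

# Old-style ID line, with the version on a separate SV line, e.g.
#   ID   X56734     standard; RNA; PLN; 1859 BP.
#   SV   X56734.1
OLD_ID = re.compile(
    r"^ID\s+(?P<name>\S+)\s+(?P<dataclass>\w+);\s+(?P<molecule>[^;]+);\s+"
    r"(?P<division>\w+);\s+(?P<length>\d+)\s+BP\.\s*$"
)
OLD_SV = re.compile(r"^SV\s+\S+\.(?P<sv>\d+)\s*$")


def parse_embl_id(id_line, sv_line=None):
    """Return id, version and length from either ID line layout.

    sv_line is only consulted for old-style records, where the version
    lives on its own SV line (hypothetical helper, for illustration).
    """
    new = NEW_ID.match(id_line)
    if new:
        return {
            "id": new.group("acc"),
            "version": int(new.group("sv")),
            "length": int(new.group("length")),
            "layout": "new",
        }
    old = OLD_ID.match(id_line)
    if old is None:
        raise ValueError("unrecognised EMBL ID line: %r" % id_line)
    version = None
    if sv_line is not None:
        sv = OLD_SV.match(sv_line)
        if sv:
            version = int(sv.group("sv"))
    return {
        "id": old.group("name"),
        "version": version,
        "length": int(old.group("length")),
        "layout": "old",
    }
```

Trying the new layout first keeps the old pattern from ever seeing a line that
contains an SV field, so the two cases cannot be confused.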

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of dwaner at scitegic.com
> Sent: Wednesday, June 21, 2006 1:14 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] EMBL release 87 format changes.
> 
> With release 87 of EMBL (June 19th, 2006), there have been some minor
> changes to the flat file record format. In particular, the SV (sequence
> version) tag has been moved from its own line to a field in the ID line.
> See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.
> 
> Is someone already working on updating the SeqIO::embl parser, or should I
> volunteer?
> 
> - David
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed Jun 21 17:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 16:16:38 -0500
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <44998EA0.1010406@sendu.me.uk>
Message-ID: <001b01c69577$fc7068f0$15327e82@pyrimidine>

> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Wednesday, June 21, 2006 1:23 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] EUtilities interface
> 
> Chris Fields wrote:
> > I'm working on a new eutilities interface which I hope to commit by late
> > summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> > generic web database interface, which I call Bio::DB::WebDBI, and the
> > EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
> > NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> > pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
> > info like Bio::DB::WebDBSeqI.
> >
> > My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
> > Does anyone think this will be an issue?
> 
> Well, I don't. Sounds good to me. What's the intended relationship
> between WebDBI and EUtilitiesI? Would your work end up in the removal of
> direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just
> convert the code that gets the XML to a one line statement or so?

Well, right now all it does is use URI to build queries, submit them to
Entrez Utilities, then grab the response; I've been hacking at it on and off
for a few months now.  It needs some error handling and added methods
(mainly for proxies and handling WebEnv/query_key), though once I have it in
decent enough shape I'll go ahead and add it to CVS.  

Theoretically once the response is returned it can be parsed like any stream
(see WebDBSeqI/NCBIHelper for an idea of how sequences are parsed and
returned using SeqIO).  This should work as long as there is an appropriate
class to handle the data stream and the appropriate 'plugin' to parse the
data into objects; i.e. dbSNP can be handled by ClusterIO::dbSNP, sequences
by SeqIO::genbank/fasta, pubmed by Bio::Biblio::IO::pubmedxml, and so on.
If you don't have an object or want the raw data stream, you could submit a
request using the various eutilities (efetch, epost, esearch) and save the raw
output to a file or STDOUT.  
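
The query-building step described above can be sketched roughly as follows, in
Python rather than Perl. This only builds the URL and does not contact NCBI.
The base URL is the one I believe NCBI currently documents (it was different
in 2006), and the function name build_eutil_url is hypothetical; the parameter
names db, term and usehistory are standard E-utilities parameters.

```python
from urllib.parse import urlencode

# Assumed current E-utilities base URL; adjust if NCBI documents another.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# The three utilities named in the message above.
KNOWN_TOOLS = {"esearch", "efetch", "epost"}


def build_eutil_url(tool, **params):
    """Build (but do not fetch) an E-utilities request URL.

    Parameters are sorted by name so the same query always yields the
    same URL, which makes caching and testing easier.
    """
    if tool not in KNOWN_TOOLS:
        raise ValueError("unsupported eutility: %r" % tool)
    query = urlencode(sorted(params.items()))
    return "%s%s.fcgi?%s" % (EUTILS_BASE, tool, query)


if __name__ == "__main__":
    # Search the taxonomy database and keep the result set on the
    # history server, so a later efetch could reuse WebEnv/query_key.
    print(build_eutil_url("esearch", db="taxonomy",
                          term="Homo sapiens", usehistory="y"))
```

Keeping URL construction separate from fetching and parsing mirrors the split
proposed here: one generic web layer, with format-specific plugins consuming
the returned stream.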

Here's a rough diagram:

                      |------------------->Bio::DB::DBFetch (EBI interface)----->plugins for Bio* classes
Bio::Root::Root       |
LWP::UserAgent ------Bio::DB::WebDBI------>Bio::DB::EUtilitiesI (NCBI interface)----->plugins for Bio* classes
                      |
                      |------------------->others?
You probably don't need a Bio::*IO::plugin for each type; tax data in
Bioperl seems to primarily use the NCBI Tax database, so
Bio::DB::Taxonomy::entrez shouldn't be too hard to adapt to act as a plugin.
Bio::DB::Taxonomy::entrez uses XML::Twig to parse everything into
Bio::Taxonomy::Node objects and is able to retrieve single and multiple ID's
using the same method, though I would probably use XML::SAX instead.  If I
remember correctly there were issues with Bio::DB::Taxonomy that you brought
up...

Chris


From bix at sendu.me.uk  Thu Jun 22 09:28:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Jun 2006 14:28:25 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <44985915.8010607@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk>
Message-ID: <449A9AF9.2000305@sendu.me.uk>

Sendu Bala wrote:
> Some initial changes have been made to some modules in Bio::Map to allow 
> Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
> (see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)
> 
> Further changes are needed in some remaining Bio::Map modules for this 
> addition to be complete

Range is now done.

The next step is to tidy up all of Bio::Map*, which involves a major 
reimplementation of the whole system (but with no significant API 
change). Basically, the current system is an awkward mix of older 'marker 
has a single position on a map' and new 'markers have multiple positions 
on multiple maps'. This gives us strange things like SimpleMap's 
add_element method which adds a reference to the element to the map 
without the element itself knowing it is now on the map (because it is 
Position that defines what maps an element is on).

The reimplementation will make Position central to the model, allowing 
for lots of other things to work properly without anything becoming 
inconsistent (as is currently the case).
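
As a rough illustration of what "Position central to the model" could mean,
here is a small Python sketch. It is not the Bio::Map API; the class names
MapElement, GeneticMap and Position are invented for the example. The point is
only that both the map and the element derive their view of each other from
Position objects, so neither can hold a link the other does not know about.

```python
class MapElement:
    """Something that can sit on a map, e.g. a marker (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.positions = []

    def maps(self):
        """Maps this element is on, derived purely from its positions."""
        seen = []
        for pos in self.positions:
            if pos.map not in seen:
                seen.append(pos.map)
        return seen


class GeneticMap:
    """A map; it never stores elements directly (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.positions = []

    def elements(self):
        """Elements on this map, derived purely from its positions."""
        seen = []
        for pos in self.positions:
            if pos.element not in seen:
                seen.append(pos.element)
        return seen


class Position:
    """The single source of truth linking one element to one map."""

    def __init__(self, element, on_map, start, end=None):
        self.element = element
        self.map = on_map
        self.start = start
        self.end = start if end is None else end  # a point is a 1-long range
        # Registering on both sides here is what keeps them consistent.
        element.positions.append(self)
        on_map.positions.append(self)
```

With this shape there is no equivalent of adding an element to a map behind
the element's back: the only way onto a map is to create a Position.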

The general tidy up will involve redoing and perhaps even removing 
things. For instance, OrderedPositionWithDistance has never worked so 
will be deleted (with OrderedPosition gaining the distance functionality 
its docs say it already has).

But now is the time to speak up and change my mind if necessary!


From golharam at umdnj.edu  Thu Jun 22 17:05:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 22 Jun 2006 17:05:00 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package)
Message-ID: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>

Hi all,

I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
baseml in the PAML package to measure the distances of some non-coding
regions.  

I started with the coding regions, and used the script
bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
something similar for non-coding regions.  However, when I call
Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
meaning matrix was never defined.  

I wanted to find out if anyone on here has done this before or knows a
way to measure substitution frequencies of non-coding regions with the
PAML package.  The documentation with PAML is sparse so I'm not sure how
to interpret its output directly - that's why I'm using Bioperl.  

Hopefully someone can help me before I start digging into the
code...Thanks.

Ryan


From n.haigh at sheffield.ac.uk  Fri Jun 23 02:43:48 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 23 Jun 2006 07:43:48 +0100
Subject: [Bioperl-l] CVS Export
Message-ID: <000001c69690$61afb540$b07f6f58@nathan243dd61f>

I may have asked this previously, but I can't find the answer to my question
anywhere so I'll have to ask it again - sorry.

Is it possible to export files/directories from cvs that have changed
between two tags/branches/head? Specifically, I'd like to export (as I don't
want the cvs administrative directories) files that have been added to
Bioperl since the 1.4 release.

Cheers
Nath

----------------------------------------------------------------------------------
Dr. Nathan S. Haigh MPharmacol. Ph.D.
Bioinformatics PostDoctoral Research Associate

Room B2 211                                  Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences      Mob: +44 (0)7742 533 569
University of Sheffield                      Fax: +44 (0)114 22 20002
Western Bank                                 Web: www.bioinf.shef.ac.uk
Sheffield                                         www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------------


From cjfields at uiuc.edu  Fri Jun 23 10:58:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Jun 2006 09:58:24 -0500
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c696d5$7da6c640$15327e82@pyrimidine>

***sounds of crickets***

Ryan,

It's a pretty good possibility that Jason and the rest are on the road to
conferences and such.  There's been mention of a Durham, NC meeting and, of
course, YAPC is happening soon as well.  I wish I could help but I know
diddly about PAML besides the HOWTO on the wiki (though I may be using it
myself soon).  Sorry, you may have to be a bit patient for a more productive
response.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Thursday, June 22, 2006 4:05 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
> package)
> 
> Hi all,
> 
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
> 
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
> 
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
> 
> Hopefully someone can help me before I start digging into the
> code...Thanks.
> 
> Ryan
> 


From MEC at stowers-institute.org  Fri Jun 23 14:27:19 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Jun 2006 13:27:19 -0500
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Message-ID: <CED81D34E37D5043A1211565277A51E50563FC85@exchkc02.stowers-institute.org>

Guy,

I've just downloaded and installed your latest 1.1.0 version of
exonerate but unfortunately did not find any mention in the ChangeLog of
addressing this bug, though I still see in the TODO:

    o Should GFF show all coordinates on the +ve strand? (jason_p2g eg)

I was half expecting to see this fixed in this version based on this old
thread.  

Can you please confirm that it has not yet been addressed, and accept my
request that you continue to keep this change on your list for future
versions...

Also, might you elaborate on this entry from the ChangeLog.  I don't see
it mentioned in the manpage.

    o Added %tcs etc to --ryo for dumping coding sequences 

Thanks,

Malcolm Cook
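
For readers following the coordinate question, the conversion at issue can be
sketched in a few lines of Python. This assumes 1-based, inclusive, GFF-style
coordinates; exonerate's native output may use a different convention, so
check its manpage before relying on the arithmetic. The function name is made
up. Note that the sequence length is required, which is why the quoted thread
below talks about using --ryo to get lengths into the output.

```python
def revcomp_to_forward(start, end, seq_length):
    """Map an interval given on the reverse complement to the forward strand.

    Assumes 1-based inclusive coordinates with start <= end on a sequence
    of seq_length bases. Position p on the reverse complement corresponds
    to position seq_length - p + 1 on the forward strand, so the interval
    flips end for end.
    """
    if not (1 <= start <= end <= seq_length):
        raise ValueError("interval out of range for sequence length")
    return seq_length - end + 1, seq_length - start + 1


if __name__ == "__main__":
    # The first ten bases of the reverse complement of a 100 bp sequence
    # are the last ten bases of the forward strand.
    print(revcomp_to_forward(1, 10, 100))
```

Applying the function twice returns the original interval, which is a handy
sanity check when converting a whole GFF file.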

>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org 
>[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Guy Slater
>Sent: Friday, September 02, 2005 11:52 AM
>To: Cook, Malcolm
>Cc: bioperl-l
>Subject: RE: [Bioperl-l] methods, etc. for Bio::SearchIO on 
>exonerate output
>
>On Fri, 2 Sep 2005, Cook, Malcolm wrote:
>
>> Hmmmm - I'd better get some clarification from Guy too.  
>>  
>> Guy, if you don't mind reading the thread below and chiming in on our
>> discussion of interpreting the output of your excellent exonerate
>> program:
>>  
>> The sections of the manpage (
>> <http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html>
>> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
>> relevant are these 2 excerpts:
>>  
>>  1) When an alignment is reported on the reverse complement of a
>> sequence, the coordinates are simply given on the reverse complement
>> copy of the sequence. Hence positions on the sequences are never
>> negative. Generally, the forward strand is indicated by '+', the reverse
>> strand by '-', and an unknown or not-applicable strand (as in the case
>> of a protein sequence) is indicated by '.' "
>>  
>> 2)  --forwardcoordinates <boolean> By default, all coordinates are
>> reported on the forward strand. Setting this option to false reverts to
>> the old behaviour (pre-0.8.3) whereby alignments on the reverse
>> complement of a sequence are reported using coordinates on the reverse
>> complement. 
>>  
>> We see GFF DUMP coordinates still reported on the reverse strand
>> regardless of the setting of --forwardcoordinates.  So these two
>> excerpts from your manpage seem contradictory to me.     Unless I
>> understand `--forwardcoordinates FALSE` to only affect the coordinates
>> reported in the alignment section, not in the GFF DUMP section, which is
>> what it appears to do in practice.
>>  
>> Guy, can you confirm that the --forwardcoordinates option has no effect
>> on GFF output?
>>  
>
>Hi,
>
>Yes, it has no effect, and this is a bug
>(sorry - it was due to my misinterpretation of the GFF2 spec)
>- it's on the list of things to be fixed for exonerate 1.1 (soon)
>
>> Further, can you tell us if you plan to comport more closely to the GFF
>> spec, in particular in this case by reporting forwardcoordinates in the
>> GFF DUMP section too?
>> I see in your TODO list "    o Should GFF show all coordinates on the
>> +ve strand? (jason_p2g eg)".  Hear hear!  I second the motion.
>>  
>> And TODO item " GFF3 support ? http://song.sf.net/" gets my vote too....
>> though this is more of a sticky wicket....
>>  
>
>Yup, GFF3 support is on the list,
>but probably it will not be done in time for exonerate 1.1
>Of course, I'd welcome a patch ...    ;)
>
>(I'm mainly working on getting the cdna2genome
> and genome2genome models working properly for 1.1)
>
>Cheers,
>
>Guy.
>
>> Cheers and Thanks!
>>  
>> Malcolm Cook
>>  
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Friday, September 02, 2005 9:46 AM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> I've already talked to Guy about some of this and I assume fixes will be
>> part of the next release, but it can't hurt to have more people
>> requesting.  The main problem right now is reverse strand hits in GFF
>> output are still screwed up even if you provide the --forwardcoordinates
>> option. 
>> 
>> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
>> VULGAR to a list of Bio::Search::HSP::GenericHSP).  We can also
>> reconstruct everything needed from that, I gave a stab at it once, but
>> there was something missing (or maybe it was pre --forwardcoordinates
>> option).   
>> 
>> 
>> -jason 
>> 
>> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
>> 
>> 
>> Jason,
>>  
>> Thanks for the scripts and clues (esp re: using the --ryo option to
>> inject the needed length into the exonerate output to compensate).
>>  
>> I'm considering asking exonerate author to comport with GFF spec.  Do
>> you think this is a road to take?
>>  
>> Cheers,
>>  
>> Malcolm
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Wednesday, August 31, 2005 12:35 PM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> 
>> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl
>> 
>> You may still want to massage it some, but I use the script in this
>> basic form, maybe with a few tweaks:
>> 
>> Note that it requires you to run exonerate with specific --ryo options
>> so that it includes the length of the query and hit sequences in the
>> report output. should be covered in the perldoc in the script.
>> 
>> Without the ryo options enabled,  you'll need to modify the script more
>> to have access to the original sequence db, use Bio::DB::Fasta,  and put
>> in some $dbh->length($seqid) calls instead.
>> 
>> I don't think the part which writes HSP/match lines is actually correct
>> - it is trying to roll gapped HSPs from the similarity features. 
>> 
>> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
>> instance and/or grepping out the lines I really think I need.  
>> You may want to s/exon/CDS/ for the protein2genome output as well.
>> 
>> -jason
>> 
>> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>> 
>> 
>> Jason, 
>> 
>> This message is in regards to an old thread in which you offered to
>> share a 'script for munging over' exonerate output for loading in
>> DB::GFF (c.f.
>> <http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html>
>> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>> 
>> Would you be willing to still share that script, if you've got it
>> around? 
>> 
>> Thanks, and regards, 
>> 
>> Malcolm Cook -  <mailto:mec at stowers-institute.org>
>> mec at stowers-institute.org - 816-926-4449
>> Database Applications Manager - Bioinformatics
>> Stowers Institute for Medical Research - Kansas City, MO  USA
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>
>-- 
>%!PS % <------ Guy St.C. Slater ------> http://www.ebi.ac.uk/~guy/  <------
>210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 12/d{exch moveto}
>a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 f}a/h{setlinewidth newpath dup
>g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 0 108 arc d e
>18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>


From oldham at ucla.edu  Fri Jun 23 12:18:39 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 23 Jun 2006 09:18:39 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>

Hello again,

I finally got it to work, using the following script.  However, it takes
about 5 hours to run on a fast computer.  Using grep (in bash), on the
other hand, takes about 5 minutes (see below if you are interested).
Thanks to everyone for your help!

SLOW perl script:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_all_X';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
print @ID;
chomp @ID;

while (my $line = <PROBES>) {
	foreach my $identifier (@ID) {
		if($line=~/^>probe:\w+:$identifier:/) {
				print OUT $line;
				print OUT scalar(<PROBES>);
		}
	}
}
exit;


FAST bash script:

#!/usr/bin/bash
exec<"ID_all_X"
while read line; do
	echo $line
	grep -A 1 :$line: HG_U95Av2_probe_fasta >>myresults.txt
done
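
A likely reason the Perl version above is slow is that it tests every ID
against every line, compiling a fresh regex each time, so the work grows with
(number of IDs) x (number of lines). A sketch of the alternative, in Python
rather than Perl: parse the ID out of each header once and look it up in a
set. The function name extract_probes is made up, and the header pattern is
the one used elsewhere in this thread. The sketch also strips all trailing
whitespace from the IDs, including a carriage return; a stray "\r" left behind
by chomp on a DOS-format ID file is one possible explanation (not confirmed in
the thread) for the earlier "only the last ID matches" symptom.

```python
import re

# Header layout taken from the examples in this thread:
#   >probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
# IDs containing characters outside \w (e.g. hyphens) would need a wider class.
HEADER = re.compile(r"^>probe:\w+:(\w+):")


def extract_probes(id_lines, fasta_lines):
    """Return header + sequence lines for probes whose ID is wanted.

    Assumes each record is exactly one header line followed by one
    sequence line, as in the probe file discussed here.
    """
    # strip() removes "\n" and any "\r", unlike Perl's chomp by default
    wanted = {line.strip() for line in id_lines if line.strip()}
    kept = []
    lines = iter(fasta_lines)
    for line in lines:
        match = HEADER.match(line)
        if match and match.group(1) in wanted:  # one set lookup per header
            kept.append(line)
            sequence = next(lines, None)  # the line after the header
            if sequence is not None:
                kept.append(sequence)
    return kept
```

Both arguments can be open file handles, so the probe file is streamed once
and never held in memory as a whole.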


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Wednesday, June 14, 2006 6:48 AM
To: Michael Oldham; Chris Fields
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;
my $idmatch;
	while (<PROBES>) {
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}
exit;
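
The reason for predeclaring $idmatch is that the flag has to survive from one
line to the next: it is set on a header line and then consulted for the
sequence lines that follow. Perl's own documentation also describes `my`
combined with a statement modifier (`my $x = ... if ...`) as undefined
behaviour, so declaring the variable outside the loop avoids that as well.
The same state-keeping logic, sketched in Python with a made-up function name:

```python
import re

# Same header pattern as in the Perl script above.
HEADER = re.compile(r"^>probe:\w+:(\w+):")


def filter_records(wanted_ids, fasta_lines):
    """Yield every line of each record whose header ID is in wanted_ids.

    The keep flag lives outside the per-line logic and changes only when
    a header matching the pattern is seen, so sequences wrapped over
    several lines are kept or dropped together with their header.
    """
    keep = False  # declared once, before the loop
    for line in fasta_lines:
        match = HEADER.match(line)
        if match:
            keep = match.group(1) in wanted_ids
        if keep:
            yield line
```

Unlike the header-plus-next-line approach, this does not care how many lines a
sequence spans.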


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu]
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086;
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
># Note: This creates a hash with keys=PSIDs and all values=1.
>my %ID = map {($_, 1)} @ID;
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opening it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the
>code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this, won't it print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT scalar(<PROBES>);' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting closer,
>>>> but no cigar quite yet.  The script below runs quickly with no errors--but
>>>> the output file is empty.  It seems that the problem must lie somewhere in
>>>> the 'while' loop, and I'm sure it's quite obvious to a more experienced
>>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't use bioperl for this, or create an index.  Perl would do fine
>>>> and probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a conceptually simple
>>>>> task.  I have a single large fasta file containing about 200,000 probe
>>>>> sequences (from an Affymetrix microarray), each of which looks like this:
>>>>>
>>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>>> Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of ~130,800
>>>>> probes (both the header and the sequence) and output this subset into a new
>>>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I have these
>>>>> 8,175 IDs listed in a separate file.  I *think* that I managed to create an
>>>>> index of all 200,000 probes in the original fasta file using the following
>>>>> script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even if it is,
>>>>> I'm not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> No virus found in this outgoing message.
>>>>> Checked by AVG Free Edition.
>>>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date:
>>>>> 6/8/2006
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

From pmiguel at purdue.edu  Sat Jun 24 10:17:46 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 10:17:46 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <449D498A.9020107@purdue.edu>

Brian Osborne wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>   
I would be very disappointed to lose one part of bptutorial.pl--this was 
described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
only purpose I've ever used bptutorial.pl for--to find all the methods 
available to any given object. Eg:

bptutorial.pl 100 Bio::PrimarySeq

 ***Methods for Object Bio::PrimarySeq ********


 Methods taken from package Bio::IdentifiableI
 lsid_string   namespace_string

 Methods taken from package Bio::PrimarySeq
 accession   accession_number   alphabet   authority   can_call_new   desc
 description   direct_seq_set   display_id   display_name   id   is_circular
 length   namespace   new   object_id   primary_id   seq
 subseq   validate_seq   version

 Methods taken from package Bio::PrimarySeqI
 moltype   revcom   translate   trunc

 Methods taken from package Bio::Root::Root
 DESTROY   confess   debug   throw   verbose

 Methods taken from package Bio::Root::RootI
 carp   deprecated   stack_trace   stack_trace_dump   throw_not_implemented
 warn   warn_not_implemented
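
A listing grouped by package, like the one above, can be approximated with core Perl alone. The following is only a sketch of that idea, not the code bptutorial.pl uses: methods_by_package and the two My:: classes are made-up names for illustration, and the list will also include any functions a package has imported.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Return a hash ref: package name => sorted names of the subs defined in it,
# for the class itself and for every class it inherits from.
sub methods_by_package {
    my ($class) = @_;
    my %methods;
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';    # symbol-table lookups need symbolic references
        my @subs = grep { defined &{"${pkg}::$_"} } keys %{"${pkg}::"};
        $methods{$pkg} = [ sort @subs ];
    }
    return \%methods;
}

# Two tiny hypothetical classes so the sketch runs without BioPerl installed.
{ package My::Base;    sub new { return bless {}, shift } sub id { return 'base' } }
{ package My::Derived; our @ISA = ('My::Base'); sub seq { return 'ACGT' } }

# With an argument, load that class; without one, fall back to the demo class.
my $class = shift @ARGV;
if (defined $class) {
    eval "require $class; 1" or die "Could not load $class: $@";
}
else {
    $class = 'My::Derived';
}

my $by_pkg = methods_by_package($class);
for my $pkg (sort keys %$by_pkg) {
    print " Methods taken from package $pkg\n @{ $by_pkg->{$pkg} }\n\n";
}
```

Saved under a name of your choice, it would be called with a class name as its only argument, for example Bio::PrimarySeq, provided BioPerl is on @INC.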


Phillip SanMiguel


From sdavis2 at mail.nih.gov  Sat Jun 24 10:45:52 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 24 Jun 2006 10:45:52 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a singlelargefile
References: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>
Message-ID: <001a01c6979c$ff576dd0$6501a8c0@WATSON>


----- Original Message ----- 
From: "Michael Oldham" <oldham at ucla.edu>
To: "Cook, Malcolm" <MEC at stowers-institute.org>; "Chris Fields" 
<cjfields at uiuc.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Friday, June 23, 2006 12:18 PM
Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
singlelargefile


> Hello again,
>
> I finally got it to work, using the following script.  However, it takes
> about 5 hours to run on a fast computer.  Using grep (in bash), on the
> other hand, takes about 5 minutes (see below if you are interested).
> Thanks to everyone for your help!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
> print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
> print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
> foreach my $identifier (@ID) {
> if($line=~/^>probe:\w+:$identifier:/) {
> print OUT $line;
> print OUT scalar(<PROBES>);
> }
> }
> }

This could probably be done MUCH faster using a hash on the sequence 
identifier.  (I have to admit that I didn't follow the first part of this 
conversation, so I could be misunderstanding some part of what you are 
trying to do.)  If you have a couple hundred-thousand sequences, my guess is 
that it could be done in under 30 seconds, but I could be wrong about the 
exact time.  The important part is to make a hash of your sequences with the 
key being the $identifier.  Then, loop through your @ID array doing 
something like (untested):

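#open files as before and read in @ID as before

my %seq_hash;

while (my $line = <PROBES>) {
    # capture the probe set ID from the header line
    if ($line =~ /^>probe:\w+:(\w+):/) {
        # several probes share an ID, so append header and sequence
        $seq_hash{$1} .= $line . scalar(<PROBES>);
    }
}

foreach my $id (@ID) {
    print OUT $seq_hash{$id} if exists $seq_hash{$id};
}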
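
The same hash idea also works in a single pass that never holds the sequences in memory: put the wanted IDs in a hash and test each header against it. This is a minimal sketch under the assumptions stated in the thread (one header line followed by exactly one sequence line, headers matching the regex used earlier). filter_probes is a hypothetical name, and the second record in the demo data is invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy header+sequence pairs whose probe set ID is a key of %$wanted
# from the input handle to the output handle; return the number copied.
sub filter_probes {
    my ($in, $out, $wanted) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # header looks like >probe:HG_U95Av2:1138_at:395:301; ...
        next unless $line =~ /^>probe:\w+:(\w+):/ && $wanted->{$1};
        my $seq = <$in>;    # assumes the sequence is the next single line
        last unless defined $seq;
        print {$out} $line, $seq;
        $count++;
    }
    return $count;
}

# Small in-memory demo; real use would open the ID and FASTA files instead.
my %wanted = map { $_ => 1 } qw(1138_at);
my $fasta  = <<'END';
>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1000_at:10:20; Interrogation_Position=100; Antisense;
ACGTACGTACGTACGTACGTACGTA
END

my $subset = '';
open my $in,  '<', \$fasta  or die "Can't read demo data: $!";
open my $out, '>', \$subset or die "Can't write demo output: $!";
my $n = filter_probes($in, $out, \%wanted);
close $out;
print "copied $n probe(s)\n$subset";
```

Each header costs one hash lookup instead of a loop over all 8,175 IDs, which is where the slow script spends its time.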


From arareko at campus.iztacala.unam.mx  Sat Jun 24 11:27:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 10:27:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D498A.9020107@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>
	<449D498A.9020107@purdue.edu>
Message-ID: <449D59C7.4030008@campus.iztacala.unam.mx>

Hi Phillip,

Have you tried the Deobfuscator interface? It's a newer and better way 
to browse all the methods available in BioPerl:

http://bioperl.org/wiki/Deobfuscator
http://bioperl.org/cgi-bin/deob_interface.cgi

Regards,
Mauricio.

Phillip SanMiguel wrote:
> Brian Osborne wrote:
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
>>   
> I would be very disappointed to lose one part of bptutorial.pl--this was 
> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
> only purpose I've ever used bptutorial.pl for--to find all the methods 
> available to any given object. Eg:
> 
> bptutorial.pl 100 Bio::PrimarySeq
> 
>  ***Methods for Object Bio::PrimarySeq ********
> 
> 
>  Methods taken from package Bio::IdentifiableI
>  lsid_string   namespace_string
> 
>  Methods taken from package Bio::PrimarySeq
>  accession   accession_number   alphabet   authority   can_call_new   desc
>  description   direct_seq_set   display_id   display_name   id   is_circular
>  length   namespace   new   object_id   primary_id   seq
>  subseq   validate_seq   version
> 
>  Methods taken from package Bio::PrimarySeqI
>  moltype   revcom   translate   trunc
> 
>  Methods taken from package Bio::Root::Root
>  DESTROY   confess   debug   throw   verbose
> 
>  Methods taken from package Bio::Root::RootI
>  carp   deprecated   stack_trace   stack_trace_dump   
> throw_not_implemented   warn
>  warn_not_implemented
> 
> 
> Phillip SanMiguel
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From golharam at umdnj.edu  Sat Jun 24 10:43:29 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 10:43:29 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>
Message-ID: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>

I've managed to code three methods to calculate K into a perl script
using the algorithms as described in "Molecular Evolution" by Wen-Hsiung
Li.   I'd be happy to contribute it as a script...


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 9:40 AM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


baseml is not well-supported to my knowledge - I think I started with
an attempt to capture a small amount of the data in the file.  There are
some people who have made modifications to possibly parse it in-house
but I know of no submitted patches.   Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.   It depends on how much
of that information you really want to capture, or whether you want
just the substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the 
> code...Thanks.
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 12:59:21 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 12:59:21 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D59C7.4030008@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>
	<449D59C7.4030008@campus.iztacala.unam.mx>
Message-ID: <449D6F69.1090104@purdue.edu>

Yes, I have. It is very useful.
But what about situations where I don't have web access, or where I am
working with Bioperl 1.5?

Mauricio Herrera Cuadra wrote:
> Hi Phillip,
>
> Have you tried the Deobfuscator interface? It's a newer and better way 
> to browse all the methods available in BioPerl:
>
> http://bioperl.org/wiki/Deobfuscator
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Regards,
> Mauricio.
>
> Phillip SanMiguel wrote:
>   
>> Brian Osborne wrote:
>>     
>>> Jay,
>>>
>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>
>>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>>> don't want to have to maintain two bptutorials.
>>>   
>>>       
>> I would be very disappointed to lose one part of bptutorial.pl--this was 
>> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
>> only purpose I've ever used bptutorial.pl for--to find all the methods 
>> available to any given object. Eg:
>>
>> bptutorial.pl 100 Bio::PrimarySeq
>>
>>  ***Methods for Object Bio::PrimarySeq ********
>>
>>
>>  Methods taken from package Bio::IdentifiableI
>>  lsid_string   namespace_string
>>
>>  Methods taken from package Bio::PrimarySeq
>>  accession   accession_number   alphabet   authority   can_call_new   desc
>>  description   direct_seq_set   display_id   display_name   id   is_circular
>>  length   namespace   new   object_id   primary_id   seq
>>  subseq   validate_seq   version
>>
>>  Methods taken from package Bio::PrimarySeqI
>>  moltype   revcom   translate   trunc
>>
>>  Methods taken from package Bio::Root::Root
>>  DESTROY   confess   debug   throw   verbose
>>
>>  Methods taken from package Bio::Root::RootI
>>  carp   deprecated   stack_trace   stack_trace_dump   
>> throw_not_implemented   warn
>>  warn_not_implemented
>>
>>
>> Phillip SanMiguel
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     
>
>   


From arareko at campus.iztacala.unam.mx  Sat Jun 24 13:35:54 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 12:35:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D6F69.1090104@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
Message-ID: <449D77FA.70103@campus.iztacala.unam.mx>

Currently I'm modifying the Deobfuscator so it'd be capable of browsing 
the different BioPerl packages as well as their respective releases, but 
I haven't had much spare time to finish it :(

Dave and I committed the Deobfuscator into the bioperl-live source tree 
(in the /doc directory), so it'd be included in future releases of BioPerl. 
I'm also working on a command line version which won't need a CGI 
environment to have the same functionality; this would address the web 
access situation that you mention.

Phillip SanMiguel wrote:
> Yes I have. It is very useful.
> But in situations where I don't have web access? Or I am working with 
> Bioperl 1.5?
> 
> Mauricio Herrera Cuadra wrote:
>> Hi Phillip,
>>
>> Have you tried the Deobfuscator interface? It's a newer and better way 
>> to browse all the methods available in BioPerl:
>>
>> http://bioperl.org/wiki/Deobfuscator
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> Regards,
>> Mauricio.
>>
>> Phillip SanMiguel wrote:
>>   
>>> Brian Osborne wrote:
>>>     
>>>> Jay,
>>>>
>>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>>
>>>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>>>> don't want to have to maintain two bptutorials.
>>>>   
>>>>       
>>> I would be very disappointed to lose one part of bptutorial.pl--this was 
>>> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
>>> only purpose I've ever used bptutorial.pl for--to find all the methods 
>>> available to any given object. Eg:
>>>
>>> bptutorial.pl 100 Bio::PrimarySeq
>>>
>>>  ***Methods for Object Bio::PrimarySeq ********
>>>
>>>
>>>  Methods taken from package Bio::IdentifiableI
>>>  lsid_string   namespace_string
>>>
>>>  Methods taken from package Bio::PrimarySeq
>>>  accession   accession_number   alphabet   authority   can_call_new   desc
>>>  description   direct_seq_set   display_id   display_name   id   is_circular
>>>  length   namespace   new   object_id   primary_id   seq
>>>  subseq   validate_seq   version
>>>
>>>  Methods taken from package Bio::PrimarySeqI
>>>  moltype   revcom   translate   trunc
>>>
>>>  Methods taken from package Bio::Root::Root
>>>  DESTROY   confess   debug   throw   verbose
>>>
>>>  Methods taken from package Bio::Root::RootI
>>>  carp   deprecated   stack_trace   stack_trace_dump   
>>> throw_not_implemented   warn
>>>  warn_not_implemented
>>>
>>>
>>> Phillip SanMiguel
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>     
>>   
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Sat Jun 24 09:39:56 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 09:39:56 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
References: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>

baseml is not well-supported to my knowledge - I think I started with
an attempt to capture a small amount of the data in the file.  There are
some people who have made modifications to possibly parse it in-house
but I know of no submitted patches.   Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.   It depends on how much
of that information you really want to capture, or whether you want
just the substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the
> code...Thanks.
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 13:48:15 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 13:48:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D77FA.70103@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
Message-ID: <449D7ADF.3030604@purdue.edu>


Yes, that would be better than bptutorial.pl 100 then. For some modules 
bptutorial.pl 100 doesn't seem to give any of the methods they have 
access to, whereas the Deobfuscator does.

Mauricio Herrera Cuadra wrote:
> Currently I'm modifying the Deobfuscator so it'd be capable of 
> browsing the different BioPerl packages as well as their respective 
> releases, but I haven't had much spare time to finish it :(
>
> Dave and I committed the Deobfuscator into the bioperl-live source 
> tree (in /doc directory), so it'd be included in future releases of 
> BioPerl. I'm also working on a command line version which won't need a 
> CGI environment to have the same functionality; this would address the 
> web access situation that you mention.
>
> Phillip SanMiguel wrote:
>> Yes I have. It is very useful.
>> But in situations where I don't have web access? Or I am working with 
>> Bioperl 1.5?
>>
>> Mauricio Herrera Cuadra wrote:
>>> Hi Phillip,
>>>
>>> Have you tried the Deobfuscator interface? It's a newer and better 
>>> way to browse all the methods available in BioPerl:
>>>
>>> http://bioperl.org/wiki/Deobfuscator
>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>
>>> Regards,
>>> Mauricio.
>>>
>>> Phillip SanMiguel wrote:
>>>  
>>>> Brian Osborne wrote:
>>>>    
>>>>> Jay,
>>>>>
>>>>> Excellent! Now we need to answer a few more questions for ourselves:
>>>>>
>>>>> - Do we remove the file bptutorial.pl from the package now? I'd 
>>>>> say yes, we
>>>>> don't want to have to maintain two bptutorials.
>>>>>         
>>>> I would be very disappointed to lose one part of 
>>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for 
>>>> Bioinformatics_. It is the only purpose I've ever used 
>>>> bptutorial.pl for--to find all the methods available to any given 
>>>> object. Eg:
>>>>
>>>> bptutorial.pl 100 Bio::PrimarySeq
>>>>
>>>>  ***Methods for Object Bio::PrimarySeq ********
>>>>
>>>>
>>>>  Methods taken from package Bio::IdentifiableI
>>>>  lsid_string   namespace_string
>>>>
>>>>  Methods taken from package Bio::PrimarySeq
>>>>  accession   accession_number   alphabet   authority   
>>>> can_call_new   desc
>>>>  description   direct_seq_set   display_id   display_name   id   
>>>> is_circular
>>>>  length   namespace   new   object_id   primary_id   seq
>>>>  subseq   validate_seq   version
>>>>
>>>>  Methods taken from package Bio::PrimarySeqI
>>>>  moltype   revcom   translate   trunc
>>>>
>>>>  Methods taken from package Bio::Root::Root
>>>>  DESTROY   confess   debug   throw   verbose
>>>>
>>>>  Methods taken from package Bio::Root::RootI
>>>>  carp   deprecated   stack_trace   stack_trace_dump   
>>>> throw_not_implemented   warn
>>>>  warn_not_implemented
>>>>
>>>>
>>>> Phillip SanMiguel
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>     
>>>   
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>


From jason at bioperl.org  Sat Jun 24 14:42:57 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 14:42:57 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
References: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>

You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?
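
For orientation, the simplest pairwise distances can be sketched in plain Perl. This is not the Align::DNAStatistics code, and not necessarily any of the methods Ryan coded: it is only the textbook Jukes-Cantor and Kimura two-parameter formulas, with hypothetical function names, assuming two already-aligned sequences and skipping any column that is not A, C, G or T in both.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %purine = (A => 1, G => 1);

# Count transitions, transversions and compared sites for two aligned
# sequences of equal length; gap and ambiguity columns are skipped.
sub count_differences {
    my ($s1, $s2) = (uc $_[0], uc $_[1]);
    die "sequences must be aligned to the same length\n"
        unless length($s1) == length($s2);
    my ($ts, $tv, $sites) = (0, 0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        my $same_class = ($purine{$x} ? 1 : 0) == ($purine{$y} ? 1 : 0);
        if ($same_class) { $ts++ } else { $tv++ }
    }
    return ($ts, $tv, $sites);
}

# Jukes-Cantor: K = -(3/4) ln(1 - 4p/3), p = proportion of differing sites
sub jukes_cantor {
    my ($ts, $tv, $sites) = count_differences(@_);
    die "no comparable sites\n" unless $sites;
    my $p = ($ts + $tv) / $sites;
    die "too divergent for Jukes-Cantor\n" if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

# Kimura two-parameter: K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q),
# P = proportion of transitions, Q = proportion of transversions
sub kimura_2p {
    my ($ts, $tv, $sites) = count_differences(@_);
    die "no comparable sites\n" unless $sites;
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    die "too divergent for Kimura two-parameter\n"
        if 1 - 2 * $P - $Q <= 0 || 1 - 2 * $Q <= 0;
    return -0.5 * log(1 - 2 * $P - $Q) - 0.25 * log(1 - 2 * $Q);
}

printf "JC69: %.4f  K2P: %.4f\n",
    jukes_cantor('AAAAAAAAAA', 'AAAAAAAAAG'),
    kimura_2p('AAAAAAAAAA', 'AAAAAAAAAG');
```

Both corrections grow faster than the raw proportion of differences as sequences diverge, which is the point of using them instead of a simple mismatch count.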

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a perl script
> using the algorithms as described in "Molecular Evolution" by
> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 9:40 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> baseml is not well-supported to my knowledge - I think I started with
> an attempt to capture a small amount of the data in the file.  There are
> some people who have made modifications to possibly parse it in-house
> but I know of no submitted patches.   Many of the knowledgeable
> people are probably at the evolution meetings this week.
>
> I have no idea about the full set of information in the report files
> without going back to the Yang papers first.   It depends on how much
> of that information you really want to capture, or whether you want
> just the substitution rates.
>
> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
> work+PAML.
>
> -jason
> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>> baseml in the PAML package to measure the distances of some non-coding
>> regions.
>>
>> I started with the coding regions, and used the script
>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>> something similar for non-coding regions.  However, when I call
>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>> meaning matrix was never defined.
>>
>> I wanted to find out if anyone on here has done this before or knows a
>> way to measure substitution frequencies of non-coding regions with the
>> PAML package.  The documentation with PAML is sparse so I'm not sure how
>> to interpret its output directly - that's why I'm using Bioperl.
>>
>> Hopefully someone can help me before I start digging into the
>> code...Thanks.
>>
>> Ryan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 15:07:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 14:07:06 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D7ADF.3030604@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
	<449D7ADF.3030604@purdue.edu>
Message-ID: <EF5998FD-BA4F-439C-873E-71E55DBA0F4D@uiuc.edu>

As a quickie method I use the script from the FAQ; you have to  
install Class::Inspector:

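#!/usr/bin/perl -w
use strict;
use Class::Inspector;
my $class = shift || die "Usage: methods perl_class_name\n";
# load the class and report the reason if it cannot be loaded
eval "require $class" or die "Could not load $class: $@";
# 'full' prints Package::method names, 'public' skips _private methods
print join("\n", sort @{ Class::Inspector->methods($class, 'full', 'public') }), "\n";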

Works well, though it doesn't have the links and so on like
Deobfuscator; for that I use HTML-generated ActiveState docs. Sample run:

glaciers-115 chris$ methods.pl Bio::SeqIO
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::debug
Bio::Root::Root::except
Bio::Root::Root::finally
Bio::Root::Root::otherwise
Bio::Root::Root::throw
Bio::Root::Root::try
Bio::Root::Root::verbose
Bio::Root::Root::with
Bio::Root::RootI::carp
Bio::Root::RootI::confess
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::SeqIO::DESTROY
Bio::SeqIO::PRINT
Bio::SeqIO::READLINE
Bio::SeqIO::TIEHANDLE
Bio::SeqIO::alphabet
Bio::SeqIO::fh
Bio::SeqIO::location_factory
Bio::SeqIO::new
Bio::SeqIO::newFh
Bio::SeqIO::next_seq
Bio::SeqIO::object_factory
Bio::SeqIO::sequence_builder
Bio::SeqIO::sequence_factory
Bio::SeqIO::write_seq
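
For machines where Class::Inspector isn't installed, roughly the same listing can be had with core Perl alone. This is only a sketch: the sub name list_methods is made up for illustration, it assumes Perl 5.10 or later (for mro::get_linear_isa), and like the output above it also picks up functions a package has imported. It groups names by the package that defines them, similar to what bptutorial.pl 100 prints:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Map each public method name to the first package in the
# inheritance chain that defines it. (Hypothetical helper.)
sub list_methods {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;
    require $file;
    my %defined_in;
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (keys %{"${pkg}::"}) {
            # skip _private names and nested package entries
            next if $name =~ /^_/ || $name =~ /::$/;
            next unless defined &{"${pkg}::${name}"};
            $defined_in{$name} ||= $pkg;
        }
    }
    return \%defined_in;
}

# Defaults to a core class so the sketch runs without BioPerl.
my $methods = list_methods(shift || 'IO::File');

# Group the names by defining package and print them.
my %by_pkg;
push @{ $by_pkg{ $methods->{$_} } }, $_ for keys %$methods;
for my $pkg (sort keys %by_pkg) {
    print "Methods taken from package $pkg\n  ",
          join('   ', sort @{ $by_pkg{$pkg} }), "\n\n";
}
```

Run it with a class name as the only argument, e.g. Bio::PrimarySeq, once BioPerl is on @INC.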


Chris

On Jun 24, 2006, at 12:48 PM, Phillip SanMiguel wrote:

>
> Yes, that would be better than bptutorial.pl 100 then. For some  
> modules
> bptutorial.pl 100 doesn't seem to give any of the methods they have
> access to. Whereas the deobfuscator does.
>
> Mauricio Herrera Cuadra wrote:
>> Currently I'm modifying the Deobfuscator so it'd be capable of
>> browsing the different BioPerl packages as well as their respective
>> releases, but haven't got many spare time to finish it :(
>>
>> Dave and I committed the Deobfuscator into the bioperl-live source
>> tree (in /doc directory), so it'd be included in future releases of
>> BioPerl. I'm also working on a command line version which won't  
>> need a
>> CGI environment to have the same functionality, this would address  
>> the
>> web access situation that you mention.
>>
>> Phillip SanMiguel wrote:
>>> Yes I have. It is very useful.
>>> But in situations where I don't have web access? Or I am working  
>>> with
>>> Bioperl 1.5?
>>>
>>> Mauricio Herrera Cuadra wrote:
>>>> Hi Philip,
>>>>
>>>> Have you tried the Deobfuscator interface? It's a newer and better
>>>> way to browse all the methods available in BioPerl:
>>>>
>>>> http://bioperl.org/wiki/Deobfuscator
>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>>
>>>> Regards,
>>>> Mauricio.
>>>>
>>>> Phillip SanMiguel wrote:
>>>>
>>>>> Brian Osborne wrote:
>>>>>
>>>>>> Jay,
>>>>>>
>>>>>> Excellent! Now we need to answer a few more questions for  
>>>>>> ourselves:
>>>>>>
>>>>>> - Do we remove the file bptutorial.pl from the package now? I'd
>>>>>> say yes, we
>>>>>> don't want to have to maintain two bptutorials.
>>>>>>
>>>>> I would be very disappointed to lose one part of
>>>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for
>>>>> Bioinformatics_. It is the only purpose I've ever used
>>>>> bptutorial.pl for--to find all the methods available to any given
>>>>> object. Eg:
>>>>>
>>>>> bptutorial.pl 100 Bio::PrimarySeq
>>>>>
>>>>>  ***Methods for Object Bio::PrimarySeq ********
>>>>>
>>>>>
>>>>>  Methods taken from package Bio::IdentifiableI
>>>>>  lsid_string   namespace_string
>>>>>
>>>>>  Methods taken from package Bio::PrimarySeq
>>>>>  accession   accession_number   alphabet   authority
>>>>> can_call_new   desc
>>>>>  description   direct_seq_set   display_id   display_name   id
>>>>> is_circular
>>>>>  length   namespace   new   object_id   primary_id   seq
>>>>>  subseq   validate_seq   version
>>>>>
>>>>>  Methods taken from package Bio::PrimarySeqI
>>>>>  moltype   revcom   translate   trunc
>>>>>
>>>>>  Methods taken from package Bio::Root::Root
>>>>>  DESTROY   confess   debug   throw   verbose
>>>>>
>>>>>  Methods taken from package Bio::Root::RootI
>>>>>  carp   deprecated   stack_trace   stack_trace_dump
>>>>> throw_not_implemented   warn
>>>>>  warn_not_implemented
>>>>>
>>>>>
>>>>> Phillip SanMiguel
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pmiguel at purdue.edu  Sat Jun 24 15:37:08 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 15:37:08 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <449D9464.6030508@purdue.edu>

Here is an example bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1682

It was a bug fixed in a module in BioPerl 1.4  back in October of 2004. 
The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the 
module. However the version of the module currently available from CPAN 
is 1.6. (That is the current "stable" release, BioPerl 1.4.0)

I've written a script that relies on that bug being fixed. How should I 
deal with this when I want to give the script to others to use? Just 
tell them "You must have BioPerl 1.5 installed"? Or give them instructions 
for patching the module code?

How long before the next "stable" release? Maybe a year? Should not a 
BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or 
would that be very difficult?

By the way, I think the revision graph viewer is great for someone, at 
best, peripherally involved in BioPerl to figure out which module 
version is associated with which BioPerl version, for example:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1

Phillip SanMiguel


From golharam at umdnj.edu  Sat Jun 24 14:57:52 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 14:57:52 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>
Message-ID: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>

Hi Jason,

It looks like DNAStatistics is only for coding sequences.  I'm trying to
calculate the Ks of exons and the K (or Ki) of introns.  All the methods
in bioperl are based on coding sequences.  Only the  PAUP package (that
I've found) does non-coding sequences.   I would have used it but you
need to pay for it and we don't have the funding to purchase much at the
moment.

I briefly looked at PHYLIP and EMBOSS but they didn't look as
straightforward as I was hoping they would be.  Either that, or I was
getting frustrated looking for a simple solution.

In the end, I found a molecular evolution book that talks about several
methods used for non-coding sequences so I went ahead and implemented
them.  They seem to work well.  

Ryan


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 2:43 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a Perl script
> using the algorithms described in "Molecular Evolution" by
> Wen-Hsiung Li.  I'd be happy to contribute it as a script...
>
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 9:40 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> baseml is not well-supported to my knowledge - I think I started with an
> attempt to capture a small amount of the data in the file.  There are
> some people who have made modifications in-house to possibly parse it,
> but I know of no submitted patches.   Many of the knowledgeable
> people are probably at the evolution meetings this week.
>
> I have no idea about the full set of information in the report files
> without going back to the Yang papers first.   It depends on how much
> of that information you really want to capture, or just the
> substitution rates.
>
> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
> work+PAML.
>
> -jason
> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>> baseml in the PAML package to measure the distances of some non- 
>> coding
>
>> regions.
>>
>> I started with the coding regions, and used the script
>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>> something similar for non-coding regions.  However, when I call
>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>> meaning matrix was never defined.
>>
>> I wanted to find out if anyone on here has done this before or  
>> knows a
>
>> way to measure substitution frequencies of non-coding regions with  
>> the
>
>> PAML package.  The documentation with PAML is sparse so I'm not
>> sure how
>> to interpret its output directly - that's why I'm using Bioperl.
>>
>> Hopefully someone can help me before I start digging into the
>> code...Thanks.
>>
>> Ryan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Sat Jun 24 18:37:15 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 18:37:15 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
Message-ID: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>

I'm using Bio::Tools::Run::Alignment::Clustalw to generate some
alignments and parsing the resulting alignments.

The ClustalW output is being sent to STDOUT.  Is there a way I can
redirect the output to STDERR instead?

Here's how I'm using it:

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

(Forgive me if it's in the docs - I've been coding for a week straight now
including Saturday)

Thanks, Ryan


From cjfields at uiuc.edu  Sat Jun 24 20:16:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 19:16:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>

On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4  back in October of  
> 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from  
> CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How  
> should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed". Give them  
> instructions
> for patching the module code?

A BioPerl module version is not the same as the distribution  
version.  All the modules have different version numbers  
corresponding to CVS commits for various code changes.  If you want  
to see the version for the distribution, read this:

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

Many 'bug fixes', you'll find, have less to do with problems/bugs in  
BioPerl code than they do with outside code changes beyond our  
control.  By that I mean changes to other programs modify output so  
parsers break (BLAST, PAML, etc), or changes to API for remote  
databases that break queries (recent changes in EBI database  
concerning Swissprot, for example).  So, the code is considered  
'stable' at the time of release, but past that point issues beyond  
our control may break certain modules parsing output, accessing  
remote databases, and so on, at any time. This link:

http://www.bioperl.org/wiki/FAQ#BioPerl_in_General

should answer a few more questions you may have.  The FAQ is very  
helpful...

In general, if there are problems with code you could look at the  
latest developer's release (1.5.1, released in Oct 2005) to see if  
any bugs have been fixed.  They may be fixed post-1.5.1 and will be  
in CVS; you can always suggest using 1.5.1 (it's pretty stable) and  
updating only the fixed modules from CVS if needed.
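
For a script that gets handed to other people, one option is to have it check at startup and fail with a clear message. Below is a minimal sketch using only core Perl (UNIVERSAL::VERSION). The helper name has_min_version is made up, and using Bio::Perl with some threshold for the distribution check is an assumption to verify against the FAQ entry above; a module that doesn't define $VERSION will simply report 0 here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $module loads and reports at least version $min, else 0.
# Relies on UNIVERSAL::VERSION (core Perl), which dies when the
# installed version is lower than the one requested.
sub has_min_version {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; $module->VERSION($min); 1 } ? 1 : 0;
}

# Demonstrated on a core module so the sketch runs anywhere.
print "List::Util is recent enough\n" if has_min_version('List::Util', 1);

# In a real script the check might look like the lines below. The module
# name and the threshold are placeholders, not verified values:
# my $min = ...;   # version that contains the fix
# die "This script needs a newer BioPerl\n"
#     unless has_min_version('Bio::Perl', $min);
```

That way users of an older install get an explanation instead of a silently wrong result.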

> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
> one? Or
> would that be very difficult?

No, it's not that easy.  BioPerl isn't like most CPAN modules with  
one or two developers.  See the wiki page for details on planning  
releases to see why:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

It takes a lot of effort and coordination, much more so than the  
average CPAN module.  I believe some of the core developers are  
meeting this weekend; maybe something will come of that and we'll get  
an idea of a next release.

Chris

> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Jun 24 21:02:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 24 Jun 2006 21:02:36 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>


On Jun 24, 2006, at 3:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4  back in October of  
> 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from  
> CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How  
> should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed". Give them  
> instructions
> for patching the module code?

Either way. If the patch is trivial you could also provide the patch
as an option. Generally we don't support that though. (Not everything
that we don't support we don't support because it doesn't work.
Sometimes it's just a statement along the lines of
'it-probably-works-but-don't-bug-us-if-it-doesn't'.)

>
> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
> one? Or
> would that be very difficult?

1.5.1 fixes a number of other problems too, so there isn't really
much reason to upgrade to 1.4.1 (if there was one) instead of to 1.5.1.
We think investing time in creating 1.4.1 is not the best investment
we can make.

Our current goal is to release 1.5.2 and possibly more development
versions, all leading on a steady path to 1.6.0. There are very few (but
significant) stumbling blocks on this path. I believe they will require
some dedicated time from a couple of people, and after that
there shouldn't be any real obstacles. It's quite possible that at
BOSC, or as early as next week at the GMOD meeting, we could see a leap
forward; typically it's those meetings (short of an actual hackathon)
that pull the respective people away from their daily obligations.

Some time back in spring, 1.6 was put in proximity to BOSC. That's
probably not going to happen, but it will quite possibly not be that much
afterwards.

	-hilmar

>
> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Jun 24 21:21:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:21:56 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c697f5$c08d1150$15327e82@pyrimidine>

According to the docs ( ;> ) the default behaviour is to return "a BioPerl
Bio::SimpleAlign object which can then be printed and/or saved in multiple
formats using the AlignIO.pm module"; you should be able to do something
like:

use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
...
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

# new() is a class method, so call it with ->, not ::
my $al_out = Bio::AlignIO->new(-format => 'clustalw',
                               -fh     => \*STDERR);

$al_out->write_aln($aa_aln);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Saturday, June 24, 2006 5:37 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
> 
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);
> 
> (Forgive me if it's in the docs - I've been coding for a week straight now
> including Saturday)
> 
> Thanks, Ryan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Sat Jun 24 21:38:06 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 21:38:06 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
References: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>

They make no assumption about coding sequence; where do you get that
impression?  The Ka/Ks methods are for coding sequence, but Tamura/Nei,
Kimura, and Jukes-Cantor are all for any type of sequence.

PHYLIP and EMBOSS are pretty straightforward IMHO - you give them
an alignment and you get out a matrix of pairwise numbers....

But whatever makes sense to you - we are using the same methods as
are in Li's book (that is where I took the equations from).
-j
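
To illustrate why these distances need nothing but aligned nucleotides, here is a minimal sketch of the Jukes-Cantor and Kimura two-parameter corrections in plain Perl. It is not the Align::DNAStatistics code; the sub names are made up for this example, and it simply skips any column containing a gap or ambiguity code. With p the proportion of differing sites, P the proportion of transitions and Q the proportion of transversions, the formulas are K = -3/4 ln(1 - 4p/3) and K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %is_purine = (A => 1, G => 1);

# Count compared sites, transitions and transversions between two
# aligned sequences, skipping columns with gaps or ambiguity codes.
sub count_differences {
    my ($s1, $s2) = map { uc } @_;
    die "sequences must be the same length\n"
        unless length($s1) == length($s2);
    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # same chemical class (both purines or both pyrimidines) = transition
        if (($is_purine{$x} || 0) == ($is_purine{$y} || 0)) { $ts++ } else { $tv++ }
    }
    die "no comparable sites\n" unless $sites;
    return ($sites, $ts, $tv);
}

# Jukes-Cantor: K = -3/4 ln(1 - 4p/3)
sub jukes_cantor {
    my ($sites, $ts, $tv) = count_differences(@_);
    my $p = ($ts + $tv) / $sites;
    return -0.75 * log(1 - 4 * $p / 3);
}

# Kimura two-parameter: K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
sub kimura_2p {
    my ($sites, $ts, $tv) = count_differences(@_);
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    return -0.5 * log(1 - 2 * $P - $Q) - 0.25 * log(1 - 2 * $Q);
}

# Toy aligned pair, invented for the example (one transition in ten sites).
my ($seq1, $seq2) = ('ACGTACGTAC', 'ACGTACGTAT');
printf "Jukes-Cantor: %.4f\n", jukes_cantor($seq1, $seq2);
printf "Kimura 2P:    %.4f\n", kimura_2p($seq1, $seq2);
```

Nothing here looks at reading frame or codons, so the same call works for introns and exons alike. Highly diverged pairs make the logarithm undefined, and this sketch does not handle that case.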
On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:

> Hi Jason,
>
> It looks like DNAStatistics is only for coding sequences.  I'm  
> trying to
> calculate the Ks of exons and the K (or Ki) of introns.  All the  
> methods
> in bioperl are based on coding sequences.  Only the  PAUP package  
> (that
> I've found) does non-coding sequences.   I would have used it but you
> need to pay for it and we don't have the funding to purchase much  
> at the
> moment.
>
> I briefly looked at PHYLIP and EMBOSS but they didn't look as
> straightforward as I was hoping they would be.  Either that, or I was
> getting frustrated looking for a simple solution.
>
> In the end, I found a molecular evolution book that talks about  
> several
> methods used for non-coding sequences so I went ahead and implemented
> them.  They seem to work well.
>
> Ryan
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 2:43 PM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> You should look at the Align::DNAStatistics module if you just want
> pairwise DNA distance.  I put in several different distance methods.
> Or you can use the distance methods implemented in PHYLIP or EMBOSS
> programs -- I thought you wanted the somewhat more sophisticated ML
> approaches that are implemented in PAML?
>
> --jason
>
> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>
>> I've managed to code three methods to calculate K into a Perl script
>> using the algorithms described in "Molecular Evolution" by
>> Wen-Hsiung Li.  I'd be happy to contribute it as a script...
>>
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>> Jason
>> Stajich
>> Sent: Saturday, June 24, 2006 9:40 AM
>> To: golharam at umdnj.edu
>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>> PAML
>> package)
>>
>>
>> baseml is not well-supported to my knowledge - I think I started with an
>> attempt to capture a small amount of the data in the file.  There are
>> some people who have made modifications in-house to possibly parse it,
>> but I know of no submitted patches.   Many of the knowledgeable
>> people are probably at the evolution meetings this week.
>>
>> I have no idea about the full set of information in the report files
>> without going back to the Yang papers first.   It depends on how much
>> of that information you really want to capture, or just the
>> substitution rates.
>>
>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>> work+PAML.
>>
>> -jason
>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>> baseml in the PAML package to measure the distances of some non-
>>> coding
>>
>>> regions.
>>>
>>> I started with the coding regions, and used the script
>>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>> something similar for non-coding regions.  However, when I call
>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>> meaning matrix was never defined.
>>>
>>> I wanted to find out if anyone on here has done this before or
>>> knows a
>>
>>> way to measure substitution frequencies of non-coding regions with
>>> the
>>
>>> PAML package.  The documentation with PAML is sparse so I'm not
>>> sure how
>>> to interpret its output directly - that's why I'm using Bioperl.
>>>
>>> Hopefully someone can help me before I start digging into the
>>> code...Thanks.
>>>
>>> Ryan
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 21:40:49 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:40:49 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>
Message-ID: <000401c697f8$62d41e70$15327e82@pyrimidine>

...
> > I've written a script that relies on that bug being fixed. How
> > should I
> > deal with this when I want to give the script to others to use? Just
> > tell them "You must have BioPerl 1.5 installed". Give them
> > instructions
> > for patching the module code?
> 
> Either way. If the patch is trivial you could also provide the patch
> as an option. Generally we don't support that though. (Not everything
> that we don't support we don't support because it doesn't work.
> Sometimes it's just a statement along the lines of
> 'it-probably-works-but-don't-bug-us-if-it-doesn't'.)

The bug was fixed post-1.4 release according to the link, so Phillip should
use v1.5.1 or newer.

Hilmar's right.  It's hard to address every single complaint about code not
working or method not implemented w/o having patches or fixes submitted.
It's not my top priority to fix bugs in modules submitted by other authors
when I don't know the code.  I'll try if I have the free time, but that's
getting to be a precious commodity lately...

> > How long before the next "stable" release? Maybe a year? Should not a
> > BioPerl 1.4.1 be released so CPAN would get bug fixes like this
> > one? Or
> > would that be very difficult?
> 
> 1.5.1 fixes a number of other problems too, so there isn't really
> much reason to upgrade to 1.4.1 (if there was one) instead of to 1.5.1.
> We think investing time in creating 1.4.1 is not the best investment
> we can make.
> 
> Our current goal is to release 1.5.2 and possibly more development
> versions, all leading on a steady path to 1.6.0. There are very few (but
> significant) stumbling blocks on this path. I believe they will require
> some dedicated time from a couple of people, and after that
> there shouldn't be any real obstacles. It's quite possible that at
> BOSC, or as early as next week at the GMOD meeting, we could see a leap
> forward; typically it's those meetings (short of an actual hackathon)
> that pull the respective people away from their daily obligations.
> 
> Some time back in spring, 1.6 was put in proximity to BOSC. That's
> probably not going to happen, but it will quite possibly not be that much
> afterwards.
> 
> 	-hilmar
...

Nice to know.  I guess a Release Pumpkin will be picked as well.  BOSC is
right around the corner so I guess we can expect something announced soon as
to a possible roadmap (we can't talk about 'timelines' in the States, it's
not patriotic).  

Chris


From golharam at umdnj.edu  Sat Jun 24 23:03:01 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:03:01 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000301c697f5$c08d1150$15327e82@pyrimidine>
Message-ID: <000301c69803$df899f20$2f01a8c0@GOLHARMOBILE1>

Thanks Chris.  It is in fact when you call align() that clustalw
generates the output that you see on the console.  The alignment it
generates I'm parsing right away.  Here's an example of the output
I'm referring to:

-- BEGIN --
 CLUSTAL W (1.83) Multiple Sequence Alignments


Sequence format is Pearson
Sequence 1: human           271 aa
Sequence 2: mouse           264 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  90
Guide tree        file created:   [/tmp/TX4yxP9uKQ/80W87TkT5Z.dnd]
Start of Multiple Alignment
There are 1 groups
Aligning...
Group 1: Sequences:   2      Score:5469
Alignment Score 1480
GCG-Alignment file created      [/tmp/TX4yxP9uKQ/xE4GNyY7Rc]
-- END --

How do I get this to go to stderr instead of stdout?

Ryan
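
One generic Perl approach, sketched here under the assumption that the wrapper launches clustalw as a child process that inherits the script's STDOUT (I haven't confirmed how the module runs it): temporarily point STDOUT at STDERR around the align() call, then restore it. The helper name with_stdout_to_stderr is made up for illustration, and the demo uses a child perl process instead of clustalw so it runs without BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Run $code with STDOUT temporarily duplicated from STDERR, so that
# child processes started inside it write their "stdout" to stderr.
# STDOUT is restored afterwards, even if $code dies.
sub with_stdout_to_stderr {
    my ($code) = @_;
    STDOUT->flush;
    open(my $saved, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Cannot redirect STDOUT: $!";
    my $result = eval { $code->() };
    my $err = $@;
    STDOUT->flush;
    open(STDOUT, '>&', $saved)      or die "Cannot restore STDOUT: $!";
    close($saved);
    die $err if $err;
    return $result;
}

# Demo: the child prints to its STDOUT, which is redirected to STDERR.
my $status = with_stdout_to_stderr(sub {
    system($^X, '-e', 'print "chatter from a child process\n"');
});
print "back on the real STDOUT (child exit status $status)\n";

# With the code from this thread the call would look something like:
# my $aa_aln = with_stdout_to_stderr(sub { $aln_factory->align(\@aa_seq) });
```

The wrapper may also have its own switch for silencing the program; I haven't checked, so the above is just the fallback that doesn't depend on it.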


-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Saturday, June 24, 2006 9:22 PM
To: golharam at umdnj.edu; bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
redirect?


According to the docs ( ;> ) the default behaviour is to return "a
BioPerl Bio::SimpleAlign object which can then be printed and/or saved
in multiple formats using the AlignIO.pm module"; you should be able to
do something
like:

use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
...
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

# new() is a class method, so call it with ->, not ::
my $al_out = Bio::AlignIO->new(-format => 'clustalw',
                               -fh     => \*STDERR);

$al_out->write_aln($aa_aln);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Saturday, June 24, 2006 5:37 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output 
> redirect?
> 
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some 
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can 
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);
> 
> (Forgive me if it's in the docs - I've been coding for a week straight 
> now including Saturday)
> 
> Thanks, Ryan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From golharam at umdnj.edu  Sat Jun 24 23:05:41 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:05:41 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>
Message-ID: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>

>>they make no assumption about coding sequence,
>>where do you get that impression

I get that information from the 1.5 api docs:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/

It's documented under the description section.

Oh well, I have it coded and working...might as well use it.

Ryan
-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 9:38 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


They make no assumption about coding sequence; where do you get that
impression?  The Ka/Ks methods are for coding sequence, but Tamura/Nei,
Kimura, and Jukes-Cantor are all for any type of sequence.

PHYLIP and EMBOSS are pretty straightforward IMHO - you give them
an alignment and you get out a matrix of pairwise numbers....

But whatever makes sense to you - we are using the same methods as
are in Li's book (that is where I took the equations from).
-j
On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:

> Hi Jason,
>
> It looks like DNAStatistics is only for coding sequences.  I'm
> trying to
> calculate the Ks of exons and the K (or Ki) of introns.  All the  
> methods
> in bioperl are based on coding sequences.  Only the  PAUP package  
> (that
> I've found) does non-coding sequences.   I would have used it but you
> need to pay for it and we don't have the funding to purchase much  
> at the
> moment.
>
> I briefly looked at PHYLIP and EMBOSS but they didn't look as
> straightforward as I was hoping they would be.  Either that, or I was
> getting frustrated looking for a simple solution.
>
> In the end, I found a molecular evolution book that talks about
> several
> methods used for non-coding sequences so I went ahead and implemented
> them.  They seem to work well.
>
> Ryan
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 2:43 PM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> You should look at the Align::DNAStatistics module if you just want
> pairwise DNA distance.  I put in several different distance methods.
> Or you can use the distance methods implemented in PHYLIP or EMBOSS
> programs -- I thought you wanted the somewhat more sophisticated ML
> approaches that are implemented in PAML?
>
> --jason
>
> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>
>> I've managed to code three methods to calculate K into a perl script
>> using the algorithms as described in "Molecular Evolution" by
>> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>> Jason
>> Stajich
>> Sent: Saturday, June 24, 2006 9:40 AM
>> To: golharam at umdnj.edu
>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>> PAML
>> package)
>>
>>
>> baseml is not well-supported to my knowledge - I think I started with
>> an attempt to capture a small amount of the data in the file.  There
>> are some people who have made modifications to possibly parse it
>> in-house but I know of no submitted patches.   Many of the
>> knowledgeable people are probably at the evolution meetings this week.
>>
>> I have no idea about the full set of information in the report files
>> without going back to the Yang papers first.   It depends on how much
>> of that information you really want to capture, or just the
>> substitution rates.
>>
>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>> work+PAML.
>>
>> -jason
>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>> baseml in the PAML package to measure the distances of some
>>> non-coding regions.
>>>
>>> I started with the coding regions, and used the script
>>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>> something similar for non-coding regions.  However, when I call
>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>> meaning matrix was never defined.
>>>
>>> I wanted to find out if anyone on here has done this before or
>>> knows a way to measure substitution frequencies of non-coding
>>> regions with the PAML package.  The documentation with PAML is
>>> sparse so I'm not sure how to interpret its output directly -
>>> that's why I'm using Bioperl.
>>>
>>> Hopefully someone can help me before I start digging into the
>>> code...Thanks.
>>>
>>> Ryan
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From bix at sendu.me.uk  Sun Jun 25 07:33:58 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Jun 2006 12:33:58 +0100
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
References: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <449E74A6.3020709@sendu.me.uk>

Ryan Golhar wrote:
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);

You can suppress the output completely using
$aln_factory->quiet(1);

(supplying quiet => 1 to new() should also work according to the docs, 
but doesn't seem to be implemented, though I could be wrong)

If you really want the messages on STDERR you could try redirecting 
STDOUT to STDERR before calling align():
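# Save STDOUT, then point it at STDERR for the duration of align()
open(OLDOUT, ">&STDOUT") or die "Can't dup STDOUT: $!";
open(STDOUT, ">&STDERR") or die "Can't redirect STDOUT to STDERR: $!";
my $aa_aln = $aln_factory->align(\@aa_seq);
# Put the original STDOUT back
open(STDOUT, ">&OLDOUT") or die "Can't restore STDOUT: $!";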

I haven't tested either of these ideas, but I think they should both 
work - try them out and let us know.

Ideally there would be a saner way of doing this, but it isn't readily 
apparent to me.
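
For the archives, here is a minimal, self-contained sketch of the same
redirect idea wrapped in a reusable helper. The helper name
with_stdout_to_stderr is hypothetical (it is not part of BioPerl), and
the print inside the callback is only a stand-in for the align() call;
I am assuming the chatter arrives on the process's normal STDOUT.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper (not part of BioPerl): run $code with STDOUT
# temporarily pointed at STDERR, then restore STDOUT even if $code dies.
sub with_stdout_to_stderr {
    my ($code) = @_;

    STDOUT->flush;    # empty anything already buffered for the real STDOUT
    open(my $saved, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Can't redirect STDOUT: $!";

    # Run the callback in list context and remember any exception.
    my @result = eval { $code->() };
    my $err = $@;

    STDOUT->flush;    # push the redirected output out before switching back
    open(STDOUT, '>&', $saved) or die "Can't restore STDOUT: $!";
    close($saved);

    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Stand-in for: my $aa_aln = $aln_factory->align(\@aa_seq);
my $value = with_stdout_to_stderr(sub {
    print "chatter that would normally go to STDOUT\n";
    return 42;
});
print "result: $value\n";    # printed on the real STDOUT again
```

Because the redirect happens at the file-descriptor level, it should
also cover output written by an external program started from inside
the callback, which is presumably what matters for a ClustalW wrapper.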


From jason at bioperl.org  Sun Jun 25 08:37:11 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 25 Jun 2006 08:37:11 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with
	(baseml from PAML package)]
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>


On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:

>>> they make no assumption about coding sequence,
>>> where do you get that impression
>
> I get that information from the 1.5 api docs:
>
> http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
Great - I would always point people to the LIVE code
documentation rather than the 1.5.0-RC1 docs, which are over a year old,
although nothing in particular has changed in this module since 1.5.0
that I know of.  Someday someone will put a new ball of docs up on the
site, but I hope that will come with the next development or stable
release.

> Its documented under the description section.
>
I don't really see what you are referring to, since there is a lot of
documentation, but perhaps it should be clarified - I had hoped this
was a sufficient description:
"This object contains routines for calculating various statistics and  
distances for DNA alignments."

> Oh well, I have it coded and working...might as well use it.
>
Sounds like your best bet for your situation.

For the record and in the mailing list archives - as long as you  
don't call a method that contains "KaKs" it will work fine.  You can  
calculate distances using the currently implemented distance methods:

    JukesCantor
    Uncorrected
    F81
    Kimura
    Tamura
    F84 (Felsenstein 84)
    TajimaNei
    JinNei
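
To make the point concrete that these distances need nothing but a
nucleotide alignment, here is a standalone sketch of two of them using
the usual textbook formulas (Jukes-Cantor and Kimura 2-parameter). It
is NOT the code in Bio::Align::DNAStatistics; the subroutine names are
hypothetical, and skipping gapped or ambiguous columns is an assumption
of this sketch.

```perl
use strict;
use warnings;

# Count comparable sites, transitions and transversions between two
# aligned sequences. Columns with gaps or ambiguity codes are skipped
# (an assumption of this sketch).
sub count_differences {
    my ($seq1, $seq2) = @_;
    die "sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);

    my %purine = (A => 1, G => 1);
    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc substr($seq1, $i, 1);
        my $y = uc substr($seq2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # purine<->purine or pyrimidine<->pyrimidine is a transition;
        # anything else is a transversion.
        if (($purine{$x} // 0) == ($purine{$y} // 0)) { $ts++ } else { $tv++ }
    }
    return ($sites, $ts, $tv);
}

# Jukes-Cantor: d = -(3/4) ln(1 - (4/3) p), p = proportion of differing sites
sub jukes_cantor {
    my ($sites, $ts, $tv) = count_differences(@_);
    die "no comparable sites\n" unless $sites;
    my $p   = ($ts + $tv) / $sites;
    my $arg = 1 - 4 * $p / 3;
    die "sequences too divergent for Jukes-Cantor\n" if $arg <= 0;
    return -0.75 * log($arg);
}

# Kimura 2-parameter: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)),
# P = proportion of transitions, Q = proportion of transversions
sub kimura_2p {
    my ($sites, $ts, $tv) = count_differences(@_);
    die "no comparable sites\n" unless $sites;
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    my ($w1, $w2) = (1 - 2 * $P - $Q, 1 - 2 * $Q);
    die "sequences too divergent for Kimura 2-parameter\n"
        if $w1 <= 0 || $w2 <= 0;
    return -0.5 * log($w1 * sqrt($w2));
}

# Works the same for intron or exon sequence: no reading frame involved.
my ($s1, $s2) = ('ACGTACGTAC', 'ACGTACGTAT');
printf "JC  = %.4f\n", jukes_cantor($s1, $s2);
printf "K2P = %.4f\n", kimura_2p($s1, $s2);
```

Nothing here looks at codons, which is why only the Ka/Ks-style
methods care whether the input is coding sequence.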


It will be more productive to just drop the discussion, since you
seem to be fine without all of this anyway.  If you decide you
would like to use it and contribute new distance methods or doc
fixes, I am sure we'll enjoy your contributions.


-jason
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Sun Jun 25 13:05:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 12:05:34 -0500
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
	with(baseml from PAML package)]
In-Reply-To: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>
Message-ID: <000901c69879$97b7d5b0$15327e82@pyrimidine>

> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing
> particular has changed in this module since 1.5.0 that I know of.
> Someday someone will put a new ball of docs up on the site, but I
> hope that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as it's
the most up-to-date, that fact isn't really emphasized much on the docs
page; the link is along with the other toolkits at the bottom of the page
and is listed as Bioperl Core Code (some users don't seem to get that, in
general, bioperl=bioperl core).  Could be this is causing a bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or that I
should be able to!).  Maybe the best way to go is to have a wiki page for
this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or at
least link to the Release Pumpkin page which indicates the code release date
for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.  Work
is underway to correct them, but do not expect this code to give you the
right answer currently!  Use dnadist/distmat in the PHYLIP or EMBOSS
packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some point
relatively soon myself so I may be up to some testing on one/both of these
modules if needed.

Chris

....


From golharam at umdnj.edu  Sun Jun 25 13:20:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sun, 25 Jun 2006 13:20:12 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
 with(baseml from PAML package)]
In-Reply-To: <000901c69879$97b7d5b0$15327e82@pyrimidine>
Message-ID: <000801c6987b$9e65f840$2f01a8c0@GOLHARMOBILE1>

Exactly.  Also on the page it says (in the description for
Bio::Align::DNAStatistics):

In order to use these methods there are
several pre-requisites for the alignment.

   1. DNA alignment must be based on protein alignment. Use the
      subroutine aa_to_dna_aln in Bio::Align::Utilities to achieve this.

 Etc etc etc


The rest of the pre-reqs also mention that the sequences should be
coding sequences.  Because of this, I thought DNAStatistics was only for
coding sequences and could not be used for non-coding sequences...

Anyway, I've gotten past my troubles and am on to finishing this project.
I think others might run into the same issues I did.  I'd be
happy to contribute what I can but need to finish this stuff first...
Thanks for all your help Jason, Chris, Sendu!

Ryan


-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Sunday, June 25, 2006 1:06 PM
To: 'Jason Stajich'; golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] DNA distance methods [was
Bio::Tools::Phylo::PAML with(baseml from PAML package)]


> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code 
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing 
> particular has changed in this module since 1.5.0 that I know of. 
> Someday someone will put a new ball of docs up on the site, but I hope
> that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as
it's the most up-to-date, that fact isn't really emphasized much on the
docs page; the link is along with the other toolkits at the bottom of
the page and is listed as Bioperl Core Code (some users don't seem to
get that, in general, bioperl=bioperl core).  Could be this is causing a
bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or
that I should be able to!).  Maybe the best way to go is to have a wiki
page for this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or
at least link to the Release Pumpkin page which indicates the code
release date for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.
Work is underway to correct them, but do not expect this code to give
you the right answer currently!  Use dnadist/distmat in the PHYLIP or
EMBOSS packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some
point relatively soon myself so I may be up to some testing on one/both
of these modules if needed.

Chris

....


From pmiguel at purdue.edu  Sun Jun 25 15:02:14 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 25 Jun 2006 15:02:14 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
Message-ID: <449EDDB6.8020401@purdue.edu>

Chris Fields wrote:
> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>
> [...]
>> How long before the next "stable" release? Maybe a year? Should not a
>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
>> would that be very difficult?
>
> No, it's not that easy.  BioPerl isn't like most CPAN modules with one 
> or two developers.  See the wiki page for details on planning releases 
> to see why:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> It takes a lot of effort and coordination, much more so than the 
> average CPAN module.  I believe some of the core developers are 
> meeting this weekend; maybe something will come of that and we'll get 
> an idea of a next release.
>
> Chris
Hi Chris,
   Thanks for the information--the key part being that a bug fix from a 
couple of years ago has not propagated into the current stable release. 
Below I'll try to convince you that this is a serious problem. (Not 
because it is your fault, of course. I'm just trying to deliver my take 
on the situation to the bioperl-programmer-warriors who happen to be 
listening...)
   It isn't a problem for me to edit the offending statement in the 
QualI.pm module on systems I generally use. Or even install a 
developer's release of bioperl. My problem is one of advocacy. Maybe I 
have a warped view of the world, but it seems that except for those 
directly involved in the bioperl or GMOD projects, everyone looks to 
CPAN when they install bioperl.
    I write scripts that I sometimes want to send to biologists even 
less programming-capable than I am. I can just barely envision those 
biologists pestering their sysadmin to do a CPAN install of bioperl 
modules so that my script will work. But installing a non-CPAN set of 
modules probably isn't going to happen.
    So, this being the case, how can I, with a clear conscience, advocate
bioperl to the junior bioinformaticians with whom I happen to interact?
    My take, for what it is worth, is that 1.5 has become an unratified 
stable release. How hard would it be to take 1.5.1--as is--and deposit 
that in CPAN? What would be the downside?

Phillip SanMiguel
   

From hlapp at gmx.net  Sun Jun 25 15:42:20 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 25 Jun 2006 15:42:20 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>

We did not and will not deposit 1.5.1 into CPAN due to the API issues  
in some (rather central) interfaces. These issues are changes over  
the 1.4 API and some of those changes are going to go away. Once we  
deposit it into CPAN we would sanction the changed API as the new  
'official' API and would open a huge can of backward liability worms.  
If you just continue to use the 1.4 API on the 1.5.1 release you  
don't need to be concerned about an API method you're using going away.

As I said, the people from the core group of developers who have  
traditionally shepherded releases all think that doing a 1.4.1  
release wouldn't be the best investment of their time. You are most  
welcome to disagree and volunteer your time to coordinate the 1.4.1  
release, and a lot of people will appreciate your efforts - including  
the bioperl developers and 'core'. It shouldn't be much work  
theoretically.

	-hilmar

On Jun 25, 2006, at 3:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one
>> or two developers.  See the wiki page for details on planning  
>> releases
>> to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the
>> average CPAN module.  I believe some of the core developers are
>> meeting this weekend; maybe something will come of that and we'll get
>> an idea of a next release.
>>
>> Chris
> Hi Chris,
>    Thanks for the information--the key part being that a bug fix  
> from a
> couple of years ago has not propagated into the current stable  
> release.
> Below I'll try to convince you that this is a serious problem. (Not
> because it is your fault, of course. I'm just trying to deliver my  
> take
> on the situation to the bioperl-programmer-warriors who happen to be
> listening...)
>    It isn't a problem for me to edit the offending statement in the
> QualI.pm module on systems I generally use. Or even install a
> developer's release of bioperl. My problem is one of advocacy. Maybe I
> have a warped view of the world, but it seems that except for those
> directly involved in the bioperl or GMOD projects, everyone looks to
> CPAN when they install bioperl.
>     I write scripts that I sometimes want to send to biologists even
> less programming-capable than I am. I can just barely envision those
> biologists pestering their sysadmin to do a CPAN install of bioperl
> modules so that my script will work. But installing a non-CPAN set of
> modules probably isn't going to happen.
>     So, this being the case, how can I, with a clear conscious,  
> advocate
> bioperl to the junior bioinformaticians with whom I happen to  
> interact?
>     My take, for what it is worth, is that 1.5 has become an  
> unratified
> stable release. How hard would it be to take 1.5.1--as is--and deposit
> that in CPAN? What would be the downside?
>
> Phillip SanMiguel
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Jun 25 16:20:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 15:20:20 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <7C28EA28-031A-4B1C-9625-A643247445FD@uiuc.edu>


On Jun 25, 2006, at 2:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one or two developers.  See the wiki page for details on planning  
>> releases to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the  
>> average CPAN module.  I believe some of the core developers are  
>> meeting this weekend; maybe something will come of that and we'll  
>> get an idea of a next release.
>>
>> Chris
> Hi Chris,
>   Thanks for the information--the key part being that a bug fix  
> from a couple of years ago has not propagated into the current  
> stable release. Below I'll try to convince you that this is a  
> serious problem. (Not because it is your fault, of course. I'm just  
> trying to deliver my take on the situation to the bioperl- 
> programmer-warriors who happen to be listening...)
>   It isn't a problem for me to edit the offending statement in the  
> QualI.pm module on systems I generally use. Or even install a  
> developer's release of bioperl. My problem is one of advocacy.  
> Maybe I have a warped view of the world, but it seems that except  
> for those directly involved in the bioperl or GMOD projects,  
> everyone looks to CPAN when they install bioperl.

Again, it's not as easy as you make it seem.  The idea is that the
CPAN version is upgraded to stable releases (even-numbered) and that
odd-numbered releases are developer versions.  Yes, it has been a
while since the last stable version, and it could be a while until the
next, as there have been suggestions of an interim 1.5.x release or so
before that occurs (though he did say 1.6 could be soon after BOSC,
which is in August).  Hilmar has explained that there are some
stumbling blocks to get around before the next major release (if
those 'stumbling blocks' are what I think they are, I agree).  It's
very likely that implementing the changes he mentions will require
refactoring code, changing API, etc.  That is not easy in a project
like this, with a large core of contributors and developers scattered
all over the world, all with different priorities (we all have $jobs
after all).

That's why we have a Release Pumpkin, akin to the Pumpkings that have  
ushered forth regular perl releases.  It requires a large,  
coordinated effort with one person acting as overseer, pushing  
everybody to meet deadlines.  Not easy and not, by a long shot, your  
typical CPAN module.
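
As a small illustration of the even/odd numbering convention described
above, here is a sketch that classifies a version string by its second
component. The helper name release_kind is hypothetical and nothing
like it is assumed to exist in BioPerl itself; it only encodes the
convention as stated in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a BioPerl-style version string using the
# even/odd convention (second component even => stable, odd => developer).
sub release_kind {
    my ($version) = @_;
    my ($minor) = $version =~ /^\d+\.(\d+)/
        or die "unrecognised version string: $version\n";
    return $minor % 2 == 0 ? 'stable' : 'developer';
}

for my $v (qw(1.4 1.5.1 1.6)) {
    printf "%-6s %s\n", $v, release_kind($v);
}
```

Under that convention 1.4 and 1.6 count as stable and anything in the
1.5.x series as a developer release, which is why 1.5.1 is not on CPAN.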

>    I write scripts that I sometimes want to send to biologists even  
> less programming-capable than I am. I can just barely envision  
> those biologists pestering their sysadmin to do a CPAN install of  
> bioperl modules so that my script will work. But installing a non- 
> CPAN set of modules probably isn't going to happen.
>    So, this being the case, how can I, with a clear conscious,  
> advocate bioperl to the junior bioinformaticians with whom I happen  
> to interact?

Give those biologists some credit. Quite frankly, I would expect any  
bioinformaticist or computational biologist, junior or otherwise, to  
know or at least learn how to install from CPAN or from CVS,  
otherwise they need to change their job title.  And, as a  
microbiologist myself (i.e. one of those biologists you mention) and  
as one who regularly interacts with biologists with little to no  
computer science experience, I believe I can speak from experience.   
I find the install documents that come with BioPerl and available on  
the wiki pretty much cover everything, from how to install to the  
dependencies required to problems one may encounter.   The web site  
has a ton of documentation, including the FAQ (*cough* which covers
this ground *cough*).

If they are running perl scripts and using a system that requires
sysadmin privileges they probably know what they are doing anyway.  If
not, they probably have students/employees who do know what's going
on (and who may be the ones actually running the scripts).  You can't
please everybody, so I think you can proceed with a clear conscience
knowing you did the best that you could to help!

>    My take, for what it is worth, is that 1.5 has become an  
> unratified stable release. How hard would it be to take 1.5.1--as  
> is--and deposit that in CPAN? What would be the downside?

Ah I see Hilmar has responded.  I think he adequately answers this.   
API is everything; changing API suddenly is bad bad bad.

> Phillip SanMiguel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From akholloway at ucdavis.edu  Mon Jun 26 00:15:16 2006
From: akholloway at ucdavis.edu (Alisha Holloway)
Date: Sun, 25 Jun 2006 21:15:16 -0700
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
 package)
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <a06230932c0c50e71ad97@[10.0.1.2]>

Hi Ryan & Jason,

Sorry I didn't get back to you sooner.  I escaped the central valley 
heat (108!) and went to the coast for the weekend.  I do have a 
script that will call baseml and then parse the results.  Here it is 
and, Ryan, I can show you how to retrieve other parts of the data as 
well, but you may already know how to do this.  I know it's ugly, I 
got it working and didn't clean it up.  Just let me know if you need 
more info.

Alisha

At 11:05 PM -0400 6/24/06, Ryan Golhar wrote:
>>>they make no assumption about coding sequence,
>>>where do you get that impression
>
>I get that information from the 1.5 api docs:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
>Its documented under the description section. 
>
>Oh well, I have it coded and working...might as well use it.
>
>Ryan
>-----Original Message-----
>From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
>Stajich
>Sent: Saturday, June 24, 2006 9:38 PM
>To: golharam at umdnj.edu
>Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
>package)
>
>
>they make no assumption about coding sequence,where do you get that 
>impression.  the ka,ks are for coding but the tamura/nei kimura, 
>jukes-cantor are all for any type of sequence.
>
>the phylip and emboss are pretty straightforward IMHO - you give it 
>an alignment and you get out a matrix of pairwise numbers....
>\
>but whatever makes sense to you - we are using the same methods as 
>are in Li's book (that is where I took the equations from).
>-j
>On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:
>
>>  Hi Jason,
>>
>>  It looks like DNAStatistics is only for coding sequences.  I'm
>>  trying to
>>  calculate the Ks of exons and the K (or Ki) of introns.  All the 
>>  methods
>>  in bioperl are based on coding sequences.  Only the  PAUP package 
>>  (that
>>  I've found) does non-coding sequences.   I would have used it but you
>>  need to pay for it and we don't have the funding to purchase much 
>>  at the
>>  moment.
>>
>>  I brielfy looked at PHYLIP and EMBOSS but it didn't look as
>>  straight-forward as I was hoping it would be.  Either that, or I was
>>  getting fustrated looking for a simple solution.
>>
>>  In the end, I found a molecular evolution book that talks about
>>  several
>>  methods used for non-coding sequences so I went ahead and implemented
>>  them.  They seem to work well.
>>
>>  Ryan
>>
>>
>>  -----Original Message-----
>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>  Jason
>>  Stajich
>>  Sent: Saturday, June 24, 2006 2:43 PM
>>  To: golharam at umdnj.edu
>>  Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from 
>>  PAML
>>  package)
>>
>>
>>  You should look at the Align::DNAStatistics module if you just want
>>  pairwise DNA distance.  I put in several different distance methods.
>>  Or you can use the distance methods implemented in PHYLIP or EMBOSS
>>  programs -- I thought you wanted the somewhat more sophisticated ML
>>  approaches that are implemented in PAML?
>>
>>  --jason
>>
>>  On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>>
>>>  I've managed to code three methods to calculate K into a perl script
>>>  using the algorithms as described in "Molecular Evolution" by Wen-
>>>  Hsuing
>>>  Li.   I'd be happy to contribute it as a script...
>>>
>>>
>>>
>>>  -----Original Message-----
>>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>>  Jason
>>>  Stajich
>>>  Sent: Saturday, June 24, 2006 9:40 AM
>>>  To: golharam at umdnj.edu
>>>  Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>>>  PAML
>>>  package)
>>>
>>>
>>>  baseml is not well-supported to my knowledge - I think I started with
>>>  attempt to capture a small amount of the data in the file.  There are
>>>  some people who have made modifications to possible parse it in-house
>>>  but I know of no submitted patches.   Many of the knowledgeable
>>>  people are probably at the evolution meetings  this week.
>>>
>>>  I have no idea about the full set of information in the report files
>>>  without going back to the Yang papers first.   It depends on how much
>>>  of that information you really want to capture of just the
>>>  substitution rates.
>>>
>>>  I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>>>  work+PAML.
>>>
>>>  -jason
>>>  On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>>
>>>>  Hi all,
>>>>
>>>>  I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>>>  baseml in the PAML package to measure the distances of some non-
>>>>  coding
>>>
>>>>  regions.
>>>>
>>>>  I started with the coding regions, and used the script
>>>>  bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>>>  something similar for non-coding regions.  However, when I call
>>>>  Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>>>  meaning matrix was never defined.
>>>>
>>>>  I wanted to find out if anyone on here has done this before or
>>>>  knows a
>>>
>>>>  way to measure substitution frequencies of non-coding regions with
>>>>  the
>>>
>>>>  PAML package.  The documentation with PAML is sparse so I'm not
>>>>  sure how
>>>>  to interpret its output directly - that's why I'm using Bioperl.
>>>>
>>>>  Hopefully someone can help me before I start digging into the
>>>>  code...Thanks.
>>>>
>>>>  Ryan
>>>>
>>>>  _______________________________________________
>>>>  Bioperl-l mailing list
>>>>  Bioperl-l at lists.open-bio.org
>>>>  http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>  --
>>>  Jason Stajich
>>>  Duke University
>>>  http://www.duke.edu/~jes12
>>>
>>
>>  --
>>  Jason Stajich
>>  Duke University
>>  http://www.duke.edu/~jes12
>>
>
>--
>Jason Stajich
>Duke University
>http://www.duke.edu/~jes12


-- 
Alisha Holloway

Postdoctoral Fellow
Section of Evolution & Ecology
3347 Storer Hall
University of California
Davis, CA  95616

530-754-9551 Office
512-297-3958 Cell
530-752-1449 Fax
-------------- next part --------------
A non-text attachment was scrubbed...
Name: batch_baseml_50nt.pl
Type: application/octet-stream
Size: 5395 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: baseml.ctl
Type: application/octet-stream
Size: 1699 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0005.obj>

From fernan at iib.unsam.edu.ar  Mon Jun 26 08:47:30 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 09:47:30 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
Message-ID: <20060626124730.GA53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 07:22):
|
| We did not and will not deposit 1.5.1 into CPAN due to the API issues  
| in some (rather central) interfaces. These issues are changes over  
| the 1.4 API and some of those changes are going to go away. Once we  
| deposit it into CPAN we would sanction the changed API as the new  
| 'official' API and would open a huge can of backward liability worms.  
| If you just continue to use the 1.4 API on the 1.5.1 release you  
| don't need to be concerned about an API method you're using going away.
| 
| As I said, the people from the core group of developers who have  
| traditionally shepherded releases all think that doing a 1.4.1  
| release wouldn't be the best investment of their time. You are most  
| welcome to disagree and volunteer your time to coordinate the 1.4.1  
| release, and a lot of people will appreciate your efforts - including  
| the bioperl developers and 'core'. It shouldn't be much work  
| theoretically.
| 
| 	-hilmar
|
+----]

I understand that, being a volunteer project, people can
decide where to best invest their time. If core developers
are no longer using 1.4 in their production setups, it is
reasonable to expect that they invest all of their time in
1.5 or any other bioperl version that they're using.

However, when viewed as an issue related to the setting of a
policy for the whole project, then it makes sense to have a
policy saying for how long a stable release will be
supported, and when and in which case bugfixes that are committed
to and tested in the development branch (as it should be)
will get merged back to stable. 

I'm not knowledgeable enough about the bioperl release
engineering process, nor about the internal development
process, but just guessing I'd expect that whenever anyone
submits a bugfix, it should be the responsibility of
the committer to check (against the project policy,
written or implicit, or with the core developers in a
difficult case) whether the fix should be committed to more
than one branch.

A patch like the one that started this thread should have
been committed to the 1.4 branch without too much thinking.
And it would have cost the committer only a few seconds more
of her/his time. 

But you only get this by setting and enforcing a policy.

After a number of these fixes have accumulated, making a
new release shouldn't represent too much effort, nor should
it be expected that the tests that passed before would
break now. And in the worst case (no tarball release),
people can be directed to obtain the most current 'stable'
code from the repository, containing all bugfixes. 

I guess that this is what was meant by Phillip.

Fernan


From hlapp at gmx.net  Mon Jun 26 09:59:00 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 09:59:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>


On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.
>
> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.

Sure. But for some reason he or she forgot. So what do you suggest we  
do - and I mean as a community, because this is a community project.  
Come after the guy  until he commits it to the branch? Or post an  
email to the list saying what you think is the right way and then do  
it (yourself)?

>
> But you only get this by setting and enforcing a policy.

Man, this is not a company. Take a step back and think again. What do  
you suggest we - again we as a community - do to enforce a policy?  
Take increasing levels of disciplinary action if someone keeps  
forgetting to commit to the branch?

While there are clearly some rules everybody needs to follow and if  
you violate them deliberately and repeatedly you will get your CVS  
privileges withdrawn, by and large we as a community need to accept  
some responsibility for making the project what we think it should be  
- and do so not by invoking disciplinary action but by living by  
example and by taking action yourself when you think action is due.

If Bioperl were a company and you asked for a 1.4.1 release and the  
customer service rep told you nope there's a 1.5.1 that you should  
use instead and that will do just fine, what will you do? Argue with  
him about the company policies and whether they are properly enforced  
or not?

Obviously doing so will be a waste of your time. In Bioperl it is, at
bottom, no less a waste of your time, because instead you now
have the opportunity to make happen what you believe needs to happen.
We have had a history of rapidly and un-bureaucratically putting  
people in power of what they wanted to do. We have also had a history  
of not listening much to people who don't want to put their feet  
where their mouth is.

I'm sorry if what I'm saying puts people off, but really this is an  
open-source project and if you ask me it's one with the least  
barriers of entry for new developers or 'activists' that you can find  
in the open source arena. This doesn't come without some degree of  
anarchy, but really IMHO that's more of an advantage than a  
disadvantage.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Mon Jun 26 10:13:00 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Jun 2006 10:13:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <BDC70861-52D3-4389-9073-07F456661B14@bioperl.org>

fair enough - we can certainly merge fixes onto the branch -  I am  
not sure why that is such a big deal.

Once the changes are made to the branch, if someone then wants to
update to the latest code on the 1.4 branch, they would need to
volunteer to do the last step of:
cvs export -r branch-1-4 -d bioperl-1-4-1 bioperl-live

then validate it, then make a tarball. We can submit a 1.4.x to
CPAN, but honestly a lot of other fixes have accumulated since the
1.4 branch and I don't think we want to keep merging back to it; we'd
rather move forward. The not-so-compatible changes that got checked
in after the 1.4 branch (having to do with Annotatable) have been
part of the problem, as they have not been fully fixed to make things
backwards compatible.
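
The export, validate and tarball steps above could be strung together
roughly as below. This is only a sketch, not a tested release procedure:
the run helper, the DRY_RUN switch and the make_release function are
names made up for illustration, it assumes CVSROOT already points at the
repository, and the validate step assumes the usual perl Makefile.PL /
make / make test sequence. With DRY_RUN left at 1 the script only prints
the commands it would run.

```shell
#!/bin/sh
# Sketch of the export / validate / tarball steps for a branch release.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper wrapping the three steps.
make_release() {
    branch="$1"    # CVS branch tag, e.g. branch-1-4
    outdir="$2"    # export directory, e.g. bioperl-1-4-1

    # 1. Export a clean copy of the branch (no CVS/ admin directories).
    run cvs export -r "$branch" -d "$outdir" bioperl-live || return 1

    # 2. Validate: build and run the test suite inside the export.
    #    Assumption: the standard Makefile.PL workflow, which may prompt.
    run sh -c "cd '$outdir' && perl Makefile.PL && make && make test" || return 1

    # 3. Roll the tarball.
    run tar -czf "$outdir.tar.gz" "$outdir"
}

make_release branch-1-4 bioperl-1-4-1
```

Setting DRY_RUN=0 would run the commands for real; the resulting tarball
would still need looking over by hand before anything goes to CPAN.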

Nathan asked earlier on the list about how to get a list of modules  
added since 1.4 and I can only say how to generate a diff to the  
current version of the code which might be more than what he is  
asking for. Read the docs on cvs diff, where you specify the two tags
you want to diff between.
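
As a rough illustration of diffing between two tags, something like the
following might do. It is a sketch under assumptions: RELEASE_1_4_TAG is
a placeholder rather than a real tag name, the run, summary_between and
only_new helpers are invented for this example, CVSROOT is assumed to
point at the repository, and the only_new filter assumes that the
summary report marks added files with the words "is new". With DRY_RUN
left at 1 it only prints the cvs command.

```shell
#!/bin/sh
# Sketch: summarise what changed between two CVS tags.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One summary line per changed file between two tags, taken from the
# repository itself (rdiff does not need a working copy).
summary_between() {
    from_tag="$1"
    to_tag="$2"
    module="${3:-bioperl-live}"
    run cvs -q rdiff -s -r "$from_tag" -r "$to_tag" "$module"
}

# Keep only summary lines for files that did not exist at the first tag.
# Assumption: such lines contain the words "is new".
only_new() {
    grep ' is new'
}

# RELEASE_1_4_TAG is a placeholder; substitute the tag marking the 1.4 release.
summary_between RELEASE_1_4_TAG HEAD
```

With DRY_RUN=0, piping summary_between through only_new would narrow the
report to files added since the first tag, which may be closer to what
Nathan was after. Inside a checkout, cvs diff -u -r TAG1 -r TAG2 gives
the full patch instead of a summary.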


We certainly have a problem of meeting the needs of several different  
user groups - developers who need the latest code, and users who want
stable releases.  We either get funding to support stable releases
more deliberately, something that doesn't seem to be on the main radar
screen of primary developers, or the work falls to people who are tied
to working with older stable releases.  Since most of us who are coding
and making changes are just working from a CVS checkout we don't have a
lot of pressure to make a release -- and we don't want to dump new bugs
(or broken interfaces) into CPAN on purpose.  It also seems like many
reported bugs have already been fixed on the latest branch but people  
are less interested in back-fixing on the old branch.


Our hope is that 1.6 would be a good replacement for 1.4 - presumably
API consistent for the most part - but we are suffering from a lack of
people with the time and willingness to do the work to make this happen.

I have mentioned in the past that I cannot be the release master for  
the project and it is time for someone else to step up and make this  
happen.  Chris Fields has done a phenomenal job answering questions,  
fixing bugs, and helping run the project as some of us have started  
to have too busy of a schedule to keep daily tabs on Bioperl.  But he
too will probably have to cycle off as his career responsibilities
(and job search) take more time.  I don't have a good answer for
anyone on how to make this happen more smoothly; I am hopeful that
the GMOD meeting will spur some more commits and a roadmap for releasing
the next dev release and seeing what can happen with 1.6.  If we
funded a Bioperl coordinator I am sure that would help things more  
and manage the different sets of priorities of the user groups.

I think a dedicated hackathon to bioperl work could get 1.6 out after  
one week of solid work with some bug squashing followup.

Barring that we'll have to see what everyone else wants to see done  
to get the next release out.  The person leading the release doesn't  
have to really program things they just need to organize people  
around a time-frame, a set of features that need to be tested and  
fixed, and commitments from people of what they will do.

Much of the release process is documented on the bioperl wiki site;
if this is not clear enough please make a note on the page/talk page
and we can start.  My hope is that the wiki can be a good repository
of the thought process behind the project.  Right now too much of it
is floating in the minds of former and current project coordinators.

...just some of my thoughts as I get ready to be off-line starting  
next week for 4 weeks...

-jason


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From bix at sendu.me.uk  Mon Jun 26 10:44:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 15:44:55 +0100
Subject: [Bioperl-l] Tests
Message-ID: <449FF2E7.3040101@sendu.me.uk>

What level of testing is expected to be done in a test file? Is there 
such a thing as too many tests? Tests for every possible (documented) 
way of achieving a result with a module's method? Tests for every 
conceivable way of misusing a method?

If I come across a test for a module that doesn't test for everything 
the module can do, should I add tests as a matter of course? Would this 
be beneficial, or a waste of time (given that the module probably is 
bug-free already)?


From cjfields at uiuc.edu  Mon Jun 26 11:24:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:24:00 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <001301c69934$909f83c0$15327e82@pyrimidine>

...
> However, when viewed as an issue related to the setting of a
> policy for the whole project, then it makes sense to have a
> policy saying for how long a stable release will be
> supported, and when and in which case bugfixes that are committed
> to and tested in the development branch (as it should be)
> will get merged back to stable.

In a project this large which relies on a lot of outside resources
maintaining API and availability at all times, having a completely bug-free
fix for any reasonable length of time is impossible.  As a small example,
almost every time NCBI changes BLAST output, it breaks our text parsers, and
though we recommend using the BLAST XML format parser (which is much more
stable), almost everybody continues using text parsing and wants that fixed.
Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
notification, so remote BLAST parsing can break at any time.  Fold into that
any software changes that change output or API (PAML comes to mind).  Fold
into that remote database changes (EBI interface to Swissprot).  Oh, let's
not forget sequence format changes (recent SwissProt and GenBank changes).
And, worst of all, we can't expect them to maintain API or output b/c
they're updating based on user input/suggestions or bug fixes which require
them to make changes.  What's 'stable' about that?

It's very easy to say you want something and then not volunteer to do it; if
you want something then put forth the time and effort to get it done.  Put
your money where your mouth is (as they say in my home state).

Again (for the third or fourth time now), putting together a release takes
some time and effort.  I actually think it takes more effort than Hilmar
suggests; either way, it requires someone to act as the leader (release
pumpkin) to handle changes, and I don't see anybody stepping forward.
Personally, if I have the time, maybe I'll handle an interim release, but
I'm looking for a job starting in the fall as well as finishing up research
for publication so that will take up almost all the time I have.  As Hilmar
says, if you want to do it, fine.  Realize, though, many many changes have
been made since 1.4 and many more will likely be made on the road to 1.6.

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.

This is a large open-source project with a ton of developers all over the
world.  Check out the AUTHORS file; it's at best incomplete and still has
about 100 contributors.  

(Hey, my name's not on there!!!)

> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.
> 
> But you only get this by setting and enforcing a policy.

You need to realize what this project is, what it is not, and how it
evolved.  A little history lesson might get you (and others) to understand
just how complex it all is (and how old some of the code is).

http://www.bioperl.org/wiki/FAQ#Can_you_explain_the_Object_Model_design_and_rationale.3F

explains a bit on the project design.

http://www.bioperl.org/wiki/History_of_BioPerl

explains how BioPerl came to be.  

This is not a job or a company but an open-source project; its origins are
based in the scientific community.  You're probably right about the person
not committing the change to the 1.4 branch.  We probably should have a
policy for commits to stable releases.  But how can we logically rationalize
doing so now for 1.4, almost three years hence?  We're post 1.5.1 and likely
going into 1.6 as we speak.  It's too late for 1.4 changes IMHO, frankly,
but you're welcome to try.  I don't think it's worth the effort.

As for policy enforcement, what would you want us to do?  This is a
volunteer effort.  Fire him/her?  Frankly they should be commended for
getting the fix committed in the first place, and if someone points out that
it should be committed to the 1.4 branch then fine; it shouldn't be hard to
do so even long after the commit to the main branch is made.  It just
requires someone to do so.

Again, this is NOT your typical CPAN module with one or two developers or a
project that relies on doing one thing very well.  This project has over 100
developers and is supposed to do everything adequately (and many things very
well). 

> After a number of these fixes have accumulated, making a
> new release shouldn't represent too much effort, nor should
> it be expected that the tests that passed before would
> break now. And in the worst case (no tarball release),
> people can be directed to obtain the most current 'stable'
> code from the repository, containing all bugfixes.

You can download a tarball from the latest CVS code at any time.  There is a
link for doing just that at the bottom of the anonymous CVS page:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/

Chris


From hlapp at gmx.net  Mon Jun 26 11:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 11:30:05 -0400
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
References: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>


On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:

> What level of testing is expected to be done in a test file? Is there
> such a thing as too many tests?

No, not really.

> Tests for every possible (documented)
> way of achieving a result with a module's method?

Ideally that's the minimum.

> Tests for every conceivable way of misusing a method?

If some are known already (from reports) or you think they can be
anticipated, yes. Generally, if a method documents what are invalid
values for its input it's a good idea to test what the method does if  
supplied with such values. The one thing it shouldn't do is silently  
ignore them, or produce a result anyway (which presumably would be a  
wrong result by definition).

>
> If I come across a test for a module that doesn't test for everything
> the module can do, should I add tests as a matter of course? Would  
> this
> be beneficial, or a waste of time (given that the module probably is
> bug-free already)?

It would certainly be beneficial. It'd be great if you were willing  
to volunteer for this.

Note that a module being bug free now doesn't mean it always will be.  
The main point of tests is not only to weed out bugs at the time the
module is written, but also to make sure that future changes to the module
itself, or to other modules it interacts with or inherits from, don't  
break it.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Mon Jun 26 11:39:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:39:25 +0100
Subject: [Bioperl-l] Tests
In-Reply-To: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
References: <449FF2E7.3040101@sendu.me.uk>
	<AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
Message-ID: <449FFFAD.40506@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:
>
>> If I come across a test for a module that doesn't test for everything
>> the module can do, should I add tests as a matter of course? Would this
>> be beneficial, or a waste of time (given that the module probably is
>> bug-free already)?
> 
> It would certainly be beneficial. It'd be great if you were willing to 
> volunteer for this.

I doubt I have time to do this on the global scale[*], but certainly I 
will for the modules I work on.


Cheers,
Sendu.

* Though... it would certainly be a good way of getting to know all of 
Bioperl intimately!


From bix at sendu.me.uk  Mon Jun 26 11:42:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:42:33 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00069.6010107@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).
> 
> The general tidy up will involve redoing and perhaps even removing 
> things.

Does anyone know what the intent behind splitting Bio::Map::MappableI
and Bio::Map::MarkerI was? I somehow get the impression these started as 
one interface but then became two. The split /seems/ to be MappableI as 
a map element with one position on one map, whilst MarkerI is a map 
element with multiple positions on multiple maps. But MarkerI has no 
synopsis or description, and MappableI says it does what MarkerI does 
(but doesn't). So I'm left guessing atm.

Do we want to keep the split? If yes, what exactly should be the 
difference between the two? If no, would it be ok to just get rid of 
MarkerI (folding it back into MappableI)?


From cjfields at uiuc.edu  Mon Jun 26 11:45:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:45:51 -0500
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <001a01c69937$9a1c1320$15327e82@pyrimidine>

My opinion: tests should cover methods and expected results and are based on
what the module actually accomplishes.  Some classes (like SeqIO, SearchIO)
are normally relatively easy to build tests for b/c the expected results are
in the file being parsed.  Tests which check calculated results from modules
(Bio::Align::DNAStatistics for instance) I would think are trickier since
you should confirm the calculations are correct through independent means.

Links:

http://www.bioperl.org/wiki/Advanced_BioPerl#Designing_Good_Tests

http://search.cpan.org/~mschwern/Test-Simple-0.62/lib/Test/Tutorial.pod

The link above uses Test::Simple or Test::More; we use Test (but have
considered moving to Test::More using Devel::Cover).

My 2c

Chris



From bix at sendu.me.uk  Mon Jun 26 12:15:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 17:15:32 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00824.20002@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).

To do this I actually need to make some slightly more significant API 
changes than I had hoped. To make Position central, all maps, mappables 
and markers need to be able to add and remove Positions (and similar 
things). As I see it, we can say that such methods are fundamental to 
the coordination required between Bio::Map modules. I feel that I'm 
therefore justified in implementing these kinds of methods in the 
interfaces (which would allow all the downstream modules that implement 
those interfaces to work in the new system without much/any alteration).

Am I justified? Should I try harder to do it without implementations in 
the interfaces?


From pmiguel at purdue.edu  Mon Jun 26 12:53:56 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon, 26 Jun 2006 12:53:56 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <001301c69934$909f83c0$15327e82@pyrimidine>
References: <001301c69934$909f83c0$15327e82@pyrimidine>
Message-ID: <44A01124.5040102@purdue.edu>

Chris Fields wrote:
> ...
>   
>> However, when viewed as an issue related to the setting of a
>> policy for the whole project, then it makes sense to have a
>> policy saying for how long a stable release will be
>> supported, and when and in which case bugfixes that are committed
>> to and tested in the development branch (as it should be)
>> will get merged back to stable.
>>     
>
> In a project this large which relies on a lot of outside resources
> maintaining API and availability at all times, having a completely bug-free
> fix for any reasonable length of time is impossible.  As a small example,
> almost every time NCBI changes BLAST output, it breaks our text parsers, and
> though we recommend using the BLAST XML format parser (which is much more
> stable), almost everybody continues using text parsing and wants that fixed.
> Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
> notification, so remote BLAST parsing can break at any time.  Fold into that
> any software changes that change output or API (PAML comes to mind).  Fold
> into that remote database changes (EBI interface to Swissprot).  Oh, let's
> not forget sequence format changes (recent SwissProt and GenBank changes).
> And, worst of all, we can't expect them to maintain API or output b/c
> they're updating based on user input/suggestions or bug fixes which require
> them to make changes.  What's 'stable' about that?
>
> It's very easy to say you want something and then not volunteer to do it; if
> you want something then put forth the time and effort to get it done.  Put
> your money where your mouth is (as they say in my home state).
>
> Again (for the third or fourth time now), putting together a release takes
> some time and effort.  I actually think it takes more effort than Hilmar
> suggests; either way, it requires someone to act as the leader (release
> pumpkin) to handle changes, and I don't see anybody stepping forward.
> Personally, if I have the time, maybe I'll handle an interim release, but
> I'm looking for a job starting in the fall as well as finishing up research
> for publication so that will take up almost all the time I have.  As Hilmar
> says, if you want to do it, fine.  Realize, though, many many changes have
> been made since 1.4 and many more will likely be made on the road to 1.6
>
>   
Hi Chris et al.,

    I was just reporting the situation from where I sit. I think this 
issue was important enough to bring to everyone's attention. I've done so 
and I'm more than satisfied with the response. I hope my emails were not 
too abrasive.
    I have now read the wiki about coordinating a release. You are 
right, that does sound hard. At least to me--I've never even used CVS, 
nor contributed a module to CPAN. I just don't see myself as being 
qualified to coordinate a 1.4.1 release. So since I'm not, for that 
reason, able to volunteer to do it myself, I'll withdraw my request for 
a new release to CPAN.
    That being said, I think Fernan's suggestion bears keeping in mind 
once 1.6 has been released and bug fixes are being committed. By that 
time, I hope I'll be savvy enough to help out in the process.
    Thanks for your attention,

Phillip SanMiguel
   

From fernan at iib.unsam.edu.ar  Mon Jun 26 15:24:51 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 16:24:51 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
	<ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
Message-ID: <20060626192451.GB53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 11:01):
| 
| On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:
| 
| >I'm not knowledgeable enough about the bioperl release
| >engineering process, nor about the internal development
| >process, but just guessing I'd expect that whenever anyone
| >submits a bugfix, it should be the responsibility of
| >the committer to check (against the project policy,
| >(written or implicit) or with the core developers in a
| >difficult case) whether the fix should be committed to more
| >than one branch.
| >
| >A patch like the one that started this thread, should have
| >been committed to the 1.4 branch without too much thinking.
| >And it would have cost the committer only a few seconds more
| >of her/his time.
| 
| Sure. But for some reason he or she forgot. So what do you suggest we  
| do - and I mean as a community, because this is a community project.  
| Come after the guy until he commits it to the branch? 

No, I never said or implied that.

| Or post an email to the list saying what you think is the
| right way and then do  it (yourself)?

Of course I could volunteer some of my time to
do that (that is, go over the commit history and see what
changes could be merged back to 1.4, if that seems to be
useful), provided I get a polite reply to my 'email
to the list saying what [I] think is the right way'.

I'm a volunteer in other open source, community projects,
and I do contribute regularly so I see no problem except the
obvious scarcity of free time in doing the same for bioperl.

| >But you only get this by setting and enforcing a policy.
| 
| Man, this is not a company. Take a step back and think again. What do  
| you suggest we - again we as a community - do to enforce a policy?  
| Take increasing levels of disciplinary action if someone keeps  
| forgetting to commit to the branch?

Seems like you were pissed off by what I said ...

What I was just trying to say is that merely by formulating
and communicating a policy you could be taking steps towards
making it a reality. Maybe 'enforcing' was an unfortunate
word to use here ... 

You don't have to punish anyone, just sending a polite email
to the list reminding people about the policy once in a
while, should be enough. It's OK if some committer doesn't
care, or just forgets about doing the right thing once in a
while ...

But of course, you might be pissed off by me talking about
something that I know nothing about (the development of
bioperl), given that I'm just a bioperl user.

Perhaps my mistake was to bring here ideas from
other projects (in which I do contribute regularly) without
realizing that, not being a contributor, I could be
punished for suggesting how things could be done better.

| While there are clearly some rules everybody needs to follow and if  
| you violate them deliberately and repeatedly you will get your CVS  
| privileges withdrawn, by and large we as a community need to accept  
| some responsibility for making the project what we think it should be  
| - and do so not by invoking disciplinary action but by living by  
| example and by taking action yourself when you think action is due.

I completely agree. When I said 'setting a policy' I just
meant something along the lines of clearly stating what are
those 'rules everybody needs to follow'. My suggestion was
to add a 'merge trivial fixes back to stable' rule to that
list.

I agree with Jason: why is that such a big deal?

| If Bioperl were a company and you asked for a 1.4.1 release and the  
| customer service rep told you nope there's a 1.5.1 that you should  
| use instead and that will do just fine, what will you do? Argue with  
| him about the company policies and whether they are properly enforced  
| or not?
| 
| Obviously doing so will be a waste of your time. In Bioperl it is at  
| the bottom of it no less waste of your time, because instead you now  
| have the opportunity to make happen what you believe needs to happen.

Right, but first I have to realize what needs to happen. I
realized it when I read your reply to Phillip's message.

I then proceeded to write my thoughts and send them to the
list, to see what kind of feedback I get. 

Hopefully, someone with commit privileges would think that
what I said makes sense and just proceed to doing it (saving
me from the task :)

Or perhaps, someone, as Jason did, would say that it's
not worth trying to merge things back to 1.4 and move
forward instead. In his message he even explained what the
problems and needs are (lack of man-time, need for
volunteers) and politely asked for help.

| We have had a history of rapidly and un-bureaucratically putting  
| people in power of what they wanted to do. We have also had a history  
| of not listening much to people who don't want to put their feet  
| where their mouth is.

I would call your reply (this message) a barrier of entry
for new developers. In the above paragraph I guess you are
referring to the bioperl motto: 'whoever codes it wins'.
That is true in any open source project. But at least to me,
that doesn't say that you should not listen to people just
because they haven't contributed a single line of code.

| I'm sorry if what I'm saying puts people off, but really this is an  
| open-source project and if you ask me it's one with the least  
| barriers of entry for new developers or 'activists' that you can find  
| in the open source arena. 


Let me disagree. The barriers of entry are not just the
giving away of developer accounts and/or repository write
privileges. 

I'm a regular contributor in another open source, community
project (FreeBSD) that has more and higher barriers of entry
with respect to giving away privileges (for example for
committing changes to the repository). Nonetheless FreeBSD
has historically shown to have few and low barriers of entry
for incorporating people to the project (without the need to
give away commit privileges, making them responsible for
parts of the FreeBSD source code/documentation/ports/etc).

IMO, that comes from a very good communication of the
direction of the project, what needs to be done, how to do
it, and a tendency of privileged and older members to listen
to people's suggestions, inviting and helping people
to jump the fence and become part of the project. It's not
an unthought-of occurrence that FreeBSD has 'mentors' that
introduce new members, help them to get acquainted with how
the project works, policies, etc. and supervise their
actions.

| This doesn't come without some degree of  
| anarchy, but really IMHO that's more of an advantage than a  
| disadvantage.
| 	-hilmar
|
+----]

Fernan

PS: finally, let me just add that english is not my native
language. Although I'm quite familiar with it, once in a
while, an unfortunate choice of words might blur my intended
meaning or the strength I wanted to convey. In case that has
been the case, let me put clearly that it has not been my
intention to criticize the way the project does things, but
to suggest ideas for the future (merge back trivial changes
to a 'stable' branch as a policy) based on my experience
with other projects. Whether that fits bioperl or not was
what I would have expected as a reply.


From cjfields at uiuc.edu  Mon Jun 26 16:18:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 15:18:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626192451.GB53298@iib.unsam.edu.ar>
Message-ID: <002701c6995d$b738f790$15327e82@pyrimidine>


> | >A patch like the one that started this thread, should have
> | >been committed to the 1.4 branch without too much thinking.
> | >And it would have cost the committer only a few seconds more
> | >of her/his time.
> |
> | Sure. But for some reason he or she forgot. So what do you suggest we
> | do - and I mean as a community, because this is a community project.
> | Come after the guy until he commits it to the branch?
> 
> No, I never said or implied that.

Right, you didn't say that.  But you didn't clarify your statements either.
I think you're treading into dangerous waters when you come in and criticize
something w/o bothering to read up on how things have been done here.  As
you say yourself below, it's 'something that I know nothing about (the
development of bioperl), given that I'm just a bioperl user'.  It's akin to
"I don't think you're coding things correctly, here's the right way to do
it" w/o knowing what the code is used for.

> | Or post an email to the list saying what you think is the
> | right way and then do  it (yourself)?
> 
> Of course I could volunteer some of my time to
> do that (that is, go over the commit history and see what
> changes could be merged back to 1.4, if that seems to be
> useful), provided I get a polite reply to my 'email
> to the list saying what [I] think is the right way'.

You will get a polite email when you respond politely.  I actually agree
with many things you say, but you sure aren't making any friends here by the
way you consistently take the opposite stance and judge what other people
do.  I think you have a point about having a stable release be supported for
a period of time.  My point is, how long?  We didn't really get an idea of
that from you, did we?

> I'm a volunteer in other open source, community projects,
> and I do contribute regularly so I see no problem except the
> obvious scarcity of free time in doing the same for bioperl.

And others here also volunteer elsewhere (GMOD, DAS, Ensembl, etc).  Don't
presume we don't have experience in open-source.  That's being pretty
judgmental.  

> | >But you only get this by setting and enforcing a policy.
> |
> | Man, this is not a company. Take a step back and think again. What do
> | you suggest we - again we as a community - do to enforce a policy?
> | Take increasing levels of disciplinary action if someone keeps
> | forgetting to commit to the branch?
> 
> Seems like you were pissed off by what I said ...

????Ya think????  

You know, okay, forget it.  This is completely non-productive.  We'll all
agree to disagree, argue, whatever.  The points made here, as I see them:

1)  Commits should be made to stable releases (as well as to the main branch
in CVS) to fix bugs as long as that release is supported.  I agree with
this, but someone has to volunteer, and the length of time a release is
supported also needs to be worked out.  It would almost be better to go to a
regular release schedule (once every 3-6 months or so) where the code is given
as is to CPAN, whether it passes tests or not.

2)  More communication about the direction Bioperl is heading; personally I
haven't seen a problem with this as much as there is no information about a
roadmap.  That is being alleviated soon I believe, though people out there
need to be patient.

3)  Volunteer.  If you have something you believe needs to be done and you
believe so fervently, then put up or shut up.  Make (nice polite)
suggestions otherwise.  Don't judge code or "the way things are done" and
don't presume what kind of experience people have that you don't know and
haven't met.  End of story.

Chris


From torsten.seemann at infotech.monash.edu.au  Mon Jun 26 22:57:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 27 Jun 2006 12:57:47 +1000
Subject: [Bioperl-l] Comments on new PDOC documentation
Message-ID: <44A09EAB.2030401@infotech.monash.edu.au>

Hello all,

I am very happy to see the PDOC software has been improved, as I use the 
online web documentation frequently. Thanks to Jason, Raphael and 
Patrick for making this happen.

http://doc.bioperl.org/bioperl-live/

Now for some comments...

1. CSS

It uses CSS which is excellent, reducing HTML size and allowing easy 
tweaks to the design. However its current implementation has some issues:

A. it seems to only use ID, rather than CLASS, to specify styles.
    ID values must be unique in a page, and are for one-off styles.
    CLASS may be re-used throughout a page. eg "sub" and "subArea".
    Many browsers do not enforce this however...

B. it seems to be doing unusual, but possibly deliberate, things with
    the POD when determining what CSS ID to give it, but perhaps this is
    more to do with how Bioperl formats the POD on some subheadings
    eg.
    <a name="_pod_Reporting Bugs" id="_pod_Reporting Bugs">
    <a name="_pod_AUTHOR - Ewan Birney" id="_pod_AUTHOR - Ewan Birney">

C. the "Description" sections etc are in a proportional font, but
    I think it should be "font-family: monospace" as many authors have
    exploited the traditional monospace of most editors to format
    their comments, which are now lost
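
To make point B concrete: spaces are not allowed in an HTML ID value, so
the anchors quoted there would need to be derived from the POD heading in
some other way. The following is only a sketch of one possible derivation,
written in Python for brevity. The function name pod_anchor_id and the
"pod-" prefix are made up for illustration and are not what PDOC does.

```python
import re

def pod_anchor_id(heading, prefix="pod-"):
    """Turn a POD heading into an anchor ID that contains no spaces.

    Runs of characters other than ASCII letters and digits collapse to a
    single hyphen, so the result uses only letters, digits and hyphens.
    The prefix (hypothetical) guarantees the ID starts with a letter.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", heading).strip("-")
    return prefix + slug

print(pod_anchor_id("Reporting Bugs"))        # pod-Reporting-Bugs
print(pod_anchor_id("AUTHOR - Ewan Birney"))  # pod-AUTHOR-Ewan-Birney
```

Two different headings could still collapse to the same ID under this
scheme, so a real generator would also need to handle collisions.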

2. FRAMES

I notice it still uses HTML Frames. Although this reduces code size 
also, it makes it impossible to LINK directly to a specific 
documentation page with all the frames intact. It may be better to use 3 
DIV elements which are part of each page, and they could be server-side 
included so there is no HTML duplication.

3. MERGING OF BIOPERL DOCS

One facet of the docs I find frustrating is that bioperl-live and 
bioperl-run (and the others) are separate! This means that you have to 
keep switching between them, and more importantly, class-names to 
classes in other packages are not present; this is particularly bad when 
browsing bioperl-run.

Is there any chance of creating a "merged" bioperl-doc page somehow?

4. STYLE

Choice of colours and layouts is such a personal thing.
I guess people can download http://doc.bioperl.org/css/perl.css
and re-edit it, and get their Browser to over-ride the supplied CSS with 
their version.

5. CONCLUSION

Please don't get the wrong idea, I love the new PDOC, I would just like 
to love it more. And yes I understand the nightmare that is parsing 
Perl/POD and generating compatible CSS :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From bix at sendu.me.uk  Tue Jun 27 06:21:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 11:21:57 +0100
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <44A106C5.9040706@sendu.me.uk>

Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
Is the idea of a Bio::Score of interest? See bug, but basically an 
object that can handle multiple kinds of scores effectively.

I would like to use such a thing in Bioperl, but what standard needs to 
be met before Bioperl gets a new kind of object?


From hlapp at gmx.net  Tue Jun 27 08:24:16 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 08:24:16 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A106C5.9040706@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
Message-ID: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>

So you basically want to attach semantic information to a number, and  
type the number thereby?

If so, an ontology would be the more natural choice (and in the end  
more flexible one) for expressing this kind of information.

Have you looked at the concept of 'quantitation types', e.g. in MAGE  
(the XML [MAGE-ML] or the object model [MAGE-OM])?

There is no quantitation type ontology at a repository I know of. I  
have used my own ones in the past and they have been pretty useful.

	-hilmar

On Jun 27, 2006, at 6:21 AM, Sendu Bala wrote:

> Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
> Is the idea of a Bio::Score of interest? See bug, but basically an
> object that can handle multiple kinds of scores effectively.
>
> I would like to use such a thing in Bioperl, but what standard  
> needs to
> be met before Bioperl gets a new kind of object?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 08:52:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 13:52:05 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A129F5.3030500@sendu.me.uk>

Hilmar Lapp wrote:
> So you basically want to attach semantic information to a number, and 
> type the number thereby?

Basically, I want to be able to stick a bunch of (different kinds of) 
numbers into an object, and later get the 'best' one out (of a 
particular kind), or sort multiple of those objects.


> If so, an ontology would be the more natural choice (and in the end more 
> flexible one) for expressing this kind of information.

I'm not really sure I understand 'and type the number', or what (useful) 
flexibility doing it with an ontology would provide.


> Have you looked at the concept of 'quantitation types', e.g. in MAGE 
> (the XML [MGAE-ML] or the object model [MAGE-OM])?

I had a quick look, but not really sure what you intended to suggest here.


> There is no quantitation type ontology at a repository I know of. I have 
> used my own ones in the past and they have been pretty useful.

Can you provide a brief example of what you mean?

If it would be appropriate to implement a Bio::Score with an ontology 
that's fine. Would we want a Bio::Score implemented though? Or are you 
suggesting each module make its own quantitation type ontology when it 
wants to deal with numerous scores?

I like the idea of a Bio::Score because then you can compare complex 
scores from multiple different unrelated modules.


From cjfields at uiuc.edu  Tue Jun 27 10:08:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 09:08:57 -0500
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
Message-ID: <001e01c699f3$3b6cda50$15327e82@pyrimidine>

> Hilmar Lapp wrote:
> > So you basically want to attach semantic information to a number, and
> > type the number thereby?
> 
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.

The 'best one' might be tricky when dealing with different kinds of scores,
esp. scores calculated different ways.  For instance, I run RNA motif
programs quite frequently (RNAMotif, ERPIN, Infernal), but all generate
'scores' based on different criteria (algorithms, different parameters, how
the author slept, and so on).  RNAMotif in particular is hard to deal with
(though a great program) b/c the scores are based on criteria in the
descriptor file (the file used to describe the motif), so aren't comparable
to other descriptors, which may have their own method of generating scores,
let alone output from other programs.  Which one would be 'the best?'  It's
a bit subjective since the scores are predictive based upon your input,
various program limitations, specific program parameter implementations,
etc.  

I do like the idea of grouping together scores for comparison, such as when
a particular region of DNA has multiple hits from different programs with
different scores.  It would at least suffice as a test on how various
programs or experimental data would compare with one another.

> > If so, an ontology would be the more natural choice (and in the end more
> > flexible one) for expressing this kind of information.
> 
> I'm not really sure I understand 'and type the number', or what (useful)
> flexibility doing it with an ontology would provide.

I'm not sure, but maybe something along the lines of what the number (the
score) actually means, especially when compared to other scores.  In other
words, how you could compare one score or number versus the other.  An
ontology would allow more complex information to be included along with the
score information so one could make more informed choices based on how the
score was obtained, the algorithm used, the program involved, etc.  Hence
flexible.  Is that close, Hilmar?

To use my RNA program example above, I could include the information about
how the scores were obtained, the programs involved, parameters used, the
various raw scores, the time it took to run the program, etc. (i.e. you
could make it as specific as you wanted).  This could also be extended to
other data types as well besides program, such as wet bench experimental
data and so on, which I deal with quite a bit.  I think there are a few XML
specs out there besides MAGE that do this as well but I can't think of any
off the top of my head.

> > Have you looked at the concept of 'quantitation types', e.g. in MAGE
> > (the XML [MGAE-ML] or the object model [MAGE-OM])?
> 
> I had a quick look, but not really sure what you intended to suggest here.

I think the idea is that MAGE, strictly as an example, deals with microarray
data from different sources or different data systems for comparison.
Sounds a little like what you want to do.

> > There is no quantitation type ontology at a repository I know of. I have
> > used my own ones in the past and they have been pretty useful.
> 
> Can you provide a brief example of what you mean?
> 
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make it's own quantitation type ontology when it
> wants to deal with numerous scores?
> 
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.

Which is what MAGE does in a way, but more specifically, i.e. just
microarray data from different sources.  So the array data may be calculated
in different ways based upon the specs for different machines, the way array
slides were prepared, how the experimenter slept, etc.

Chris


From hlapp at gmx.net  Tue Jun 27 10:27:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 10:27:55 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
Message-ID: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>

I would have suggested initiating a quantitation type ontology, not  
one individual per module.

An ontology would capture all your semantic information (min/max or  
range, higher or lower is better, what is a reasonable default [not  
sure there would be one], etc) and you would have a hierarchical  
structure.

You type a score by associating it with an ontology term:

	BLAST_e-value is-a expectation_value
	expectation_value has-min-value 0
	expectation_value has-max-value positive_infinity
	BLAST_p-value is-a probability_value
	probability_value has-min-value 0
	probability_value has-max-value 1
	
etc and then something being an expectation_value for instance would  
imply several attributes laid down in the ontology (probably through  
has-a statements).

It seems to me that essentially what you are trying to do is  
capturing knowledge for particular types of scores, which you would  
then use in more general purpose programs to sort from more to less  
significant, and possibly filter? If so, then hard-coding this into  
objects (all over the place or in a single place) is typically not  
the best practice; rather, the usual best-practice approach is using  
(and if necessary, constructing) an ontology. This is also the most  
re-usable approach.
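
As a rough illustration of typing a score this way, here is a minimal
sketch in Python (chosen only for brevity; nothing here is Bioperl API).
The statements above are stored as plain triples, and an attribute is
looked up on a term or on anything it is-a. The names TRIPLES, ancestors,
lookup and in_range are hypothetical.

```python
# Statements from the example above, stored as (subject, predicate, object).
TRIPLES = [
    ("BLAST_e-value", "is-a", "expectation_value"),
    ("expectation_value", "has-min-value", 0.0),
    ("expectation_value", "has-max-value", float("inf")),
    ("BLAST_p-value", "is-a", "probability_value"),
    ("probability_value", "has-min-value", 0.0),
    ("probability_value", "has-max-value", 1.0),
]

def ancestors(term):
    """Return the term itself plus every term reachable through is-a links."""
    chain = [term]
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in TRIPLES:
            if subj == current and pred == "is-a" and obj not in seen:
                seen.add(obj)
                chain.append(obj)
                frontier.append(obj)
    return chain

def lookup(term, predicate):
    """Return the first value of predicate found on term or its ancestors."""
    for candidate in ancestors(term):
        for subj, pred, obj in TRIPLES:
            if subj == candidate and pred == predicate:
                return obj
    return None

def in_range(term, value):
    """Check a raw number against the min/max implied by its term."""
    lowest = lookup(term, "has-min-value")
    highest = lookup(term, "has-max-value")
    return ((lowest is None or value >= lowest)
            and (highest is None or value <= highest))

print(lookup("BLAST_p-value", "has-max-value"))  # 1.0
print(in_range("BLAST_p-value", 1.5))            # False
```

The point of the sketch is that the knowledge about a kind of score lives
in data (the triples) and general-purpose code only asks questions of it.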

	-hilmar

On Jun 27, 2006, at 8:52 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> So you basically want to attach semantic information to a number, and
>> type the number thereby?
>
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.
>
>
>> If so, an ontology would be the more natural choice (and in the  
>> end more
>> flexible one) for expressing this kind of information.
>
> I'm not really sure I understand 'and type the number', or what  
> (useful)
> flexibility doing it with an ontology would provide.
>
>
>> Have you looked at the concept of 'quantitation types', e.g. in MAGE
>> (the XML [MGAE-ML] or the object model [MAGE-OM])?
>
> I had a quick look, but not really sure what you intended to  
> suggest here.
>
>
>> There is no quantitation type ontology at a repository I know of.  
>> I have
>> used my own ones in the past and they have been pretty useful.
>
> Can you provide a brief example of what you mean?
>
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make it's own quantitation type ontology  
> when it
> wants to deal with numerous scores?
>
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 11:25:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 16:25:06 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
Message-ID: <44A14DD2.7000402@sendu.me.uk>

Hilmar Lapp wrote:
> I would have suggested initiating a quantitation type ontology, not one 
> individual per module.

Where would such a thing 'live'? Would it be some static file somewhere 
that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology 
that can be added to by a module when it needs extra terms to describe its 
particular kind of scores?


> An ontology would capture all your semantic information [snip]

Thanks, I agree that an ontology would be the way to do it...


> It seems to me that essentially what you are trying to do is capturing 
> knowledge for particular types of scores, which you would then use in 
> more general purpose programs to sort from more to less significant, and 
> possibly filter?

Yes.


> If so, then hard-coding this into objects (all over the 
> place or in a single place) is typically not the best practice; rather, 
> the usual best-practice approach is using (and if necessary, 
> constructing) an ontology. This is also the most re-usable approach.

Not having any experience with ontologies, I can't think how this would 
all be done in practice though. Don't we need some central module 
(Bio::Score) to create the ontology (or read it in) and then present 
some suitable interface to it? For example, modules that wanted to store 
some scores might just ask Bio::Score for the ontology and type their 
scores by associating with an available ontology term, creating new 
terms if necessary (or is that something you would never do; the 
ontology needed to have been set up to cover all possible terms?). Then 
when the user has a bunch of these typed scores, surely he doesn't want 
to deal with going through the ontology himself to work out what it all 
means? Well, he could if he needs that level of control, but also he 
just wants to say Bio::Score->sort(x y z) or something.


From bix at sendu.me.uk  Tue Jun 27 12:13:46 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 17:13:46 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
Message-ID: <44A1593A.809@sendu.me.uk>

Cook, Malcolm wrote:
>
> All this semantic cruft is overkill for a moving target and will never
> settle down until your analysis results are no longer relevant.

I'm not sure what you mean by that. What moves? An evalue will always be 
an evalue. Once you know that you are in fact dealing with an evalue, 
and once your sorting algorithm knows that lower evalues are better, 
nothing changes. Likewise for other kinds of scores.

Instead of having to discover that a particular program is giving you an 
evalue, and then writing code to deal with an evalue appropriately, I 
thought it would be nicer to have a single module that knew how to deal 
with it already.


From MEC at stowers-institute.org  Tue Jun 27 12:01:45 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 27 Jun 2006 11:01:45 -0500
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>

For the use case of TFBS analysis demonstrated in the attachment to the
bug, I would expect to find potentially three scores, ala, {evalue,
bitscore, and percentmatch}.  To deal with this in existing framework
(i.e. GFF/bioperl analysis modules/TFBS), I would try to make GFFx eat
scalars as scores and pack the three values into a string and unpack
them as needed for sorting, etc.  Else put the one score I know I'm
going to 'use' in a particular analysis into 'score' and adorn column 9
with the rest.
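
A rough sketch of that second option, in Python for brevity: the one score
being used goes into the score column (column 6) and the others are
written as key=value pairs in column 9. The function names, the attribute
keys and the sample feature are invented for illustration, and escaping of
special characters in attribute values is ignored.

```python
def gff_line(seqid, source, ftype, start, end, scores, primary,
             strand="+", phase="."):
    """Build one tab-separated GFF-style line.

    scores  -- dict of score name -> number
    primary -- the name of the score that goes into column 6
    All remaining scores are packed into column 9 as key=value pairs.
    """
    extras = ";".join(f"{name}={value}"
                      for name, value in scores.items() if name != primary)
    columns = [seqid, source, ftype, str(start), str(end),
               str(scores[primary]), strand, phase, extras or "."]
    return "\t".join(columns)

def parse_scores(line, primary):
    """Recover the full score dict from a line written by gff_line."""
    columns = line.rstrip("\n").split("\t")
    scores = {primary: float(columns[5])}
    if columns[8] != ".":
        for pair in columns[8].split(";"):
            name, value = pair.split("=", 1)
            scores[name] = float(value)
    return scores

line = gff_line("chr1", "tfbs_scan", "TF_binding_site", 100, 120,
                {"evalue": 1e-5, "bitscore": 42.0, "percentmatch": 95.0},
                primary="bitscore")
print(line)
print(parse_scores(line, primary="bitscore"))
```

Switching which score is 'used' then only means choosing a different
primary when writing the line.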

All this semantic cruft is overkill for a moving target and will never
settle down until your analysis results are no longer relevant.

my $.02

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>Sent: Tuesday, June 27, 2006 5:22 AM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::Score of interest?
>
>Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
>Is the idea of a Bio::Score of interest? See bug, but basically an 
>object that can handle multiple kinds of scores effectively.
>
>I would like to use such a thing in Bioperl, but what standard 
>needs to 
>be met before Bioperl gets a new kind of object?
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From bix at sendu.me.uk  Tue Jun 27 14:07:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 19:07:44 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
References: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
Message-ID: <44A173F0.4040302@sendu.me.uk>

Chris Fields wrote:
>> Hilmar Lapp wrote:
>>> So you basically want to attach semantic information to a number, and
>>> type the number thereby?
>> Basically, I want to be able to stick a bunch of (different kinds of)
>> numbers into an object, and later get the 'best' one out (of a
>> particular kind), or sort multiple of those objects.
> 
> The 'best one' might be tricky when dealing with different kinds of scores,
> esp. scores calculated different ways.

I didn't make myself very clear, but you don't compare different kinds 
of scores. When you want to compare two different Score objects, each of 
which may contain multiple different kinds of scores, you pick the kind 
of score you're interested in, and for that kind of score ask which 
object has the 'best' score. I can't readily think of any exceptions to 
the rule that 'best' is either the higher score or the lower score, 
depending on what kind of score you've chosen.

I may not have made myself clear in another way. One of the ideas behind 
a Bio::Score is to have a container object for multiple different kinds 
of scores (and even multiple values per kind) all generated by one 
program in one analysis on one data set.
The container then lets you pick the kind of score you want to work with 
and compare its scores with those in other Bio::Score objects that 
contain the same kind of score (most probably, ones made by the same 
analysis program but on different data sets).

Furthermore, the kind of score you want to work with could have multiple 
values from that single analysis. So the container also lets you 
summarise these values (eg. average them) before trying to compare with 
another Score object. Often, it may be that for a certain kind of score 
it makes sense (it is intended by the score-generating program) to 
always summarise the values in a certain way. So the container needs to 
know about that and 'do the right thing' so the user can just compare 
things without having to trouble himself.

So this is why I feel that to just 'use an ontology' isn't enough. 
Certainly one ought to be used when defining the kinds, but you need 
some single interface with useful methods that lets you deal with the 
actual score values easily.
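
A minimal sketch of the container idea described above, in plain Perl. The class name ScoreSet, its methods, and the two kinds of score with their rules ('evalue': lower is better, summarised by minimum; 'bits': higher is better, summarised by mean) are all assumptions made for illustration; none of this is an existing BioPerl interface.

```perl
use strict;
use warnings;

{
    package ScoreSet;    # hypothetical name, not an existing BioPerl class
    use List::Util qw(sum min);

    # Per-kind rules: which direction is 'best' and how multiple values
    # from one analysis are summarised. Both rules are illustrative.
    my %RULES = (
        evalue => { best => 'lower',  summarise => sub { min(@_) } },
        bits   => { best => 'higher', summarise => sub { sum(@_) / scalar(@_) } },
    );

    sub new {
        my ($class) = @_;
        return bless { scores => {} }, $class;
    }

    # Store one or more values of a given kind; returns $self for chaining.
    sub add {
        my ($self, $kind, @values) = @_;
        die "unknown kind of score: $kind" unless $RULES{$kind};
        push @{ $self->{scores}{$kind} }, @values;
        return $self;
    }

    # Collapse all values of one kind using that kind's rule.
    sub summary {
        my ($self, $kind) = @_;
        my $values = $self->{scores}{$kind} or return;
        return $RULES{$kind}{summarise}->(@$values);
    }

    # sort-style comparison: negative when $self is the better of the two.
    sub compare {
        my ($self, $other, $kind) = @_;
        my ($x, $y) = ($self->summary($kind), $other->summary($kind));
        return $RULES{$kind}{best} eq 'lower' ? $x <=> $y : $y <=> $x;
    }
}

my $run1 = ScoreSet->new->add(bits => 40, 60)->add(evalue => 1e-5, 1e-20);
my $run2 = ScoreSet->new->add(bits => 55)->add(evalue => 1e-8);

# The caller picks a kind; the container knows what 'best' means for it.
my ($best_bits)   = sort { $a->compare($b, 'bits') }   $run1, $run2;
my ($best_evalue) = sort { $a->compare($b, 'evalue') } $run1, $run2;
```

Here $best_bits is $run2 (mean 55 against 50) and $best_evalue is $run1, so the same two objects rank differently depending on the kind chosen.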


From cjfields at uiuc.edu  Tue Jun 27 14:56:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 13:56:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060627181439.GD51742@iib.unsam.edu.ar>
Message-ID: <000a01c69a1b$6d0338c0$15327e82@pyrimidine>

> | 1)  Commits should be made to stable releases (as well as to the main
> | branch in CVS) to fix bugs as long as that release is supported.  I agree
> | with this, but someone has to volunteer, and the length of time a release
> | is supported also needs to be worked out.
> 
> I volunteer to do that (merge approved changes/fixes back to
> a stable branch), though as said by others, 1.4 may not be
> the most appropriate 'stable' branch, as too many changes
> have accumulated, and maybe it's not worth it. But I could
> do that for the next 'stable' release, 1.6 or 2.0 whichever
> comes next.
> 
> As per the length of time, I would say that a stable release
> should be supported at least until another 'stable' release
> is made. Or until it's no longer being used in production
> setups, which is only feasible to know in small
> communities.

I'm posting this to the mail list so that others can respond.

Kevin Brown (in a response to me) made some good points about updating and
maintaining stable releases, namely that only bug fixes are committed (i.e. no
refactoring, no new modules or features).  I personally wouldn't have a
problem with someone doing this, releasing periodic updates to stable or
developer releases to fix bugs only, but I may be in the minority here.  The
rest of the core guys and others also need to share their thoughts.  I hate
forwarding this to Jason since he's in the middle of getting ready for a
move, but I think this is important enough to do so.

I can say that I am unequivocally against updating 1.4.  Too much has
changed since then and I think it would be a mess trying to figure out what
bug fixes to include, etc.  

I also am very much against placing developer's releases in CPAN; those
releases are not intended to be completely stable as they may be
implementing new features that haven't been tested completely and may
contain various other bugs.  v 1.5.1 is remarkably stable for a developer's
release but several bug fixes have been made since.  If someone wants to try
out the developer's versions or bioperl-live they are most welcome to it;
the web site docs give all the instructions one needs to install from pretty
much any platform.

Beyond that, I'm spent on this thread.

Chris 


From lstein at cshl.edu  Tue Jun 27 18:35:08 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 27 Jun 2006 18:35:08 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <200606271835.09558.lstein@cshl.edu>

Hi All,

This is rather late, but just for future reference on the mailing list,  here 
is how I would do the task using Bio::DB::Fasta.

Script 1: index the file for future use:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	
	my $filename = shift;  # name of file to index on command line
	Bio::DB::Fasta->new($filename,-makeid=>\&make_my_id)
		or die "Indexing failed";
	print "Indexing succeeded!\n";
	exit 0;

	sub make_my_id {
		my $description_line = shift;
		$description_line =~ /(\d+_at)/ or die "malformed description line";
		return $1;
	}

Run this script once to create a reusable index of the file. The index will be 
stored in the same directory as the FASTA file.

Script 2: extract the sequences using the IDs stored in a second file:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	use Bio::SeqIO;
	use IO::File;

	my $indexed_fasta_file = shift;
	my $probe_id_file         = shift;

	# open up the indexed fasta file
	my $db = Bio::DB::Fasta->new($indexed_fasta_file) or die;
	# open up a FASTA writer
	my $out = Bio::SeqIO->new(-format=>'Fasta',-fh=>\*STDOUT) or die;
	# open the probe id file
	my $in   = IO::File->new($probe_id_file) or die;

	# do the work
	while (my $id = <$in>) {
		chomp $id;
		my $seq = $db->get_Seq_by_id($id) or die;
		$out->write_seq($seq);
	}

	exit 0;

Bio::Index::Fasta will work in almost exactly the same way. The only 
difference is that Bio::DB::Fasta also lets you retrieve subsequences 
efficiently.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From awitney at sgul.ac.uk  Tue Jun 27 10:08:20 2006
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 27 Jun 2006 15:08:20 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A13BD4.60802@sgul.ac.uk>


> Have you looked at the concept of 'quantitation types', e.g. in MAGE  
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

the MGED Ontology has a concept of quantitation type if that helps

http://mged.sourceforge.net/ontologies/MGEDontology.php#QuantitationType




From william.hsiao at gmail.com  Tue Jun 27 15:52:03 2006
From: william.hsiao at gmail.com (William Hsiao)
Date: Tue, 27 Jun 2006 12:52:03 -0700
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
Message-ID: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>

Hi all,
   I've encountered a strange problem while parsing a gff file from
NCBI using perl.  I'm hoping that someone on the list may have a
solution even though this is not a bioperl issue.  Maybe someone
familiar with gff3 parsing can help :)  Essentially, I'm parsing a gff
file into a nested hash structure using the following functions:

sub parse_gff {
    my $file = shift;
    my %hash_gff;
    open (INFILE, $file) or die "Cannot find file $file\n";
    while(<INFILE>){
	next if (/^\#/);
	chomp;
	my ($seqid, $source, $type, $start, $end, $score, $strand, $phase,
	    $attributes) = split /\t/;
	my $attri_ref = &process_attributes($attributes);
	my %record = ('seqid'     => $seqid,
		      'source'    => $source,
		      'type'      => $type,
		      'start'     => $start,
		      'end'       => $end,
		      'score'     => $score,
		      'strand'    => $strand,
		      'phase'     => $phase,
		      'attribute' => $attri_ref);
	push @{$hash_gff{$type}}, \%record;
    }
    close INFILE;
    print Dumper %hash_gff;
    return \%hash_gff;
}

sub process_attributes {
    my $attr_string = shift;
    my @attributes = split (/\;/, $attr_string);
    my %attr;
    foreach (@attributes){
	my ($key, $value) = split /=/;
	if ($value=~/\:/){
	    my ($subkey, $subvalue) = split (/:/, $value);
	    $attr{$key}{$subkey}=$subvalue;
	}
	else{
	    $attr{$key}=$value;
	}
    }
    return \%attr;
}

   It works for all the gff files we downloaded from NCBI's microbial
genomes refseq ftp repository.  However, 3 lines from one particular
file NC_005966.gff (of Acinetobacter_sp_ADP1) can not be parsed
properly.  These lines are:

NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	start_codon	636487	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	stop_codon	635833	635835	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

   They generate an error: Can't use string
("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
 The strange part is that all I have to do is replace the word
"function" in front of "=adaptation%20to%20stress;" with another word
or simply change it to functions or functio or Function, etc, then the
line parses properly.  If I retype the word "function", it doesn't
solve the problem.  For some strange reason, when the word "function"
is there, perl tried to use "adaptation%20to%20stress" as the hash key
and failed.  The word "function" is used in other lines as well so I
don't think the problem is caused by the word alone.
    Any suggestion on what might be happening would be greatly
appreciated.  Thank you.

Cheers,

Will

-- 
William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206 Fax: 604-291-5583


From bix at sendu.me.uk  Wed Jun 28 04:25:52 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 09:25:52 +0100
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
In-Reply-To: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
References: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
Message-ID: <44A23D10.1010308@sendu.me.uk>

William Hsiao wrote:
>
> sub process_attributes {
>     my $attr_string = shift;
>     my @attributes = split (/\;/, $attr_string);
>     my %attr;
>     foreach (@attributes){
> 	my ($key, $value) = split /=/;
> 	if ($value=~/\:/){
> 	    my ($subkey, $subvalue) = split (/:/, $value);
             # assign hashref to $key, assign key => value pair to that
> 	    $attr{$key}{$subkey}=$subvalue;
> 	}
> 	else{
             # assign scalar $key
> 	    $attr{$key}=$value;
> 	}
>     }
>     return \%attr;
> }

> NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

>    They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
>  The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.

The problem is that these lines contain function=x twice, where the 
second x contains a colon.
So your code first assigns $attr{function} = $scalar, and then tries to
do $attr{function}{before_colon} = "after_colon".

Normally the latter would autovivify $attr{function} as a hash 
reference: $attr{function} == HASH(xyz) and then set before_colon => 
after_colon as a key value pair of HASH(xyz). But in this case, 
$attr{function} already exists: $attr{function} == 
"adaptation%20to%20stress". So you end up trying to set before_colon => 
after_colon as a key value pair of that string, which you can't do.
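
The failure can be reproduced in a few lines. This is only a minimal sketch: the key and value strings are taken from the GFF line quoted above, and the eval wrapper is added here so the script keeps running after the error.

```perl
use strict;
use warnings;

my %attr;

# First "function=..." has no colon, so a plain string is stored.
$attr{function} = 'adaptation%20to%20stress';

# Second "function=..." has a colon, so the code now treats the stored
# string as if it were a hash reference. Under "strict refs" this dies.
my $ok = eval { $attr{function}{'protection%20%28MultiFun'} = '5.5%29'; 1 };
print "failed: $@" unless $ok;

# A key that has not been seen before is autovivified as a hash reference,
# which is why lines without a colon-free duplicate parse without trouble.
my %fresh;
$fresh{db_xref}{GI} = '50083879';
print ref($fresh{db_xref}), "\n";    # HASH
```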

Basically, your data structure isn't so great, mixing scalars and hash 
references as values of %attr.

The solution may be to parse using Bioperl instead ;).
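
If you want to stay with plain Perl, one way to avoid mixing scalars and hash references is to store every attribute as a list of values. The sketch below is an assumption about what would suit your data, not a drop-in fix: the sub name process_attributes_multi is made up, it no longer splits values on the colon, and it decodes the %XX escapes as it goes.

```perl
use strict;
use warnings;

# Hypothetical replacement for process_attributes: every key maps to an
# array reference, so a repeated key simply gets a second element.
sub process_attributes_multi {
    my $attr_string = shift;
    my %attr;
    foreach my $pair (split /;/, $attr_string) {
        # A limit of 2 keeps any further '=' characters inside the value.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        # Decode URL-style escapes such as %20 (space) and %28 (open paren).
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

my $attr = process_attributes_multi(
      'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
    . 'function=protection%20%28MultiFun:5.5%29;'
    . 'db_xref=GI:50083879;db_xref=GeneID:2878732'
);

print join(' | ', @{ $attr->{function} }), "\n";
# prints: adaptation to stress | protection (MultiFun:5.5)
```

Values such as db_xref=GI:50083879 stay as whole strings here; splitting them on the colon can be done later, per key, once you know which keys really use a prefix:value layout.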


From selvik at ufl.edu  Tue Jun 27 08:54:48 2006
From: selvik at ufl.edu (Kadirvel, Selvi)
Date: Tue, 27 Jun 2006 08:54:48 -0400 (EDT)
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
Message-ID: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>

All,

(I am new to Bioinformatics and Bioperl, so please excuse me if I 
get my terminology wrong)

I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
report consists of three sections, namely:

1. A ranked list of the best scoring HMMs
2. A list of the best scoring domains in order of their occurrence 
in the sequence
3. Alignments for all the best scoring domains.

Section 3 can be truncated to a specific number using the '-A' 
option when building the report.

Though the Bio::SearchIO::hmmer module parses through the entire 
HMMER report (Section 1, 2 and 3), the set of values made 
available through Bio::Search::Result::ResultI seems to be using 
Section 3 alone. So when we use the -A option to truncate, we lose 
otherwise useful information in Section 1. This information is 
lost (only) for those models that do not have any of their domains 
in the top -A number of best scoring domains. The fields that are 
not available are:

1.	Description of a model
2.	Score of a model
3.	Evalue of a model

If I use the older Bio::Tools::HMMER::Results module, neither 
Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allows me to 
retrieve the above listed values. Scores and Evalues are available 
for each domain but not for the model it belongs to.

I was wondering if there is any other method to access these 
values or do I have to write my own module to do this?

Any ideas/suggestions would be greatly appreciated.

Thank you!


Selvi Kadirvel

Graduate Research Assistant
High Performance Computing Center
University of Florida


From hlapp at gmx.net  Tue Jun 27 20:18:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 20:18:36 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A14DD2.7000402@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
	<44A14DD2.7000402@sendu.me.uk>
Message-ID: <E4565670-479B-4247-A3CB-3DA998AF8456@gmx.net>


On Jun 27, 2006, at 11:25 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> I would have suggested initiating a quantitation type ontology,  
>> not one
>> individual per module.
>
> Where would such a thing 'live'? Would it be some static file  
> somewhere
> that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology
> that can be added to by a module when it needs extra terms to describe  
> its
> particular kind of scores?

For instance, yes. Once you read in an ontology (through  
Bio::OntologyIO indeed) it sits essentially in memory.

> [...]
> Not having any experience with ontologies, I can't think how this would
> all be done in practice though. Don't we need some central module
> (Bio::Score) to create the ontology (or read it in) and then present
> some suitable interface to it?

Possibly - the problem is how to get the ontology-typed term given an  
analysis program and attribute name (e.g. 'score' of a feature  
object). There is no method for doing this on a feature object and  
bolting one on would be a bad idea I think.

So, the Bio::Score would be a little hybrid between an objectified  
score value that now doesn't just have a numeric value but also a  
type term, and a factory for creating the ontology (e.g., by reading  
it in from a specified or default location). I.e., you'd have

	my $value = $score->value();
	my $type = $score->type();
	# $type is-a Bio::Ontology::TermI
	my $quant_ont = $type->ontology();
	
	# see what type of score we have
	my @ancestors = $quant_ont->get_ancestor_terms($type);
	if (grep {$_->name eq 'expectation_value'} @ancestors) {
		# it's an e-value
	} elsif ( ...test for some other type...) {
		# etc
	}
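
The same ancestor test can be tried without any ontology file. The sketch below is only an illustration under stated assumptions: a plain hash stands in for the ontology, every term name except 'expectation_value' is invented, and the subs ancestors, is_evalue and sort_scores are hypothetical helpers rather than Bio::Ontology methods.

```perl
use strict;
use warnings;

# Toy is-a table standing in for a quantitation type ontology.
my %parent_of = (
    blast_evalue      => 'expectation_value',
    hmmer_evalue      => 'expectation_value',
    expectation_value => 'score',
    bit_score         => 'score',
);

# Return all ancestors of a term, nearest first.
sub ancestors {
    my ($term) = @_;
    my @found;
    while (defined(my $parent = $parent_of{$term})) {
        push @found, $parent;
        $term = $parent;
    }
    return @found;
}

# True when the term is, or descends from, 'expectation_value'.
sub is_evalue {
    my ($term) = @_;
    return scalar grep { $_ eq 'expectation_value' } $term, ancestors($term);
}

# Sort best-first: ascending for e-values, descending for anything else.
sub sort_scores {
    my ($type, @values) = @_;
    return sort { $a <=> $b } @values if is_evalue($type);
    return sort { $b <=> $a } @values;
}

my @evalues = sort_scores('hmmer_evalue', 1e-3, 1e-30, 0.5);
my @bits    = sort_scores('bit_score', 12, 250, 48);
print "best e-value: $evalues[0], best bit score: $bits[0]\n";
```

The caller never states whether high or low is better; that decision follows from where the type sits in the is-a table.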


> For example, modules that wanted to store
> some scores might just ask Bio::Score for the ontology and type their
> scores by associating with an available ontology term, creating new
> terms if necessary (or is that something you would never do; the
> ontology needed to have been set up to cover all possible terms?).

Yes. You'd extend it as you encounter types that aren't in the  
ontology yet, until the ontology fully captures the knowledge domain.

> Then
> when the user has a bunch of these typed scores, surely he doesn't  
> want
> to deal with going through the ontology himself to work out what it  
> all
> means? Well, he could if he needs that level of control, but also he
> just wants to say Bio::Score->sort(x y z) or something.

See above for a quick example of the logic. I'd separate that into  
its own module, like Bio::Score::Utils.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 28 10:29:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:29:17 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>

Selvi, 

Can you send me the report you are trying to parse as an attachment?  I'll
give it a look.

Judging by the pdoc this is mapped for the event handler so it should be
there.  From the %MAPPING hash:

                 'HMMER_program'   => 'RESULT-algorithm_name',
                 'HMMER_version'   => 'RESULT-algorithm_version',
                 'HMMER_query-def' => 'RESULT-query_name',
                 'HMMER_query-len' => 'RESULT-query_length',
                 'HMMER_query-acc' => 'RESULT-query_accession',
                 'HMMER_querydesc' => 'RESULT-query_description',
                 'HMMER_hmm'       => 'RESULT-hmm_name',                 
                 'HMMER_seqfile'   => 'RESULT-sequence_file',
	           'HMMER_db'        => 'RESULT-database_name',

Chris



From cjfields at uiuc.edu  Wed Jun 28 10:55:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:55:31 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>
Message-ID: <003501c69ac2$e70623b0$15327e82@pyrimidine>

I hate responding to myself!!  Forgot to add that there is also
Bio::Tools::Hmmpfam :

http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam

I'll check if Bio::SearchIO catches this data and let you know what I find
out.  It should catch at least some of it, according to the mapping.

Chris



From bix at sendu.me.uk  Wed Jun 28 11:04:29 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 16:04:29 +0100
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <44A29A7D.7020602@sendu.me.uk>

Kadirvel, Selvi wrote:
> All,
> 
> (I am new to Bioinformatics and Bioperl, so please apologize if I 
> get my terminology wrong)
> 
> I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
> report consists of three sections namely;
> 
> 1. A ranked list of the best scoring HMMs
> 2. A list of the best scoring domains in order of their occurrence 
> in the sequence
> 3. Alignments for all the best scoring domains.
> 
> Section 3 can be truncated to a specific number using the ??A? 
> option when building the report.

What do you mean by this? What is ??A? ?
Is this an option you're supplying to hmmpfam or a bioperl module?


> Though the Bio::SearchIO::hmmer module parses through the entire 
> HMMER report (Section 1, 2 and 3), the set of values made 
> available through Bio::Search::Result::ResultI seem to be using 
> Section 3 alone. So when we use the ?A option to truncate, we lose 
> otherwise useful information in Section 1. This information is 
> lost (only) for those models that do not have any of their domains 
> in the top ?A number of? best scoring domains. The fields that are 
> not available are:
> 
> 1.	Description of a model
> 2.	Score of a model
> 3.	Evalue of a model

Each hit you get back from each result of the SearchIO is a 
Bio::Search::Hit::HMMERHit and represents the results of a particular 
model (you can also say $result->next_model).

So you can say:
print $hit->name, " ", $hit->description, " ", $hit->significance, " ", 
$hit->score, "\n";

to get the information you want.
General information about the result can be had like so:
print $result->query_name, " ", $result->algorithm, " ", 
$result->hmm_name, "\n";

I have another problem (or the same one as you? I can't tell...) in 
that I can only get a single result, hit and hsp from my hmmpfam file!
It is doing my head in, but I might be doing something wrong so will 
look into it further before posting a bug report.


From bix at sendu.me.uk  Wed Jun 28 12:46:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 17:46:57 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A29A7D.7020602@sendu.me.uk>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
	<44A29A7D.7020602@sendu.me.uk>
Message-ID: <44A2B281.7030806@sendu.me.uk>

Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue, 
description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I can't tell...) in 
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will 
> look into it further before posting a bug report.

I was just doing something wrong, but...

Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report 
a single HSP per Hit so domains with multiple alignments get separate 
Hits (more FASTA like) since they aren't really HSPs'

Strangely 1.25 (Bioperl 1.4) seems to behave like that already.

In any case, this is extremely counter-intuitive, especially given that 
next_domain is a synonym of next_hsp. I think either the synonym 
relationship remains and hits have multiple hsps (and there is only one 
hit per model), or next_domain goes off and finds the hsp that is the 
next domain of the current model. But that would be incredibly broken in 
the current model since it would be found in a different hit object...

What hmmpfam does is take a database of models which can be thought of 
as database sequences. Then it aligns each one against your query 
sequences. A model could align in multiple locations along a query 
sequence. Each one of these locations is called a domain of the model. A 
user of hmmpfam is model-centric (wants to know which models are on his 
query), and so you want to know all about how well the model did in one 
go. So you should be able to get the results for a model ($hit = 
$result->next_model), get overall info about it ($hit->score etc.), then 
get more detailed information about each domain of it (while ($hsp = 
$hit->next_domain) {...}). But right now you only get one domain and you 
have to go searching through all your other hits to find a hit with the 
same ->name() as your model of interest to get the next domain of your 
model.
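
Until that changes, the regrouping has to be done by hand. A minimal sketch of it follows; plain hashes stand in for the hit objects (real code would call $hit->name and $hit->next_domain instead), and the model names, scores and coordinates are made up for illustration.

```perl
use strict;
use warnings;

# Stand-ins for the one-domain-per-hit objects described above.
my @hits = (
    { name => 'PF00001', score => 120.5, domain => { start => 10,  end => 80  } },
    { name => 'PF00002', score => 33.1,  domain => { start => 95,  end => 140 } },
    { name => 'PF00001', score => 120.5, domain => { start => 200, end => 270 } },
);

# Collect every domain under its model name, remembering first-seen order.
my (%domains_of, @model_order);
for my $hit (@hits) {
    my $name = $hit->{name};
    push @model_order, $name unless exists $domains_of{$name};
    push @{ $domains_of{$name} }, $hit->{domain};
}

# Report model by model, which is the view a hmmpfam user wants.
for my $name (@model_order) {
    my @spans = map { "$_->{start}-$_->{end}" } @{ $domains_of{$name} };
    print "$name: @spans\n";
}
# prints:
# PF00001: 10-80 200-270
# PF00002: 95-140
```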

In my view this is less than ideal. What do people think? Should it be 
changed?


From selvik at ufl.edu  Wed Jun 28 11:21:37 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 11:21:37 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <003501c69ac2$e70623b0$15327e82@pyrimidine>
References: <003501c69ac2$e70623b0$15327e82@pyrimidine>
Message-ID: <2679E8D1-E225-4414-8925-1EB73B83523B@ufl.edu>

Thanks for your reply Chris.

I am attaching a part of the report I am trying to parse.

Also I see that, Bio::SearchIO::hmmer.pm is parsing all three  
sections. I am not sure how (or whether) fields from Section 1 are  
actually being made available through Bio::SearchIO or Bio::Search:: 
[Hit | Hsp | Result].

I'll look into Bio::Tools::Hmmpfam and let you know if that works for  
me.

-Selvi


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ManyQueries.hmmer
Type: application/octet-stream
Size: 3684451 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060628/53dcc875/attachment-0002.obj>
-------------- next part --------------


On Jun 28, 2006, at 10:55 AM, Chris Fields wrote:

> I hate responding to myself!!  Forgot to add that there is also
> Bio::Tools::Hmmpfam :
>
> http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam
>
> I'll check if Bio::SearchIO catches this data and let you know what  
> I find
> out.  It should catch at least some of it, according to the mapping.
>
> Chris
>
>> Selvi,
>>
>> Can you send me the report you are trying to parse as an  
>> attachment?  I'll
>> give it a look.
>>
>> Judging by the pdoc this is mapped for the event handler so it  
>> should be
>> there.  From the %MAPPING hash:
>>
>>                  'HMMER_program'   => 'RESULT-algorithm_name',
>>                  'HMMER_version'   => 'RESULT-algorithm_version',
>>                  'HMMER_query-def' => 'RESULT-query_name',
>>                  'HMMER_query-len' => 'RESULT-query_length',
>>                  'HMMER_query-acc' => 'RESULT-query_accession',
>>                  'HMMER_querydesc' => 'RESULT-query_description',
>>                  'HMMER_hmm'       => 'RESULT-hmm_name',
>>                  'HMMER_seqfile'   => 'RESULT-sequence_file',
>>                  'HMMER_db'        => 'RESULT-database_name',
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Kadirvel, Selvi
>>> Sent: Tuesday, June 27, 2006 7:55 AM
>>> To: bioperl-l at lists.open-bio.org
>>> Cc: selvik at ufl.edu
>>> Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters  
>>> (score,
>>> evalue, description)
>>>
>>> All,
>>>
>>> (I am new to Bioinformatics and Bioperl, so please excuse me if I
>>> get my terminology wrong)
>>>
>>> I am currently using Bio::SearchIO to parse HMMPFAM reports. This
>>> report consists of three sections namely;
>>>
>>> 1. A ranked list of the best scoring HMMs
>>> 2. A list of the best scoring domains in order of their occurrence
>>> in the sequence
>>> 3. Alignments for all the best scoring domains.
>>>
>>> Section 3 can be truncated to a specific number of alignments using
>>> the '-A' option when building the report.
>>>
>>> Though the Bio::SearchIO::hmmer module parses through the entire
>>> HMMER report (Section 1, 2 and 3), the set of values made
>>> available through Bio::Search::Result::ResultI seems to be using
>>> Section 3 alone. So when we use the -A option to truncate, we lose
>>> otherwise useful information in Section 1. This information is
>>> lost (only) for those models that do not have any of their domains
>>> in the top "-A number of" best scoring domains. The fields that are
>>> not available are:
>>>
>>> 1.	Description of a model
>>> 2.	Score of a model
>>> 3.	Evalue of a model
>>>
>>> If I use the older Bio::Tools::HMMER::Results module, NEITHER
>>> Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allows me to
>>> retrieve the above listed values. Scores and Evalues are available
>>> for each domain but not for the model it belongs to.
>>>
>>> I was wondering if there is any other method to access these
>>> values or do I have to write my own module to do this?
>>>
>>> Any ideas/suggestions would be greatly appreciated.
>>>
>>> Thank you!
>>>
>>>
>>>
>>>
>>> Selvi Kadirvel
>>>
>>> Graduate Research Assistant
>>> High Performance Computing Center
>>> University of Florida
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>


From akarger at CGR.Harvard.edu  Wed Jun 28 15:49:54 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 28 Jun 2006 15:49:54 -0400
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
Message-ID: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>

>perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";'
-e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: acc (P09651) does not exist
---------------------------------------------------
>perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN",
$format="fasta";' -e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: id (ROA1_HUMAN) does not exist
---------------------------------------------------

But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN
Same error for a couple other proteins.
Works for a GenBank protein.

perl 5.8.6
Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp

This worked a few months ago.
What's going on?

-Amir Karger


From cjfields at uiuc.edu  Wed Jun 28 16:27:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 15:27:15 -0500
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>
Message-ID: <006901c69af1$412c3590$15327e82@pyrimidine>

This was a recent bug due to recent changes in EBI's remote database; they
changed the name of the database from 'swall' to 'uniprot'.  Update to
bioperl-live from CVS (or just Bio::DB::SwissProt) and that should fix it.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Wednesday, June 28, 2006 2:50 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
> 
> >perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";'
> -e '$sequence = get_sequence($database, $id);'
> 
> -------------------- WARNING ---------------------
> MSG: acc (P09651) does not exist
> ---------------------------------------------------
> >perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN",
> $format="fasta";' -e '$sequence = get_sequence($database, $id);'
> 
> -------------------- WARNING ---------------------
> MSG: id (ROA1_HUMAN) does not exist
> ---------------------------------------------------
> 
> But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN
> Same error for a couple other proteins.
> Works for a GenBank protein.
> 
> perl 5.8.6
> Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp
> 
> This worked a few months ago.
> What's going on?
> 
> -Amir Karger
> 


From Kevin.M.Brown at asu.edu  Wed Jun 28 16:39:43 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 28 Jun 2006 13:39:43 -0700
Subject: [Bioperl-l] FW:  How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <1A4207F8295607498283FE9E93B775B4019719A4@EX02.asurite.ad.asu.edu>

This was supposed to go to the list...  Still not used to Outlook...

> The points made here, as I see them:
> 
> 1)  Commits should be made to stable releases (as well as to the main
> branch in CVS) to fix bugs as long as that release is supported.  I
> agree with this, but someone has to volunteer, and the length of time a
> release is supported also needs to be worked out.  It would almost be
> better to go to a regular release schedule (once every 3-6 months or so)
> where the code is given as is to CPAN, whether it passes tests or not.

What I've seen in other projects is that stable is supported and bug
patched up till the next stable release.  After that support is dropped.
Once a branch was tagged stable the ONLY thing that went into it was
fixes for bugs based on the code already present.  No new features, no
refactoring of any code or modules.  I'm not certain how often things
like a stable patch release happened since most of the bugs were worked
on long before while it was still tagged as dev.  I could see, worst
case a .x release to stable every 6 months to a year until the next
stable came out if there were patches to it.  It looks like the wiki has
most of this kind of stuff documented in the previously posted link:
http://www.bioperl.org/wiki/Making_a_BioPerl_release.  I guess it would
just need a pumpkin/monkey/whatever to step up to keep things rolling...

> 2)  More communication about the direction Bioperl is heading;
> personally I haven't seen a problem with this as much as there is no
> information about a roadmap.  That is being alleviated soon I believe,
> though people out there need to be patient.
> 
> 3)  Volunteer.  If you have something you believe needs to be done and
> you believe so fervently, then put up or shut up.  Make (nice polite)
> suggestions otherwise.  Don't judge code or "the way things are done"
> and don't presume what kind of experience people have that you don't
> know and haven't met.  End of story.
> 
> Chris
> 
> 


From cjfields at uiuc.edu  Wed Jun 28 18:14:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:14:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A2B281.7030806@sendu.me.uk>
Message-ID: <007e01c69b00$2e091410$15327e82@pyrimidine>

> Sendu Bala wrote:
> [ from thread Bio::SearchIO - Accessing Model parameters (score, evalue,
> description) ]
> [ concerning hmmpfam output ]
> > I have another problem (or the same one as you? I can't tell...) in
> > that I can only get a single result, hit and hsp from my hmmpfam file!
> > It is doing my head in, but I might be doing something wrong so will
> > look into it further before posting a bug report.
> 
> I was just doing something wrong, but...
> 
> Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report
> a single HSP per Hit so domains with multiple alignments get separate
> Hits (more FASTA like) since they aren't really HSPs'
> 
> Strangely 1.25 (Bioperl 1.4) seems to behave like that already.
> 
> In any case, this is extremely counter-intuitive, especially given that
> next_domain is a synonym of next_hsp. I think either the synonym
> relationship remains and hits have multiple hsps (and there is only one
> hit per model), or next_domain goes off and finds the hsp that is the
> next domain of the current model. But that would be incredibly broken in
> the current model since it would be found in a different hit object...
>
> What hmmpfam does is take a database of models which can be thought of
> as database sequences. Then it aligns each one against your query
> sequences. A model could align in multiple locations along a query
> sequence. Each one of these locations is called a domain of the model. A
> user of hmmpfam is model-centric (wants to know which models are on his
> query), and so you want to know all about how well the model did in one
> go. So you should be able to get the results for a model ($hit =
> $result->next_model), get overall info about it ($hit->score etc.), then
> get more detailed information about each domain of it (while ($hsp =
> $hit->next_domain) {...}). But right now you only get one domain and you
> have to go searching through all your other hits to find a hit with the
> same ->name() as your model of interest to get the next domain of your
> model.
> 
> In my view this is less than ideal. What do people think? Should it be
> changed?

The model (hit-like) table scores are retained and can be retrieved via
$model->significance and the individual domain (hsp-like) evalues via
$domain->evalue.  The reason you don't get all the individual domain evalues
is that only five alignments are returned by default.  You might try
changing the 'A' parameter to see if you can get more alignments; that may
work around the problem of missing domains for now.  You'll note that the
Model/Domain results returned are not based on top score but what looks like
the position of the domain in the sequence (seq-t in the last table); that's
what is stated in the hmmpfam docs.  Anyway, I tried this loop with the
reports Selvi sent and it works, but only for the ones that return
alignments:

my $result_count = 1;
while ( my $result = $searchio->next_result() ) {
  print "Result $result_count : ",$result->query_name,"\n";
  print "Result models: ",$result->num_hits,"\n";
  while (my $model = $result->next_hit) {
	print "\tModel : ",$model->name,"\n";
	print "\tSignif: ",$model->significance,"\n";
	while (my $domain = $model->next_hsp) {
	  print "\t\tDomain : ",$domain->name,"\n";
	  print "\t\tEvalue : ",$domain->evalue,"\n";
	}
  }
  $result_count++;
}

Quoting the HMMER docs: "Say you have a new sequence that, according to a BLAST
analysis, shows a slew of hits to receptor tyrosine kinases. Before you
decide to call your sequence an RTK homologue, you suspiciously recall that
RTK's are, like many proteins, composed of multiple functional domains, and
these domains are often found promiscuously in proteins with a wide variety
of functions. Is your sequence really an RTK? Or is it a novel sequence that
just happens to have a protein kinase catalytic domain or fibronectin type
III domain?"

Model/domain pairs really aren't Hits/HSPs by definition, like the CVS
commit from Jason states.  The way Pfam is set up, you actually have your
query(ies) scanned using a database of Pfam domains (HMM's, built from
protein alignments for various protein families), hence the alignment in the
report is not an HSP since HSPs come from pairwise sequence alignments.  An
HSP is a pair of sequences which, when aligned, meet or exceed a maximal
cutoff.  The hmmpfam report has alignments of the sequence and the consensus
for the alignment the HMM is based on (not another sequence, so not an HSP).
This is also the same reason you can't get alignments from
Bio::Search::HSP::HMMERHSP objects since the model 'sequence' isn't a true
sequence but a consensus of sequences, so it's 'inappropriate' to use that
as an actual alignment.  Bad Bioperl user!  Bad!

I think the reasoning for keeping single model-domain pairs is that you
should consider each domain's location in the sequence as well as the number
of times they appear, regardless of whether they belong to the same model or
not.  One protein could have three ATP-binding domains and another two, and
they could be located in different positions on the sequence.  But where
they are on the sequence in relation to other domains and to each other
(i.e. positional information) is just as important, maybe more so, than how
many times that domain appears.  

Well, that and SearchIO is set up as a SAX-like parser, so I believe it
processes the model-domain alignments as the file is parsed.

My 2c: there should be a way to get all model-domain pairs in the "parsed
for domains" table (which is like a list of HSPs).  Seems the last few w/o
alignments are not retained; this may be the way the parser is set up.  I
would try getting the handler to return just evalues and similar stuff for
those and leave out sequence/alignment info, if that's possible.  Not sure
how this is handled with BLAST reports where there are more hits reported
than alignments...
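
To make that concrete, here is a rough sketch of pulling the "parsed for domains" rows straight out of the report text, with no sequence/alignment info at all. It is plain Perl, not the SearchIO handler; the column layout is what I'd expect from HMMER2 hmmpfam output, the sample rows are made up, and the hash keys are names I picked for illustration.

```perl
use strict;
use warnings;

# Illustrative excerpt in the HMMER2 hmmpfam layout; the rows are made up.
my $report = <<'END';
Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
IBB        1/1       3    99 ..     1   108 []   156.3  2.6e-43
HEAT       1/2     120   158 ..     1    36 []    12.1       40
HEAT       2/2     162   200 ..     1    36 []    30.5  1.2e-05
//
END

my @domains;
my $in_table = 0;
for my $line (split /\n/, $report) {
    if ($line =~ /^Parsed for domains:/) { $in_table = 1; next }
    next unless $in_table;
    last if $line =~ m{^//} or $line =~ /^Alignments of top-scoring domains:/;

    # Columns: model, n/total, seq-f, seq-t, 2 flag chars,
    #          hmm-f, hmm-t, 2 flag chars, score, E-value.
    # The header and dashed lines do not match and are skipped.
    if ($line =~ m{^(\S+)\s+(\d+)/(\d+)\s+(\d+)\s+(\d+)\s+\S\S\s+(\d+)\s+(\d+)\s+\S\S\s+(-?[\d.]+)\s+(\S+)\s*$}) {
        push @domains, {
            model    => $1, index  => $2, of     => $3,
            seq_from => $4, seq_to => $5,
            hmm_from => $6, hmm_to => $7,
            score    => $8, evalue => $9,
        };
    }
}

printf "%-6s %d/%d  %4d-%-4d  score=%-6s E=%s\n",
    @{$_}{qw(model index of seq_from seq_to score evalue)} for @domains;
```

Every row in the table comes back this way regardless of how many alignments follow, which is the behaviour I'd want from the handler.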

Chris


From cjfields at uiuc.edu  Wed Jun 28 18:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:16:38 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
Message-ID: <000001c69b00$86adcc00$15327e82@pyrimidine>

Arghhhh!  Made a mistake:

> my $result_count = 1;
> while ( my $result = $searchio->next_result() ) {
>   print "Result $result_count : ",$result->query_name,"\n";
>   print "Result models: ",$result->num_hits,"\n";
>   while (my $model = $result->next_hit) {
> 	print "\tModel : ",$model->name,"\n";
> 	print "\tSignif: ",$model->significance,"\n";
> 	while (my $domain = $model->next_hsp) {
> 	  print "\t\tDomain : ",$domain->name,"\n";
                              ^^^^^^^
Should be:                    $model

> 	  print "\t\tEvalue : ",$domain->evalue,"\n";
> 	}
>   }
>   $result_count++;
> }

My bad!

Chris


From bix at sendu.me.uk  Wed Jun 28 19:00:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 00:00:11 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <007e01c69b00$2e091410$15327e82@pyrimidine>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
Message-ID: <44A309FB.2050009@sendu.me.uk>

Chris Fields wrote:
>> Sendu Bala wrote:
[snip]
>> In any case, this is extremely counter-intuitive, especially given
>> that next_domain is a synonym of next_hsp. I think either the
>> synonym relationship remains and hits have multiple hsps (and there
>> is only one hit per model)
[snip]

> The model (hit-like) table scores are retained and can be retrieved
> via $model->significance and the individual domain (hsp-like) evalues
> via $model->evalue.

I know, see my earlier post.

> The reason you don't get all the individual domain evalues is that
> only five alignments are returned by default.  You might try changing
> the 'A' parameter to see if you can get more alignments; that may 
> work around the problem of missing domains for now.

[I'm using my own data, not the OP's]
No, I have all the alignments: 'A' isn't a problem. And I can get all
the domains. The problem is I have to check multiple different hits to
find them all.


> You'll note that the Model/Domain results returned are not based on 
> top score but what looks like the position of the domain in the
> sequence (seq-t in the last table); that's what is stated in the
> hmmpfam docs.
[...]
> Well, that and SearchIO is set up as a SAX-like parser, so I believe 
> it processes the model-domain alignments as the file is parsed.

Yes, this is the problem. The parser does the obvious thing, but in my 
view it does not do the correct thing.


> Model/domain pairs really aren't Hits/HSPs by definition, like the
> CVS commit from Jason states.  The way Pfam is set up, you actually
> have your query(ies) scanned using a database of Pfam domains (HMM's,
> built from protein alignments for various protein families), hence
> the alignment in the report is not a HSP since HSPs come from
> pairwise sequence alignments.  An HSP is a pair of sequences which,
> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> has alignments of the sequence and the consensus for the alignment
> the HMM is based on (not another sequence, so not an HSP).

But this is just semantics. It doesn't /matter/ that it's not really 
truly a sequence that's being aligned. The parser needs to present to 
the user the information in the file. As we see in the OP's example, it 
simply fails to do this because the parser isn't model-centric while the 
file it is parsing /is/.

And in any case, your argument doesn't hold because even the current 
parser /does/ store domains in hsp objects! It just only stores one hsp 
per hit, repeatedly, which is nonsensical.

[to avoid confusion, in the following the use of 'model' is in the 
programming sense, whilst 'Model' refers to the things generated by hmmer]

The correct model to describe the file being parsed is one that is able 
to provide to the user all the available results for all Models that hit a 
query sequence, even when there are no alignments in the file. To make 
this fit the SearchIO scheme, we must have one hit per Model. The hit 
has hsps which are the domains. This perfectly matches the information 
in the file. It matches something like a Blast, where you have one hit 
per database sequence/query sequence combo.

A hit could end up with no hsps (no domains), but we may not even care. 
Sometimes you really do just want to know if a particular model hit at 
all, and with what evalue/score. The current parsing model isn't 
guaranteed to tell you this even when you can read it yourself in the 
file being parsed.
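
To illustrate the shape I have in mind, here is a toy sketch. Sketch::Model is a hypothetical stand-in class (not Bio::Search::Hit::HMMERHit), and the 'Arm' Model and all the numbers are invented; the point is only that one object per Model can hold zero or more domains.

```perl
use strict;
use warnings;

# Sketch::Model is a hypothetical stand-in for a hit object:
# one per Model, holding zero or more domains.
package Sketch::Model;

sub new {
    my ($class, %args) = @_;
    return bless {
        name    => $args{name},
        evalue  => $args{evalue},
        domains => [],
        cursor  => 0,
    }, $class;
}

sub name        { $_[0]{name} }
sub evalue      { $_[0]{evalue} }
sub num_domains { scalar @{ $_[0]{domains} } }

sub add_domain {
    my ($self, $domain) = @_;
    push @{ $self->{domains} }, $domain;
    return $self;    # allow chaining
}

# Returns the next domain of THIS model, or undef when exhausted.
sub next_domain {
    my $self = shift;
    return $self->{domains}[ $self->{cursor}++ ];
}

package main;

my $heat = Sketch::Model->new(name => 'HEAT', evalue => 1.2e-11);
$heat->add_domain({ start => 120, end => 158 })
     ->add_domain({ start => 162, end => 200 });

# A Model that is in the score table but has no alignment in the file.
my $arm = Sketch::Model->new(name => 'Arm', evalue => 0.5);

for my $model ($heat, $arm) {
    printf "%s (E=%g): %d domain(s)\n",
        $model->name, $model->evalue, $model->num_domains;
    while (my $domain = $model->next_domain) {
        printf "\t%d..%d\n", $domain->{start}, $domain->{end};
    }
}
```

With that shape, asking whether a Model hit at all never depends on whether an alignment happened to be printed.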

You can guess at the intent of the original authors, I think, just by 
looking at those method synonyms. next_hit == next_model. next_hsp == 
next_domain. This makes perfect sense. This is the way to correctly 
model the information in the file. The problem is that next_model 
doesn't give you the next Model (because each Model has multiple hits), 
and next_domain doesn't give you the next domain (because each hit only 
has one domain).


> I think the reasoning for keeping single model-domain pairs is that
> you should consider each domain's location in the sequence as well as
> the number of times they appear, regardless of whether they belong to
> the same model or not.  One protein could have three ATP-binding
> domains and another two, and they could be located in different
> positions on the sequence.  But where they are on the sequence in
> relation to other domains and to each other (i.e. positional
> information) is just as important, maybe more so, than how many times
> that domain appears.

Well, that's for the user to decide. But the way the results are 
presented needs to make sense. If blast results came back with all hsps 
listed out in sequence position order, would you have multiple hits per 
database sequence each with one hsp? No, because the meaning is 
completely wrong. The 'hit' is the collection of alignments of a 
particular database sequence hitting a query sequence. The alignments 
are stored in a bunch of hsps. It is absurd to have more than one hit 
object for a database+query sequence combo, because then we have 
multiple hit objects duplicating the exact same information, and 'hit' 
no longer has any meaning - it is a collection of /some/ of the 
alignments? Yet this is exactly what we have with hmmpfam result parsing.


From selvik at ufl.edu  Wed Jun 28 16:11:56 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 16:11:56 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
References: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
Message-ID: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>

Sendu,

>
> What do you mean by this? What is '-A'?
> Is this an option you're supplying to hmmpfam or a bioperl module?

I was referring to the '-A' option when running hmmpfam. So if I were  
to use  '-A 5', Section 3 will have only the top scoring (first) five  
HSPs.

>
> So you can say:
> $hit->name, " ", $hit->description, " ", $hit->significance, " ",
> $hit->score;
>
> To get the information you want.
> General information about the result can be had like so:
> print $result->query_name, " ", $result->algorithm, " ",
> $result->hmm_name, "\n";

I do use the same methods that you have suggested. Let me try to  
explain my problem in detail. Let's say I have a report that was  
generated using this "-A 5" option. I want to get the description,  
score, evalue of a model that *does not* have a domain in the top 5  
high scoring HSPs. This information *exists* in the report in Section  
1 but neither $result->next_hit nor $hit->next_hsp can see it.
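
If it comes to writing my own parser, I imagine something along these lines would do for Section 1. This is only a sketch in plain Perl: the rows below are made up, laid out the way I understand the first hmmpfam table to look, and the hash keys are my own choice.

```perl
use strict;
use warnings;

# Made-up rows laid out like the first table of an HMMER2 hmmpfam report.
my $report = <<'END';
Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N
-------- -----------                                    -----    ------- ---
IBB      Importin beta binding domain                   156.3    2.6e-43   1
HEAT     HEAT repeat                                     42.6    1.2e-11   2

Parsed for domains:
END

my @models;
my $in_table = 0;
for my $line (split /\n/, $report) {
    if ($line =~ /^Scores for sequence family classification/) { $in_table = 1; next }
    next unless $in_table;
    last if $line =~ /^Parsed for domains:/;

    # Anchor on the three right-hand columns (score, E-value, N) so that
    # the description in the middle may contain spaces.
    if ($line =~ /^(\S+)\s+(.*?)\s+(-?\d[\d.]*)\s+(\S+)\s+(\d+)\s*$/) {
        push @models, {
            name => $1, description => $2,
            score => $3, evalue => $4, n => $5,
        };
    }
}

printf "%-8s %-30s %7.1f %9s %2d\n",
    @{$_}{qw(name description score evalue n)} for @models;
```

That would give me the full description, score and evalue per model, whatever -A was set to.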

Details of ALL domains  are available through:

     foreach my $domain ($result->each_Domain)
     {
            # accessors available on each domain:
            # hmmname, hmmacc, start, end, hstart, hend, evalue
            print join("\t", $domain->hmmname, $domain->evalue), "\n";
     }

where $result is a Bio::Tools::HMMER::Results object. But this again  
represents information in Section 2. It gives us domain scores and  
evalues (and not model scores and evalues.)

I am working around this by finding the sum of scores (evalues) of  
all domains in a model. But there seems to be no work-around to  
retrieve the description. $domain->hmmacc contains only the first  
string of the description.
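
For reference, the summing work-around looks roughly like this. The records are hypothetical copies of what I pull out of each domain (the keys hmmname and bits are just what I chose for the sketch), and only the bit scores are summed here; whether a summed evalue means anything is a separate question.

```perl
use strict;
use warnings;

# Hypothetical per-domain records, as might be copied out of each_Domain.
my @domains = (
    { hmmname => 'HEAT', bits => 12.1 },
    { hmmname => 'IBB',  bits => 156.3 },
    { hmmname => 'HEAT', bits => 30.5 },
);

# Accumulate one total per model name.
my %score_for;
$score_for{ $_->{hmmname} } += $_->{bits} for @domains;

# Report the models from highest to lowest summed score.
for my $name (sort { $score_for{$b} <=> $score_for{$a} } keys %score_for) {
    printf "%-8s %.1f\n", $name, $score_for{$name};
}
```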

-Selvi


From jason at bioperl.org  Wed Jun 28 22:53:25 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 28 Jun 2006 22:53:25 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
	<44A309FB.2050009@sendu.me.uk>
Message-ID: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>

I don't have any time to really debate this sadly - I definitely went  
back and forth on how to solve this and not many people ever spoke up  
about what they WANTED.  So glad to hear there are opinions out there  
now.

I think the bug fix you refer to had to do with not returning things  
ordered by E-value -- the creation machinery only builds Hit  
objects when there are HSP objects being built.  Basically the  
parsing is linear in terms of the file, we read "Model" (Hit) data  
first and store them in a hash keyed by the name of the domain, but  
we only >>build<< the "Hits" when HSPs are seen, hence the problem when  
the -A option limits alignments but reports Hits that don't have  
individual alignments.  This has to do with the order of things not  
syncing up and/or dealing with the -A option when there is leftover  
Hit data but no HSPs to populate them.  We also had this problem in  
BLAST reports and had to work around that, but I never bothered  
solving it in HMMER I guess.  Glad there are other people who are  
going to fix the problems!

The one "alignment" (HSP) per hit was a workaround to the problem  
that Hits were being returned in the order the HSPs came in (Sequence  
order) -- because that is the order they were being built in -- not  
in the sorted order of the Hits as seen in the report.
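
A rough sketch of one alternative, for whoever picks this up: keep the Model rows in table order, attach domains as they arrive, and only build the hits at the end so that Models without alignments still come out. This uses plain hashes and a made-up event stream, nothing from the real listener code, and it assumes you can afford to hold a whole result in memory before emitting anything.

```perl
use strict;
use warnings;

# Made-up event stream in file order: the Model table first, then domains
# in sequence order. 'Arm' has a table row but no alignment.
my @events = (
    [ model  => { name => 'IBB',  evalue => 2.6e-43 } ],
    [ model  => { name => 'HEAT', evalue => 1.2e-11 } ],
    [ model  => { name => 'Arm',  evalue => 0.5 } ],
    [ domain => { name => 'HEAT', start => 120 } ],
    [ domain => { name => 'IBB',  start => 3 } ],
    [ domain => { name => 'HEAT', start => 162 } ],
);

my (%hit_for, @table_order);
for my $event (@events) {
    my ($type, $data) = @$event;
    if ($type eq 'model') {
        # Remember the Model row and its rank in the table.
        $hit_for{ $data->{name} } = { %$data, domains => [] };
        push @table_order, $data->{name};
    }
    elsif ($type eq 'domain') {
        # Attach to the stored Model instead of building a new hit.
        push @{ $hit_for{ $data->{name} }{domains} }, $data;
    }
}

# Build hits only at the end, in table order, leftovers included.
my @hits = map { $hit_for{$_} } @table_order;
printf "%-5s E=%-8g domains=%d\n",
    $_->{name}, $_->{evalue}, scalar @{ $_->{domains} } for @hits;
```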

Feel free to propose an alternative implementation for the parser as you  
see fit as long as the API is preserved.  You can contribute a new  
SearchIO plugin and HMMERSearchResultListener to deal with it - or I  
guess do what I also do and just run hmmer2table and deal with things  
in a tab-delimited format.

Personally my interests lie in the actual domains so the Hit objects  
are superfluous in my own work so it never bothered me to have one  
per Hit and it flows more naturally to things like GFF, etc.  You can  
aggregate them however you like after the fact pretty simply so I  
don't find this too hard to deal with, but if this is a major deterrent  
for people I guess have at it (I think the speed of object creation  
is a larger problem that I hope someone will work on soon).

I'd appreciate you including the salient points of how the report is  
interpreted on the wiki at some point (with 8X10 glossy pictures and  
circles and arrows on the back...
http://en.wikipedia.org/wiki/Alice%27s_Restaurant) so the debate can be
archived too.

-jason

On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote:

> Chris Fields wrote:
>>> Sendu Bala wrote:
> [snip]
>>> In any case, this is extremely counter-intuitive, especially given
>>> that next_domain is a synonym of next_hsp. I think either the
>>> synonym relationship remains and hits have multiple hsps (and there
>>> is only one hit per model)
> [snip]
>
>> The model (hit-like) table scores are retained and can be retrieved
>> via $model->significance and the individual domain (hsp-like) evalues
>> via $model->evalue.
>
> I know, see my earlier post.
>
>> The reason you don't get all the individual domain evalues is that
>> only five alignments are returned by default.  You might try changing
>> the 'A' parameter to see if you can get more alignments; that may
>> work around the problem of missing domains for now.
>
> [I'm using my own data, not the OP's]
> No, I have all the alignments: 'A' isn't a problem. And I can get all
> the domains. The problem is I have to check multiple different hits to
> find them all.
>
>
>> You'll note that the Model/Domain results returned are not based on
>> top score but what looks like the position of the domain in the
>> sequence (seq-t in the last table); that's what is stated in the
>> hmmpfam docs.
> [...]
>> Well, that and SearchIO is set up as a SAX-like parser, so I believe
>> it processes the model-domain alignments as the file is parsed.
>
> Yes, this is the problem. The parser does the obvious thing, but in my
> view it does not do the correct thing.
>
>
>> Model/domain pairs really aren't Hits/HSPs by definition, like the
>> CVS commit from Jason states.  The way Pfam is set up, you actually
>> have your query(ies) scanned using a database of Pfam domains (HMM's,
>> built from protein alignments for various protein families), hence
>> the alignment in the report is not a HSP since HSPs come from
>> pairwise sequence alignments.  An HSP is a pair of sequences which,
>> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
>> has alignments of the sequence and the consensus for the alignment
>> the HMM is based on (not another sequence, so not an HSP).
>
> But this is just semantics. It doesn't /matter/ that its not really
> truly a sequence that's being aligned. The parser needs to present to
> the user the information in the file. As we see in the OP's  
> example, it
> simply fails to do this because the parser isn't model-centric  
> while the
> file it is parsing /is/.
>
> And in any case, your argument doesn't hold because even the current
> parser /does/ store domains in hsp objects! It just only stores one  
> hsp
> per hit, repeatedly, which is nonsensical.
>
> [to avoid confusion, in the following the use of 'model' is in the
> programming sense, whilst 'Model' refers to the things generated by  
> hmmer]
>
> The correct model to describe the file being parsed is one that is  
> able
> provide to the user all the available results for all Models that  
> hit a
> query sequence, even when there are no alignments in the file. To make
> this fit the SearchIO scheme, we must have one hit per Model. The hit
> has hsps which are the domains. This perfectly matches the information
> in the file. It matches something like a Blast, where you have one hit
> per database sequence/query sequence combo.
>
> A hit could end up with no hsps (no domains), but we may not even  
> care.
> Sometimes you really do just want to know if a particular model hit at
> all, and with what evalue/score. The current parsing model isn't
> guaranteed to tell you this even when you can read it yourself in the
> file being parsed.
>
> You can guess at the intent of the original authors, I think, just by
> looking at those method synonyms. next_hit == next_model. next_hsp ==
> next_domain. This makes perfect sense. This is the way to correctly
> model the information in the file. The problem is that next_model
> doesn't give you the next Model (because each Model has multiple  
> hits),
> and next_domain doesn't give you the next domain (because each hit  
> only
> has one domain).
>
>
>> I think the reasoning for keeping single model-domain pairs is that
>> you should consider each domain's location in the sequence as well as
>> the number of times they appear, regardless of whether they belong to
>> the same model or not.  One protein could have three ATP-binding
>> domains and another two, and they could be located in different
>> positions on the sequence.  But where they are on the sequence in
>> relation to other domains and to each other (i.e. positional
>> information) is just as important, maybe more so, than how many times
>> that domain appears.
>
> Well, that's for the user to decide. But the way the results are
> presented needs to make sense. If blast results came back with all  
> hsps
> listed out in sequence position order, would you have multiple hits  
> per
> database sequence each with one hsp? No, because the meaning is
> completely wrong. The 'hit' is the collection of alignments of a
> particular database sequence hitting a query sequence. The alignments
> are stored in a bunch of hsps. It is absurd to have more than one hit
> object for a database+query sequence combo, because then we have
> multiple hit objects duplicating the exact same information, and 'hit'
> no longer has any meaning - it is a collection of /some/ of the
> alignments? Yet this is exactly what we have with hmmpfam result  
> parsing.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Wed Jun 28 23:40:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 22:40:28 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>
Message-ID: <000301c69b2d$c3fdc6a0$15327e82@pyrimidine>

According to CVS, using -A0 (no alignments) is supposed to work since v.
1.5.1 and (I'm guessing here) should return HMMERHit/HMMERHSP objects with
no sequences, just the values from the table.  By this reasoning using -A5
should work but the first five Hit/HSP pairs will give you sequences and any
remaining should give nothing, just the Sequence Model combined evalue
(which you can get by $model->significance) and individual Domain (HSP-like)
evalues ($domain->evalue).  I don't get these either (I only get a max of 5
model/domain pairs). 

So, I tried a little experiment using the first single result output for
this query from your combined file (nbd27e02.y1  716 69 831 ; translated),
which was the first one I came across with more than five model/domain
pairs, and this scripted loop:

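use Bio::SearchIO;

# 'hmmpfam.out' is a placeholder name for the hmmpfam report being parsed
my $searchio = Bio::SearchIO->new(-format => 'hmmer', -file => 'hmmpfam.out');

while ( my $result = $searchio->next_result() ) {
  print "Query: ",$result->query_name,"\n";
  # next_model is the hmmer synonym for next_hit
  while (my $model = $result->next_model) {
    print "\tModel : ",$model->name,"\n";
    print "\tSignif: ",$model->significance,"\n";
    # next_domain is the hmmer synonym for next_hsp
    while (my $domain = $model->next_domain) {
      print "\t\tEvalue : ",$domain->evalue,"\n";
    }
  }
}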

I get this with the file containing the alignments.  For anyone following,
I'm using bioperl-live, perl 5.8, WinXP:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 2.6e-43
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 40
        Model : IBN_N
        Signif: 2.1
                Evalue : 2.1
        Model : Arm
        Signif: 6e-38
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 0.0096

If I manually delete the alignments (make it like -A0 output) I get this:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 157.3
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 52.1
                Evalue : 40
        Model : IBN_N
        Signif: -3.6
                Evalue : 2.1
        Model : Arm
        Signif: 139.5
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0096
        Model : Arm
        Signif: 139.5
                Evalue : 2.2e-13
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0032
        Model : Arm
        Signif: 139.5
                Evalue : 0.00019

i.e. all the model/domain pairs!  So I think it's safe to say that this is a
bug; the last few don't get processed but should.  I'll drop a bug report
into Bugzilla along with the test files and script so it can be confirmed.
This shouldn't be too hard to fix but it may take a few days; I'm pretty
busy here until Saturday.
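
For anyone who wants to see the discrepancy without BioPerl installed, here
is a minimal plain-Perl sketch.  The two pair lists are copied from the
outputs above; the helper name missing_pairs is made up for illustration and
is not part of any BioPerl module.

```perl
use strict;
use warnings;

# Model/domain pairs as [model name, domain evalue], copied from the outputs above.
my @with_alignments = (
    [ IBB => '2.6e-43' ], [ HEAT => '40' ], [ IBN_N => '2.1' ],
    [ Arm => '3.5e-12' ], [ HEAT => '0.0096' ],
);
my @without_alignments = (
    @with_alignments,
    [ Arm => '2.2e-13' ], [ HEAT => '0.0032' ], [ Arm => '0.00019' ],
);

# Hypothetical helper: return the pairs in $all that do not appear in $seen.
sub missing_pairs {
    my ( $all, $seen ) = @_;
    my %remaining;
    $remaining{ join "\t", @$_ }++ for @$seen;
    my @missing;
    for my $pair (@$all) {
        my $key = join "\t", @$pair;
        if ( $remaining{$key} ) { $remaining{$key}--; next }
        push @missing, $pair;
    }
    return @missing;
}

my @missing = missing_pairs( \@without_alignments, \@with_alignments );
printf "%d pair(s) not reported when alignments are present:\n", scalar @missing;
print "\t$_->[0]\t$_->[1]\n" for @missing;
```

The pairs it lists are the ones that only show up once the alignments are
deleted from the report.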
 
Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Selvi Kadirvel
> Sent: Wednesday, June 28, 2006 3:12 PM
> To: bioperl-l at lists.open-bio.org
> Cc: Selvi Kadirvel
> Subject: Re: [Bioperl-l] Bio::SearchIO - Accessing Model parameters
> (score,evalue, description)
> 
> Sendu,
> 
> >
> > What do you mean by this? What is '-A'?
> > Is this an option you're supplying to hmmpfam or a bioperl module?
> 
> I was referring to the '-A' option when running hmmpfam. So if I were
> to use  '-A 5', Section 3 will have only the top scoring (first) five
> HSPs.
> 
> >
> > So you can say:
> > $hit->name, " ", $hit->description, " ", $hit->significance, " ",
> > $hit->score;
> >
> > To get the information you want.
> > General information about the result can be had like so:
> > print $result->query_name, " ", $result->algorithm, " ",
> > $result->hmm_name, "\n";
> 
> I do use the same methods that you have suggested. Let me try to
> explain my problem in detail. Let's say I have a report that was
> generated using this "-A 5" option. I want to get the description,
> score, evalue of a model that *does not* have a domain in the top 5
> high scoring HSPs. This information *exists* in the report in Section
> 1 but neither $result->next_hit or $hit->next_hsp can see it.
> 
> Details of ALL domains  are available through:
> 
>      foreach $domain ($result->each_Domain)
>      {
>             $domain-> [ hmmname, hmmacc, start, end, hstart, hend,
> evalue ]
>      }
> 
> where $result is a Bio::Tools::HMMER::Results object. But this again
> represents information in Section 2. It gives us domain scores and
> evalues (and not model scores and evalues.)
> 
> I am working around this by finding the sum of scores (evalues) of
> all domains in a model. But there seems to be no work-around to
> retrieve the description. $domain->hmmacc contains only the first
> string of the description.
> 
> -Selvi
> 


From cjfields at uiuc.edu  Thu Jun 29 01:20:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:20:10 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
Message-ID: <000d01c69b3b$b17776d0$15327e82@pyrimidine>

> I know, see my earlier post.
...
> [I'm using my own data, not the OP's]
...

Sorry, I was typing that one up over a three-hour period in between
experiments, so I didn't go back and check everything before I sent it.
Pretty much the entire file Selvi sent me (and the entire group, grrr) shows
that the domains in the domain table are not completely parsed, and the
number of reported hits correlates with the number of alignments present.
In other words, only five or fewer hits are reported based on the alignments
and the default max alignments reported per result is five.  I figured out
that it is a bug and plan on submitting it to Bugzilla.

What you are talking about and what Selvi describes are two separate issues.
I dealt with Selvi's for the moment; let's deal with yours.

> > Well, that and SearchIO is set up as a SAX-like parser, so I believe
...
> Yes, this is the problem. The parser does the obvious thing, but in my
> view it does not do the correct thing.

Yes, and that's your opinion.  To tell the truth I'm quite neutral on this;
I'm trying to reason along the lines the contributors for the module
intended.  The fact of the matter is the parser is set up to do it this way,
and it was set up this way by others (not you or I); modifying it to suit
one's personal wants and needs is not our job here.  I don't have issues
while I'm running it so I really don't see what the problem is, well,
besides the reported bug I found along with Selvi's help.

My view on all this before I quit for the night:

I really don't want to get into what I consider nit-picky issues (the
'semantics' you mention; it's a simple difference in opinion and a small one
at that).  We can agree to disagree, whatever.  The issue immediately at
hand, what I consider the most important, is that Selvi has uncovered a bug
with the code, as is.  But I'm going to vent here a bit.  It's late, I'm
tired, and this whole thing irks me.  It irks me a great deal. 

Personally, I don't think right now is the time to think about refactoring
this particular module, esp. since I find it essentially works.  I believe
that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
instance, or refactoring SearchIO::blast etc to use hashes instead of
objects to speed things up.  Or creating something yourself.  Or doing what
you currently are doing (Bio::Map).  In other words, areas where use is
high, code is aging, and refactoring is more productive.

I'll add that I'm not trying to dissuade you from trying to build your own
variation of a SearchIO HMMER parser; by all means go ahead.  The above is
how I feel.  You can build your own parser to do what you want; you can even
base it off the current SearchIO HMMER parser and see if you can set it up
to give you the results you want, using a different handler and so on.  Just
don't break the API or modify the current code based strictly on what your
opinion of how it should work is.  It was probably set up this way for a
particular reason.

According to the SearchIO HOWTO the intent for SearchIO was to 'genericize'
parsing reports with 'similar' styles, like BLAST, FASTA, HMMER, and so on.
The most prevalently parsed reports, by a long stretch, are BLAST reports,
which is what the system is based on: 

http://www.bioperl.org/wiki/HOWTO:SearchIO#Design

So the SearchIO system is based on the >assumption< that these reports can
be divi'd up with the data mapped into categories (Results, Hits, HSPs), so
similar objects should be able to handle them.  Domain data are currently
stored in HSP objects (HMMERHSP), but that's nothing more than a convenient
way to store HMMER report data in my opinion; the alignment matches,
strictly speaking, are not HSPs.  You could rename HMMERHit to HMMERModel and
HMMERHSP to HMMERDomain, but they would still, if they fit into SearchIO and
used the current event handlers, implement HitI/HSPI by inheriting from
GenericHit/GenericHSP.  Ergo, any easy way you go about it here, HMMERHit
is-a HitI and HMMERHSP is-a HSPI.  You could probably work around it by
building the 'correct' object hierarchy by setting up your own handler and
SearchIO plugin, but that risks changing API.  And, really, if you decide to
go down that path, consider what Jason is talking about when he mentions
using "under-the-hood" hashes.
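
To illustrate the is-a point, here is a small self-contained sketch.  The
Sketch::* packages are stand-ins invented for this example; they are not the
BioPerl classes and only mirror the inheritance and the method synonym
described above.

```perl
use strict;
use warnings;

# Stand-in for a generic hit class (hypothetical, not Bio::Search::Hit::GenericHit).
package Sketch::GenericHit;

sub new {
    my ( $class, %args ) = @_;
    return bless { name => $args{name}, hsps => $args{hsps} || [], cursor => 0 }, $class;
}
sub name { $_[0]{name} }

# Return the next stored hsp, or undef when there are none left.
sub next_hsp {
    my $self = shift;
    return $self->{hsps}[ $self->{cursor}++ ];
}

# A renamed "HMMERHit": the new name does not change what it inherits from.
package Sketch::HMMERModel;
our @ISA = ('Sketch::GenericHit');
sub next_domain { shift->next_hsp }    # synonym only

package main;

my $model = Sketch::HMMERModel->new( name => 'Arm', hsps => [ '3.5e-12', '2.2e-13' ] );
print ref($model), " is-a Sketch::GenericHit\n" if $model->isa('Sketch::GenericHit');
while ( defined( my $domain = $model->next_domain ) ) {
    print "domain evalue: $domain\n";
}
```

Whatever the subclass is called, code written against the generic hit
interface keeps working, which is the constraint any replacement has to meet.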

> A hit could end up with no hsps (no domains), but we may not even care.
> Sometimes you really do just want to know if a particular model hit at
> all, and with what evalue/score. The current parsing model isn't
> guaranteed to tell you this even when you can read it yourself in the
> file being parsed.

For every model (hit) you should have a corresponding domain (HSP) or more
depending on your view of how the parser works, even if the domain (HSP) is
only present in the table and not in an alignment.  You shouldn't have
models w/o domains from your query (hits w/o hsps); that doesn't make any
sense.  If hmmpfam output has this then it's a serious issue, but, again,
that doesn't make sense.  All that information is in the tables in the
hmmpfam output; you can even build objects w/o alignments present (-A0)
straight from the tables.

If you wanted to know whether a particular model hit at all, grab all the
model objects ($result->models) and run through them to see if your expected
model (Annexin, Phosphoribosyl, or whatever) is there using a map/grep
block, regex, or whatever; you could autovivify a hash or similar data
structure indicating that a particular sequence has x domains of y type.  Or
iterate through them like you would for a BLAST report.  I don't see what's
difficult about this; I do it for BLAST sequences, SeqFeatures, and many
other BioPerl objects all the time!  Yes, it can be slow; that's an issue
with object instantiation and Perl and there is no easy way around it
besides refactoring the SearchIO parsers/eventhandlers to send back hashes,
as Jason has suggested.
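
A minimal sketch of the hash approach, in plain Perl so it runs without
BioPerl.  The list of names is copied from the no-alignment output in my
earlier post and stands in for what you would collect by calling
$model->name in the loop; the variable names are just for illustration.

```perl
use strict;
use warnings;

# Model names in report order, standing in for values collected via $model->name.
my @model_names = qw(IBB HEAT IBN_N Arm HEAT Arm HEAT Arm);

# Autovivify one counter per model: "this sequence has x domains of y type".
my %domains_per_model;
$domains_per_model{$_}++ for @model_names;

# Did a particular model hit at all?
for my $wanted (qw(Arm Annexin)) {
    printf "%-8s %s\n", $wanted,
        exists $domains_per_model{$wanted}
        ? "hit, $domains_per_model{$wanted} domain(s)"
        : "no hit";
}
```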

> You can guess at the intent of the original authors, I think, just by
> looking at those method synonyms. next_hit == next_model. next_hsp ==
> next_domain. This makes perfect sense. This is the way to correctly
> model the information in the file. The problem is that next_model
> doesn't give you the next Model (because each Model has multiple hits),
> and next_domain doesn't give you the next domain (because each hit only
> has one domain).

....

> Well, that's for the user to decide. But the way the results are
> presented needs to make sense. If blast results came back with all hsps
> listed out in sequence position order, would you have multiple hits per
> database sequence each with one hsp? No, because the meaning is
> completely wrong. The 'hit' is the collection of alignments of a
> particular database sequence hitting a query sequence. The alignments
> are stored in a bunch of hsps. It is absurd to have more than one hit
> object for a database+query sequence combo, because then we have
> multiple hit objects duplicating the exact same information, and 'hit'
> no longer has any meaning - it is a collection of /some/ of the
> alignments? Yet this is exactly what we have with hmmpfam result parsing.

The problem is that the module is geared to parse the output as simply as
possible, so it does it by sequence order, just like the output.  And, as
is, it makes sense to me why Eddy and Co. set it that way, not that I
completely agree with it.  Hmmpfam output is designed for annotating
sequences using Pfam HMM's, so the results are hard-coded to appear in
sequence order, not based on score or evalue.  That's the way it is; not
necessarily the best way IMHO (I would have a way to sort by evalue or model
myself as an option), but it's the only way that's currently available.
Yes, each Model can match more than one domain on a query sequence.  Again,
that this is the 'correct way' to set up this parser is your opinion; if you
want, design your own SearchIO parser.  Like I said, I don't have a problem
with using this module myself.  And I'm a bit reticent to spend the energy
overhaulin' this module when I could spend my time working on something else
I consider more constructive (or destructive, depending on your view).  

And, frankly, it's not up to the user when using code they didn't create.
You have to deal with it.  Or code something yourself to do things the way
you want.  You have the power to do that; most bioperl users don't simply
b/c they probably don't understand the class structure and OO nature of
Bioperl.  It's just a matter of where you want to spend your energy: dealing
with something that interests you or fixing other people's broken code.


Chris


From cjfields at uiuc.edu  Thu Jun 29 01:23:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:23:03 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <000e01c69b3c$18d58fb0$15327e82@pyrimidine>

...
 
> I think the bug fix you refer to had to do with not returning things
> ordered by E-value -- the creation machinery only builds Hit
> objects when there are HSP objects being built.  Basically the
> parsing is linear in terms of the file, we read "Model" (Hit) data
> first and store them in a hash keyed by the name of the domain, but
> we only >>build<< the "Hits" when seen HSPs, hence the problem when
> the -A option limits alignments but reports Hits that don't have
> individual alignments.  This has to do with the order of things not
> syncing up and/or dealing with the -A option when there is leftover
> Hit data but no HSPs to populate them.  We also had this problem in
> BLAST reports and had to work around that, but I never bothered
> solving it in HMMER I guess.  Glad there are other people who are
> going to fix the problems!

Yeah, just figured that one out.  I see the two tables are parsed into two
arrays, so it is feasible to convert the leftover Hit/HSP (Model/Domain)
pairs into the proper objects, just as when no alignments are present (-A0
output).  I plan on reporting this in Bugzilla and will work on it,
but can't get to it immediately (probably not 'til Friday-Saturday at the
earliest).  If Sendu wants to tackle it I don't have a problem.
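
Roughly what I have in mind, as a plain-Perl sketch rather than the actual
Bio::SearchIO::hmmer code.  build_pairs and the record layout are made up
for illustration, and it assumes alignment i belongs to domain-table row i,
which is what the outputs in my earlier post suggest but is not verified.

```perl
use strict;
use warnings;

# Hypothetical: build one record per domain-table row, attaching an alignment
# when one exists and leaving it undef otherwise (the -A0 / past-the-limit case).
sub build_pairs {
    my ( $model_table, $domain_table, $alignments ) = @_;
    my @pairs;
    for my $i ( 0 .. $#$domain_table ) {
        my ( $name, $evalue ) = @{ $domain_table->[$i] };
        push @pairs, {
            model        => $name,
            significance => $model_table->{$name},    # per-model evalue
            evalue       => $evalue,                   # per-domain evalue
            alignment    => $alignments->[$i],         # undef when no alignment
        };
    }
    return @pairs;
}

# Values copied from the outputs posted earlier in the thread.
my %model_table = ( IBB => '2.6e-43', HEAT => '1.2e-11', IBN_N => '2.1', Arm => '6e-38' );
my @domain_table = (
    [ IBB  => '2.6e-43' ], [ HEAT => '40' ],      [ IBN_N => '2.1' ],
    [ Arm  => '3.5e-12' ], [ HEAT => '0.0096' ],  [ Arm   => '2.2e-13' ],
    [ HEAT => '0.0032' ],  [ Arm  => '0.00019' ],
);
my @alignments = map { "alignment $_" } 1 .. 5;    # placeholders, as with -A 5

my @pairs = build_pairs( \%model_table, \@domain_table, \@alignments );
printf "%-6s signif=%-8s evalue=%-8s %s\n",
    @{$_}{qw(model significance evalue)},
    defined $_->{alignment} ? 'with alignment' : 'table only'
    for @pairs;
```

The point is that the loop is driven by the table rows, not by the
alignments, so nothing is dropped when the alignments run out.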

> The one "alignment" (HSP) per hit was a workaround to the problem
> that Hits were being returned in the order the HSPs came in (Sequence
> order) -- because that is the order they were being built in -- not
> in the sorted order of the Hits as seen in the report.

The SAX method, I gather, getting in the way.  

> Feel free to propose an alternative implementation for the parser as you
> see fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or I
> guess do what I also do and just run hmmer2table and deal with things
> in a tab-delimited format.

Or set it up as hashes, which you have mentioned before for BLAST.

> Personally my interests lie in the actual domains so the Hit objects
> are superfluous in my own work so it never bothered me to have one
> per Hit and it flows more naturally to things like GFF, etc.  You can
> aggregate them however you like after the fact pretty simply so I
> don't find this too hard to deal with, but if this a major deterrent
> for people I guess have at it ( I think the speed of object creation
> is a larger problem that I hope that someone will work on soon).

Agreed, though now it's finding the time....


Chris 

> I'd appreciate you including the salient points of how the report is
> interpreted on the wiki at some point (with 8X10 glossy pictures and
> circles and arrows on the back...
> http://en.wikipedia.org/wiki/Alice%27s_Restaurant) so the debate can be
> archived too.
> 
> -jason
> 
> On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote:
> 
> > Chris Fields wrote:
> >>> Sendu Bala wrote:
> > [snip]
> >>> In any case, this is extremely counter-intuitive, especially given
> >>> that next_domain is a synonym of next_hsp. I think either the
> >>> synonym relationship remains and hits have multiple hsps (and there
> >>> is only one hit per model)
> > [snip]
> >
> >> The model (hit-like) table scores are retained and can be retrieved
> >> via $model->significance and the individual domain (hsp-like) evalues
> >> via $domain->evalue.
> >
> > I know, see my earlier post.
> >
> >> The reason you don't get all the individual domain evalues is that
> >> only five alignments are returned by default.  You might try changing
> >> the 'A' parameter to see if you can get more alignments; that may
> >> work around the problem of missing domains for now.
> >
> > [I'm using my own data, not the OP's]
> > No, I have all the alignments: 'A' isn't a problem. And I can get all
> > the domains. The problem is I have to check multiple different hits to
> > find them all.
> >
> >
> >> You'll note that the Model/Domain results returned are not based on
> >> top score but what looks like the position of the domain in the
> >> sequence (seq-t in the last table); that's what is stated in the
> >> hmmpfam docs.
> > [...]
> >> Well, that and SearchIO is set up as a SAX-like parser, so I believe
> >> it processes the model-domain alignments as the file is parsed.
> >
> > Yes, this is the problem. The parser does the obvious thing, but in my
> > view it does not do the correct thing.
> >
> >
> >> Model/domain pairs really aren't Hits/HSPs by definition, like the
> >> CVS commit from Jason states.  The way Pfam is set up, you actually
> >> have your query(ies) scanned using a database of Pfam domains (HMM's,
> >> built from protein alignments for various protein families), hence
> >> the alignment in the report is not a HSP since HSPs come from
> >> pairwise sequence alignments.  An HSP is a pair of sequences which,
> >> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> >> has alignments of the sequence and the consensus for the alignment
> >> the HMM is based on (not another sequence, so not an HSP).
> >
> > But this is just semantics. It doesn't /matter/ that it's not really
> > truly a sequence that's being aligned. The parser needs to present to
> > the user the information in the file. As we see in the OP's example, it
> > simply fails to do this because the parser isn't model-centric while the
> > file it is parsing /is/.
> >
> > And in any case, your argument doesn't hold because even the current
> > parser /does/ store domains in hsp objects! It just only stores one hsp
> > per hit, repeatedly, which is nonsensical.
> >
> > [to avoid confusion, in the following the use of 'model' is in the
> > programming sense, whilst 'Model' refers to the things generated by
> > hmmer]
> >
> > The correct model to describe the file being parsed is one that is able
> > to provide to the user all the available results for all Models that hit a
> > query sequence, even when there are no alignments in the file. To make
> > this fit the SearchIO scheme, we must have one hit per Model. The hit
> > has hsps which are the domains. This perfectly matches the information
> > in the file. It matches something like a Blast, where you have one hit
> > per database sequence/query sequence combo.
> >
> > A hit could end up with no hsps (no domains), but we may not even care.
> > Sometimes you really do just want to know if a particular model hit at
> > all, and with what evalue/score. The current parsing model isn't
> > guaranteed to tell you this even when you can read it yourself in the
> > file being parsed.
> >
> > You can guess at the intent of the original authors, I think, just by
> > looking at those method synonyms. next_hit == next_model. next_hsp ==
> > next_domain. This makes perfect sense. This is the way to correctly
> > model the information in the file. The problem is that next_model
> > doesn't give you the next Model (because each Model has multiple hits),
> > and next_domain doesn't give you the next domain (because each hit only
> > has one domain).
> >
> >
> >> I think the reasoning for keeping single model-domain pairs is that
> >> you should consider each domain's location in the sequence as well as
> >> the number of times they appear, regardless of whether they belong to
> >> the same model or not.  One protein could have three ATP-binding
> >> domains and another two, and they could be located in different
> >> positions on the sequence.  But where they are on the sequence in
> >> relation to other domains and to each other (i.e. positional
> >> information) is just as important, maybe more so, than how many times
> >> that domain appears.
> >
> > Well, that's for the user to decide. But the way the results are
> > presented needs to make sense. If blast results came back with all hsps
> > listed out in sequence position order, would you have multiple hits per
> > database sequence each with one hsp? No, because the meaning is
> > completely wrong. The 'hit' is the collection of alignments of a
> > particular database sequence hitting a query sequence. The alignments
> > are stored in a bunch of hsps. It is absurd to have more than one hit
> > object for a database+query sequence combo, because then we have
> > multiple hit objects duplicating the exact same information, and 'hit'
> > no longer has any meaning - it is a collection of /some/ of the
> > alignments? Yet this is exactly what we have with hmmpfam result parsing.
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


From bix at sendu.me.uk  Thu Jun 29 03:02:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:02:49 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
Message-ID: <44A37B19.7030908@sendu.me.uk>

Chris Fields wrote:
>
> Personally, I don't think right now is the time to think about refactoring
> this particular module, esp. since I find it essentially works.  I believe
> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
> instance, or refactoring SearchIO::blast etc to use hashes instead of
> objects to speed things up.  Or creating something yourself.  Or doing what
> you currently are doing (Bio::Map).  In other words, areas where use is
> high, code is aging, and refactoring is more productive.

Hmmer parsing happens to be important to me, in fact vital for my work. 
I've been using my own parser up till now, so didn't know what the 
Bioperl one was like. I'd like to use Bioperl for more things, 
preferably everything.


> I'll add that I'm not trying to dissuade you from trying to build your own
> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
> how I feel.  You can build your own parser to do what you want; you can even
> base it off the current SearchIO HMMER parser and see if you can set it up
> to give you the results you want, using a different handler and so on.  Just
> don't break the API or modify the current code based strictly on what your
> opinion of how it should work is.  It was probably set up this way for a
> particular reason.

Well, I don't like the idea of there being multiple SearchIO parsers for 
the same thing.

[...]
> And, frankly, it's not up to the user when using code they didn't create.
> You have to deal with it.  Or code something yourself to do things the way
> you want.  You have the power to do that; most bioperl users don't simply
> b/c they probably don't understand the class structure and OO nature of
> Bioperl.  It's just a matter of where you want to spend your energy: dealing
> with something that interests you or fixing other people's broken code.

My original question was essentially: does doing it my way make sense? 
And implicitly: would doing it my way be of any harm? Ie. can I go ahead 
and change how the parser reports results and groups them together? I 
don't think it will involve an API change, but the results it generates 
will obviously be very different.


From bix at sendu.me.uk  Thu Jun 29 03:54:50 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:54:50 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <44A3874A.9040803@sendu.me.uk>

Jason Stajich wrote:
>
> Feel free to propose an alternative implementation for the parser as you
> see fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or [snip]

What's the thinking behind the way SearchIOs work? Is it necessary or 
desirable to always do it with events and listeners? Or is it enough to 
simply return a ResultI regardless of how you made it?


From cjfields at uiuc.edu  Thu Jun 29 09:27:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 08:27:00 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <A1A48284-6FD6-4898-9438-DEEB105496EC@uiuc.edu>


On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about refactoring
>> this particular module, esp. since I find it essentially works.  I believe
>> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
>> instance, or refactoring SearchIO::blast etc to use hashes instead of
>> objects to speed things up.  Or creating something yourself.  Or doing what
>> you currently are doing (Bio::Map).  In other words, areas where use is
>> high, code is aging, and refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.

We're not deterring you from setting up your own parser, something  
both Jason and I suggested.  I just don't see what the major issue  
is; hmmpfam results never really contain the same number of hits
per query that BLAST does (I get at the very most 30-40 and that is  
usually based on repeats).  I believe the best place to spend this  
energy first and foremost is fixing the bug.

>> I'll add that I'm not trying to dissuade you from trying to build your own
>> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
>> how I feel.  You can build your own parser to do what you want; you can even
>> base it off the current SearchIO HMMER parser and see if you can set it up
>> to give you the results you want, using a different handler and so on.  Just
>> don't break the API or modify the current code based strictly on what your
>> opinion of how it should work is.  It was probably set up this way for a
>> particular reason.
>
> Well, I don't like the idea of there being multiple SearchIO parsers for
> the same thing.

See, here's the thing: if the community-at-large decides to use your  
version of the parser then, by default, it will become the only HMMER
SearchIO parser and we'll deprecate the old one.  I just don't think  
this is the way I would go about it.  Jason has mentioned that object  
instantiation is a bigger issue with parsing (speed) than anything  
else; why not, if you plan on doing this, set up a Handler to return  
hashes, or do it completely under-the-hood?  Have it be the 'new,  
faster way to run SearchIO.'  Don't rehash (pardon the bad pun) the  
way things were esp. when proposals are out there to improve the  
toolkit.
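
As a rough idea of what "return hashes" could look like (a hypothetical
layout only; no such handler exists yet, and the key names are invented
here), a result could be one nested plain hash with no objects created:

```perl
use strict;
use warnings;

# Hypothetical layout: one plain hash per result, hits and hsps as nested
# array/hash references.  Values are taken from the output posted earlier.
my %result = (
    query_name => 'nbd27e02.y1',
    hits       => [
        { name => 'IBB',   significance => '2.6e-43', hsps => [ { evalue => '2.6e-43' } ] },
        { name => 'IBN_N', significance => '2.1',     hsps => [ { evalue => '2.1' } ] },
    ],
);

# Walking the structure needs no method calls at all.
for my $hit ( @{ $result{hits} } ) {
    print "$hit->{name}\t$hit->{significance}\t",
        join( ',', map { $_->{evalue} } @{ $hit->{hsps} } ), "\n";
}
```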

> [...]
>> And, frankly, it's not up to the user when using code they didn't create.
>> You have to deal with it.  Or code something yourself to do things the way
>> you want.  You have the power to do that; most bioperl users don't simply
>> b/c they probably don't understand the class structure and OO nature of
>> Bioperl.  It's just a matter of where you want to spend your energy: dealing
>> with something that interests you or fixing other people's broken code.
>
> My original question was essentially: does doing it my way make sense?
> And implicitly: would doing it my way be of any harm? Ie. can I go ahead
> and change how the parser reports results and groups them together? I
> don't think it will involve an API change, but the results it generates
> will obviously be very different.

And my point is that both ways make sense, at least to me (and it  
sounds like to Jason though I could be wrong).  Again, create a new  
version of the parser based on what you want to do and accomplish.   
Don't just modify something the community at-large uses based on your  
whims. Make the changes to a new module and let the community  
decide.  As an example, BioPerl, for the longest time, had several  
BLAST parsers; we directed everybody over to SearchIO and most people  
seem to like it; hence the others are deprecated.

And changing the results returned could be considered by some to be
changing the API, or a bug.  If someone using this module has an
automated pipeline set up for annotation using Pfam, hmmpfam,  
Bioperl, and a database, and their setup expects single model/domain  
pairs, yeah, your changes will break that.  Maybe small,  
inconsequential even, but it's possible (and even true; many genome  
annotation pipelines are set up exactly how I describe).
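
To show that the two views are interconvertible, here is a plain-Perl sketch
(not BioPerl code; group_by_model and flatten are made-up names).  It groups
the single pairs into one hit per model while remembering each domain's
position in the report, and then flattens them back into the single-pair
view an existing pipeline would expect.

```perl
use strict;
use warnings;

# Single model/domain pairs in report order, copied from the output posted earlier.
my @pairs = (
    [ IBB  => '2.6e-43' ], [ HEAT => '40' ],      [ IBN_N => '2.1' ],
    [ Arm  => '3.5e-12' ], [ HEAT => '0.0096' ],  [ Arm   => '2.2e-13' ],
    [ HEAT => '0.0032' ],  [ Arm  => '0.00019' ],
);

# Hypothetical: one hit per model, each domain keeping its rank in the report.
sub group_by_model {
    my @in = @_;
    my ( %hit_for, @hits );
    for my $rank ( 0 .. $#in ) {
        my ( $name, $evalue ) = @{ $in[$rank] };
        unless ( $hit_for{$name} ) {
            $hit_for{$name} = { name => $name, domains => [] };
            push @hits, $hit_for{$name};
        }
        push @{ $hit_for{$name}{domains} }, { rank => $rank, evalue => $evalue };
    }
    return @hits;
}

# Hypothetical: flatten grouped hits back into single pairs in report order.
sub flatten {
    my @grouped = @_;
    my @domains;
    for my $hit (@grouped) {
        push @domains, map { +{ %$_, name => $hit->{name} } } @{ $hit->{domains} };
    }
    return map { [ $_->{name}, $_->{evalue} ] }
        sort { $a->{rank} <=> $b->{rank} } @domains;
}

my @hits = group_by_model(@pairs);
my @flat = flatten(@hits);
print scalar(@hits), " grouped hits, ", scalar(@flat), " flattened pairs\n";
```

So a grouped parser could still offer the old single-pair output through an
adapter like this, but only if the report order is stored somewhere.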

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ClarkeW at AGR.GC.CA  Thu Jun 29 10:31:14 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 10:31:14 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>

Hi all, 

 
Recently I was working on a project which required some manipulation of
Quality files. I may be wrong in this, but I don't believe that there is
a Quality format for Bio::SeqIO. If there is, could someone point me in
the right direction? I could then write a much nicer script than what I
currently have. If not, I was wondering if anyone here has any use for
such a thing. I am pretty new to developing but would be willing to give
it a shot: for all the use I get out of BioPerl, with no thanks given to
anyone who spent time writing something I used, I could try to
contribute my limited amount. Any comments would be appreciated, and
don't be afraid to tell me this is a lost cause. I realize that quality
files tend to be less important than FASTA sequence files. I will give
you a little information on me so that you know what to expect/what I am
working with.

I am a fourth year bioinformatics student, and am currently working as a
summer student. I have some limited experience with writing perl modules
and test scripts. Mostly I write perl to do specific jobs, that I or
someone else has come up with to fill some immediate need of the
company. I am interested in most things bioinformatics/computer
sci/biology and am hoping to do Graduate studies when I finish my
degree.

Well that's enough for now, if you have any comments/suggestions I would
appreciate it.

 
Cheers, Wayne


From cjfields at uiuc.edu  Thu Jun 29 10:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 09:55:16 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <001601c69b8c$08cdce70$15327e82@pyrimidine>

> Recently I was working on a project which required some manipulation of
> Quality files. I may be wrong in this, but I don't believe that there is
> a Quality format for Bio::SeqIO. If there is, someone could point me in
> the right direction as I could write a much nicer script than what I
> currently have, if not I was wondering if anyone here has any use for
> such a thing. I am pretty new to developing but would be willing to give
> it a shot, as I feel that for all the use I get out of BioPerl with no
> thanks to anyone who spent time on writing something I used, I could try
> and contribute my limited amount. Any comments would be appreciated, and
> don't be afraid to tell me this is a lost cause. I realize that quality
> files tend to be less important than FASTA sequence files. I will give
> you a little information on me so that you know what to expect/what I am
> working with.

Here's a list I dredged up when looking for Bio::Seq::Quality in BioPerl,
which is the sequence implementation for sequences with quality data and/or
trace values:

Instances: 2    Module : Bio::Assembly::Contig
Instances: 2    Module : Bio::Assembly::IO::ace
Instances: 1    Module : Bio::Assembly::Singlet
Instances: 1    Module : Bio::Index::Fastq
Instances: 2    Module : Bio::Seq::Meta::Array
Instances: 1    Module : Bio::Seq::MetaI
Instances: 8    Module : Bio::Seq::Quality
Instances: 1    Module : Bio::Seq::SeqWithQuality
Instances: 6    Module : Bio::Seq::SequenceTrace
Instances: 1    Module : Bio::Seq::TraceI
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 5    Module : Bio::SeqIO::qual
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

Does that help?

> I am a fourth year bioinformatics student, and am currently working as a
> summer student. I have some limited experience with writing perl modules
> and test scripts. Mostly I write perl to do specific jobs, that I or
> someone else has come up with to fill some immediate need of the
> company. I am interested in most things bioinformatics/computer
> sci/biology and am hoping to do Graduate studies when I finish my
> degree.
> 
> Well that's enough for now, if you have any comments/suggestions I would
> appreciate it.

Always can use an extra hand!

Chris

> 
> 
> Cheers, Wayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From ClarkeW at AGR.GC.CA  Thu Jun 29 11:01:52 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 11:01:52 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>


Thanks Chris, 

I don't know how I didn't come up with this before. Can I use
Bio::SeqIO::qual as follows?

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

Cheers, Wayne



From cjfields at uiuc.edu  Thu Jun 29 11:21:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:21:21 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>
Message-ID: <002001c69b8f$ad754450$15327e82@pyrimidine>

It should work that way, yes:  

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

# the below should return a Bio::Seq::Quality object
my $seq = $in->next_seq; 
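
For anyone following along, here is a slightly fuller sketch of the same
idea. It assumes the object returned by next_seq offers display_id() and a
qual() method returning an array reference of scores (that is my
understanding of Bio::Seq::Quality, but check the module docs). The helper
names mean_quality and report_qual_file are made up for this example.

```perl
use strict;
use warnings;

# Mean of a list of quality scores; returns undef for an empty list.
sub mean_quality {
    my ($scores) = @_;
    return unless @$scores;
    my $sum = 0;
    $sum += $_ for @$scores;
    return $sum / @$scores;
}

# Print ID, number of scores and mean quality for every record in a
# .qual file. Bio::SeqIO is loaded at run time, so the helper above can
# be used on a machine without BioPerl installed.
sub report_qual_file {
    my ($qual_file) = @_;
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $qual_file, -format => 'qual');
    while (my $seq = $in->next_seq) {
        my $scores = $seq->qual;    # assumed: array ref of integer scores
        my $mean   = mean_quality($scores);
        printf "%s\t%d\t%s\n", $seq->display_id, scalar @$scores,
            defined $mean ? sprintf('%.1f', $mean) : 'NA';
    }
}

# Run as: perl qual_report.pl reads.qual (both names are placeholders)
report_qual_file($ARGV[0]) if @ARGV;
```

Run without arguments it does nothing; given a qual file it prints one
tab-separated line per record.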

You might want to check the other SeqIO modules as well depending on your
format:

...

Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

...

Chris

> Thanks Chris,
> 
> I don't know how I didn't come up with this before. Can I use
> Bio::SeqIO::qual as follows?
> 
> my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');
> 
> Cheers, Wayne
...


From cjfields at uiuc.edu  Thu Jun 29 11:23:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:23:20 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
Message-ID: <002101c69b8f$f48bd070$15327e82@pyrimidine>

Sendu, 

The HOWTO explains everything:

http://www.bioperl.org/wiki/HOWTO:SearchIO

under "Implementation."  I learned this the hard way when I started working
on SearchIO::blast and wondered why it had so many *_element methods.  

Yes, you will need an EventHandler if you implement SearchIO; the
EventHandler should implement the Bio::SearchIO::EventHandlerI interface.
You might not need one that returns objects though (i.e. it could return
hashes).  And you could possibly get around the event handler somehow,
though if you plan on doing that, why not just work on Bio::Tools::Hmmpfam
as an alternative parser?  We've had other BLAST parsers before
(Bio::Tools::BPLite comes to mind); if they aren't maintained and there is a
viable alternative they can be deprecated.  That is why I mentioned
working on your own version of SearchIO::hmmer; if that module becomes the
most widely used one we can deprecate the older version.

The idea that a SearchIO plugin should act like a SAX parser is based on the
fact that many files being parsed are quite large, so it would be nice to
have everything parsed as a stream (on-the-go) as opposed to preprocessing
everything into an object hierarchy (which can be very memory intensive for
large files).  Whether this is done in practice in all SearchIO modules is
another thing; it may be based upon what particular fixes were made over
time or the contributor's intentions.  
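
To make the SAX comparison concrete, here is a minimal sketch in plain
Perl. The package names LineEventParser and HitCounter, the event names
and the toy HIT/HSP input format are all invented for illustration; this
is not the Bio::SearchIO::EventHandlerI interface, whose real method
names are covered in the HOWTO. The point is only that the parser reads
a stream line by line and the listener decides which events to keep.

```perl
use strict;
use warnings;

# Hypothetical listener: keeps only what it cares about (hit names).
package HitCounter;
sub new { my ($class) = @_; return bless { hits => [] }, $class }
sub start_hit { my ($self, $name) = @_; push @{ $self->{hits} }, $name }
sub hits { return @{ $_[0]{hits} } }

# Hypothetical streaming parser: reads one line at a time and fires an
# event only if the listener implements a method of that name.
package LineEventParser;
sub new {
    my ($class, %args) = @_;
    return bless { listener => $args{listener} }, $class;
}
sub parse {
    my ($self, $fh) = @_;
    my $listener = $self->{listener};
    while (my $line = <$fh>) {
        chomp $line;
        my ($event, $data);
        if    ($line =~ /^HIT\s+(\S+)/) { ($event, $data) = ('start_hit', $1) }
        elsif ($line =~ /^HSP\s+(.+)/)  { ($event, $data) = ('start_hsp', $1) }
        else                            { next }
        # Events the listener does not handle are simply dropped.
        $listener->$event($data) if $listener->can($event);
    }
    return $listener;
}

package main;

# Toy report held in memory; a real report would be a file handle.
my $report = "HIT PF00001\nHSP 10 80\nHIT PF00002\nHSP 5 40\nHSP 60 95\n";
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $counter = LineEventParser->new(listener => HitCounter->new)->parse($fh);
close $fh;

my @hits = $counter->hits;
print "hits seen: @hits\n";
```

Nothing is accumulated for the HSP lines because HitCounter has no
start_hsp method, which is the "throw away events we don't want" idea in
miniature.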

Chris 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 29, 2006 2:55 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
> 
> Jason Stajich wrote:
> >
> > Feel free to propose an alternative implementation of the parser as you
> > see fit, as long as the API is preserved.  You can contribute a new
> > SearchIO plugin and HMMERSearchResultListener to deal with it - or
> [snip]
> 
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it enough to
> simply return a ResultI regardless of how you made it?


From roy at colibase.bham.ac.uk  Thu Jun 29 11:05:54 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 29 Jun 2006 16:05:54 +0100
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <44A3EC52.7030502@colibase.bham.ac.uk>

Hi Wayne.

I think Bio::SeqIO::qual is what you are looking for.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From jason at bioperl.org  Thu Jun 29 14:04:12 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 29 Jun 2006 14:04:12 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
Message-ID: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>

However you want. The idea of listeners at the time was to make it
more SAX-like, so we could throw away events we didn't want and speed
up the whole system when there was some idea of how you wanted the
data filtered.  That may have been too much wishful thinking, and I
just couldn't do it alone.


On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> Feel free to propose an alternative implementation of the parser as you
>> see fit, as long as the API is preserved.  You can contribute a new
>> SearchIO plugin and HMMERSearchResultListener to deal with it - or  
>> [snip]
>
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it  
> enough to
> simply return a ResultI regardless of how you made it?

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From prettyblondegirl222 at yahoo.com  Thu Jun 29 14:23:56 2006
From: prettyblondegirl222 at yahoo.com (S S)
Date: Thu, 29 Jun 2006 11:23:56 -0700 (PDT)
Subject: [Bioperl-l] TAKE ME OFF
Message-ID: <20060629182356.93810.qmail@web51305.mail.yahoo.com>

  


From cjfields at uiuc.edu  Thu Jun 29 23:53:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 22:53:22 -0500
Subject: [Bioperl-l] SearchIO::blast, was Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
	<166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
Message-ID: <7511BE75-3A87-4E78-BFEA-2B38210BAD85@uiuc.edu>

If we can work around the listener/handler that'll definitely speed  
things up.  I was thinking about tackling the SearchIO::blast parser  
next, refactoring it to use hashes as a separate plugin module; if I  
don't need the handler for that then it'll speed things up a bit.

Chris

On Jun 29, 2006, at 1:04 PM, Jason Stajich wrote:

> however you want - the idea of listeners at the time was to make it
> more SAX like so we could throw away events we didn't want and speed
> up the whole system when there was some idea of how you wanted the
> data filtered.  That may have been too much wishful thinking and I
> just couldn't do it alone.
>
>
> On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:
>
>> Jason Stajich wrote:
>>>
>>> Feel free to propose an alternative implementation of the parser as you
>>> see fit, as long as the API is preserved.  You can contribute a new
>>> SearchIO plugin and HMMERSearchResultListener to deal with it - or
>>> [snip]
>>
>> What's the thinking behind the way SearchIOs work? Is it necessary or
>> desirable to always do it with events and listeners? Or is it
>> enough to
>> simply return a ResultI regardless of how you made it?
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Fri Jun 30 08:45:15 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 30 Jun 2006 14:45:15 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>

Hi,

>My original question was essentially: does doing it my way make sense?
With respect to Sendu's points, I can only say that a colleague
(developer) and I were surprised that the HMMer parser did not group
the hits as the blast parser does, in "Hit" and "Hsp".
When we realized how hmmer parsing worked we continued to use it
but added a check for multiple hits of one domain on one query sequence
(e.g. in hmmpfam).
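
A check of that kind might look roughly like the sketch below. It works
on plain hashes rather than real SearchIO objects, and the field names
(query, model, start, end) and sample values are invented for the
example; in practice the records would be filled in from whatever the
parser returns for each hit.

```perl
use strict;
use warnings;

# Toy data standing in for one-domain-per-hit parser output.
my @domains = (
    { query => 'seq1', model => 'PF00001', start => 10,  end => 80  },
    { query => 'seq1', model => 'PF00002', start => 90,  end => 150 },
    { query => 'seq1', model => 'PF00001', start => 160, end => 230 },
);

# Group the domain records by query and then by model.
my %grouped;
for my $d (@domains) {
    push @{ $grouped{ $d->{query} }{ $d->{model} } }, $d;
}

# Report every model that matched the same query more than once.
my @repeated;
for my $query (sort keys %grouped) {
    for my $model (sort keys %{ $grouped{$query} }) {
        my $n = @{ $grouped{$query}{$model} };
        push @repeated, "$query/$model x$n" if $n > 1;
    }
}
print "$_\n" for @repeated;
```

The %grouped structure is essentially the Hit/HSP grouping the BLAST
parser gives you, rebuilt after the fact.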

Regards,
Bernd


From jason at bioperl.org  Fri Jun 30 10:05:01 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Jun 2006 10:05:01 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
Message-ID: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>

I understand the confusion. The intention was initially to have HSPs
grouped together under the same Hit, just like BLAST reports, but
somehow in the bug-fix cycle the way of dealing with the fact that
"HSPs" aren't ordered by the overall Hit table led to this design
decision. The problem before was something to do with the ordering,
but I must admit I can't remember specifically what the problem was
or why I changed things to do this.
Does 1.4 actually do it the way you expect?

Again, more user feedback is definitely critical to make these tools
useful to everyone, so please don't be bashful about reporting your
preferences.

-j

On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:

> Hi,
>
>> My original question was essentially: does doing it my way make  
>> sense?
> With respect to Sendu's points, I can only say that a colleague
> (developer) and I were surprised that the HMMer parser did not group
> the hits as the blast parser does, in "Hit" and "Hsp".
> When we realized how hmmer parsing worked we continued to use it
> but added a check for multiple hits of one domain on one query sequence
> (e.g. in hmmpfam).
>
> Regards,
> Bernd

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Fri Jun 30 11:56:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 10:56:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
Message-ID: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>

It may have been just simpler to have it be one HSP (domain) per Hit  
(model) as that's how the reports are generated.  My reasoning was  
that using the one domain per model made sense based on what you are  
actually trying to do, which is annotate the sequence based on the  
order the domain appears.  Most others may not view it that way,  
which is fine.  One can always gather the relevant HSPs, convert to
seqfeatures, then sort them if order is important, I suppose.
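
As a rough sketch of that last step, using plain hashes in place of
seqfeature objects (the field names and values are invented for the
example): with real Bio::SeqFeature objects the comparison would
presumably call $a->start and $a->end instead of reading hash keys.

```perl
use strict;
use warnings;

# Toy records standing in for features built from the gathered HSPs.
my @features = (
    { model => 'PF00002', start => 90,  end => 150 },
    { model => 'PF00001', start => 160, end => 230 },
    { model => 'PF00001', start => 10,  end => 80  },
);

# Sort by start, breaking ties by end, to recover the domain order
# along the query sequence.
my @ordered = sort {
    $a->{start} <=> $b->{start} or $a->{end} <=> $b->{end}
} @features;

# Domain architecture as a simple string of model names.
my $architecture = join '-', map { $_->{model} } @ordered;
print "$architecture\n";
```

So grouping by model and ordering by position are not mutually
exclusive; the order can be rebuilt from the coordinates either way.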

I would say, if the overall consensus is to modify it to have  
multiple domain hits per model (similar to BLAST) then Sendu should  
go ahead and make those changes then announce it on the list so no  
one can gripe about it later.  My main concern was not changing  
things so dramatically that it'll break for someone, but seeing as  
we've had a lengthy discussion about it already they should have  
piped up by now!   Well, that and trying to return everything as  
hashes as Jason suggested.  From looking at SearchIO::hmmer we need  
to make sure that both hmmsearch and hmmpfam work the same way (looks  
like they have different sections) and that the reported bug about  
missing hits (Bug 2036) is fixed as well.

Chris

On Jun 30, 2006, at 9:05 AM, Jason Stajich wrote:

> I understand the confusion. The intention was initially to have HSPs
> grouped together under the same Hit, just like BLAST reports, but
> somehow in the bug-fix cycle the way of dealing with the fact that
> "HSPs" aren't ordered by the overall Hit table led to this design
> decision. The problem before was something to do with the ordering,
> but I must admit I can't remember specifically what the problem was
> or why I changed things to do this.
> Does 1.4 actually do it the way you expect?
>
> Again, more user feedback is definitely critical to make these tools
> useful to everyone, so please don't be bashful about reporting your
> preferences.
>
> -j
>
> On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:
>
>> Hi,
>>
>>> My original question was essentially: does doing it my way make
>>> sense?
>> With respect to Sendu's points, I can only say that a colleague
>> (developer) and I were surprised that the HMMer parser did not group
>> the hits as the blast parser does, in "Hit" and "Hsp".
>> When we realized how hmmer parsing worked we continued to use it
>> but added a check for multiple hits of one domain on one query sequence
>> (e.g. in hmmpfam).
>>
>> Regards,
>> Bernd
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Jun 30 12:14:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Jun 2006 17:14:05 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
Message-ID: <44A54DCD.3050708@sendu.me.uk>

Chris Fields wrote:
> It may have been just simpler to have it be one HSP (domain) per Hit 
> (model) as that's how the reports are generated.  My reasoning was that 
> using the one domain per model made sense based on what you are actually 
> trying to do, which is annotate the sequence based on the order the 
> domain appears.  Most others may not view it that way, which is fine.  
> One can always gather the relevant HSP's, convert to seqfeatures, then 
> sort them if order is important, I suppose.
> 
> I would say, if the overall consensus is to modify it to have multiple 
> domain hits per model (similar to BLAST) then Sendu should go ahead and 
> make those changes then announce it on the list so no one can gripe 
> about it later.  My main concern was not changing things so dramatically 
> that it'll break for someone

Going on your earlier suggestion, I was thinking about making 
SearchIO::hmmpfam instead, which would get used if you set the format to 
'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I 
suppose I would make a SearchIO::hmmsearch as well, if necessary.


[...]
> that the reported bug about missing hits (Bug 2036) is fixed as well.

However, having never made a SearchIO plugin before, it will be some 
time before I get my head around it. I'll want to make one the current 
HOWTO:SearchIO way before I can think about doing it a better way 
(hashes) as well. So I can say I'll make a move on this at some point in
the future, but if someone wants to fix Bug 2036 in the meantime, they
are welcome to. Again as suggested, my priority is Bio::Map right now.


From rmb32 at cornell.edu  Fri Jun 30 13:01:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 10:01:38 -0700
Subject: [Bioperl-l] parser for GeneSeqer
Message-ID: <44A558F2.2050304@cornell.edu>

Hi all,

I find myself needing a parser for GeneSeqer output, so I'm writing one 
(which I will submit for your consideration when it's working).  In a 
nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of 
ESTs to genomic sequence, then using those alignments to predict where 
in the genomic sequence the genes are.  So really what you get from this 
is a bunch of hierarchical features.

I don't really know where I should put it in the bioperl hierarchy 
though.  Probably FeatureIO?

And what's the current fashion for objects it should emit?  
Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 30 13:43:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:43:56 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A54DCD.3050708@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
	<44A54DCD.3050708@sendu.me.uk>
Message-ID: <E2C6F66F-9B85-42D3-B2A0-BD7C8B222572@uiuc.edu>

I'll try looking at it this weekend.  A suggested workaround is to
either try setting -A for no alignments or setting it to a high
number to retrieve all of them.  It's pretty serious as the error
silently drops those domains, so those using automated annotation
pipelines would miss it unless they are also checking the raw output.

You could design a SearchIO::hmmpfam parser then expand it to take in  
hmmsearch output at a later point, or keep them separate.  I like the  
idea of having modules that are more specific about what they parse;  
seems at some point you reach serious code bloat and maintenance  
becomes an issue.  Look at SearchIO::blast; it parses various text  
BLAST output very well but with some serious obfuscation.  Just don't  
know how productive it would be to separate out the PSI-BLAST and  
bl2seq stuff since they are pretty close to a standard BLAST  
report... oh well.

To Jason: good luck on your move.  Drop us a line here to let us
know everything went well.

Chris

On Jun 30, 2006, at 11:14 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> It may have been just simpler to have it be one HSP (domain) per Hit
>> (model) as that's how the reports are generated.  My reasoning was  
>> that
>> using the one domain per model made sense based on what you are  
>> actually
>> trying to do, which is annotate the sequence based on the order the
>> domain appears.  Most others may not view it that way, which is fine.
>> One can always gather the relevant HSP's, convert to seqfeatures,  
>> then
>> sort them if order is important, I suppose.
>>
>> I would say, if the overall consensus is to modify it to have  
>> multiple
>> domain hits per model (similar to BLAST) then Sendu should go  
>> ahead and
>> make those changes then announce it on the list so no one can gripe
>> about it later.  My main concern was not changing things so  
>> dramatically
>> that it'll break for someone
>
> Going on your earlier suggestion, I was thinking about making
> SearchIO::hmmpfam instead, which would get used if you set the  
> format to
> 'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I
> suppose I would make a SearchIO::hmmsearch as well, if necessary.
>
>
> [...]
>> that the reported bug about missing hits (Bug 2036) is fixed as well.
>
> However, having never made a SearchIO plugin before, it will be some
> time before I get my head around it. I'll want to make one the current
> HOWTO:SearchIO way before I can think about doing it a better way
> (hashes) as well. So I can say I'll make a move on this at some  
> point in
> the future, but if someone wants to fix Bug 2036 in the mean time,  
> they
> are welcome to. Again as suggested, my priority is Bio::Map right now.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Jun 30 13:54:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:54:23 -0500
Subject: [Bioperl-l] parser for GeneSeqer
In-Reply-To: <44A558F2.2050304@cornell.edu>
References: <44A558F2.2050304@cornell.edu>
Message-ID: <2FB066C7-12E6-46D8-8F4A-BD096BE2A0CA@uiuc.edu>

If you plan on generating seqfeatures from this output you could
check out the Bio::Tools core modules for examples.  There are a few
there that take program output and convert them to
Bio::SeqFeature::Generic objects, including Bio::Tools::RNAMotif and
Bio::Tools::tRNAscanSE.  If alignments are involved you might want
something like Bio::SeqFeature::FeaturePair.  Not sure about using
SeqFeature::Annotated or others; I thought that some of the
Annotation/Annotatable stuff might be changing soon but I may be wrong.
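
For the hierarchical case, something along these lines might be a
starting point. It is only a sketch: the exon coordinates, the
'GeneSeqer' source tag and the helper names gene_span and
exons_to_gene_feature are invented for the example, and it assumes the
usual Bio::SeqFeature::Generic constructor arguments and add_SeqFeature
for attaching sub-features.

```perl
use strict;
use warnings;

# Overall start/end spanned by a list of exon records.
sub gene_span {
    my ($exons) = @_;
    my ($start, $end);
    for my $exon (@$exons) {
        $start = $exon->{start} if !defined $start || $exon->{start} < $start;
        $end   = $exon->{end}   if !defined $end   || $exon->{end}   > $end;
    }
    return ($start, $end);
}

# Build a gene feature with exon sub-features. BioPerl is loaded at run
# time, so gene_span() above stays usable without it.
sub exons_to_gene_feature {
    my ($exons, $strand) = @_;
    require Bio::SeqFeature::Generic;
    my ($start, $end) = gene_span($exons);
    my $gene = Bio::SeqFeature::Generic->new(
        -start       => $start,
        -end         => $end,
        -strand      => $strand,
        -primary_tag => 'gene',
        -source_tag  => 'GeneSeqer',
    );
    for my $exon (@$exons) {
        $gene->add_SeqFeature(
            Bio::SeqFeature::Generic->new(
                -start       => $exon->{start},
                -end         => $exon->{end},
                -strand      => $strand,
                -primary_tag => 'exon',
                -source_tag  => 'GeneSeqer',
            )
        );
    }
    return $gene;
}

# Toy exon coordinates, as a parser might collect them.
my @exons = ({ start => 120, end => 340 }, { start => 510, end => 760 });
my ($start, $end) = gene_span(\@exons);
print "gene spans $start..$end\n";
```

Calling exons_to_gene_feature(\@exons, 1) would then give a single
top-level feature with the exons nested underneath it.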

Chris

On Jun 30, 2006, at 12:01 PM, Robert Buels wrote:

> Hi all,
>
> I find myself needing a parser for GeneSeqer output, so I'm writing  
> one
> (which I will submit for your consideration when it's working).  In a
> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> ESTs to genomic sequence, then using those alignments to predict where
> in the genomic sequence the genes are.  So really what you get from  
> this
> is a bunch of hierarchical features.
>
> I don't really know where I should put it in the bioperl hierarchy
> though.  Probably FeatureIO?
>
> And what's the current fashion for objects it should emit?
> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>
> Rob
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From rmb32 at cornell.edu  Fri Jun 30 15:32:11 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 12:32:11 -0700
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
In-Reply-To: <29201430510651801@webmail.iastate.edu>
References: <29201430510651801@webmail.iastate.edu>
Message-ID: <44A57C3B.8040808@cornell.edu>

Aha!  Isn't it amazing what gets revealed when you just get off your 
butt and ask on the mailing list.

I'll look at that code straightaway.  The concept is quite attractive to 
me, since GenomeThreader is the next program that I'm going to be 
integrating into my analysis stuff.  Unfortunately, (I am under the 
impression that) my GeneSeqer parser is almost finished.

This brings us to the next question, what about parsing the 
GenomeThreader XML?  Would be lovely to have a Bioperl interface for 
that.  Is there some code floating about for that too?

Rob

Michael E Sparks wrote:
> Hi Rob,
>
> For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
>  You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/
>
> There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
> an XML format also used by the GenomeThreader spliced alignment program, whose
> schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
> 0README in the above directory will give you an overview of what tools I've made
> available.  Hope you find it useful!
>
> Regards,
> Michael
>
> --
> Thanks,
> Michael E Sparks
> Graduate Assistant, Brendel Lab
> 2128 Molecular Biology Building
> Iowa State University
> Ames, IA 50011-3260
> 1-515-294-4063
> http://www.public.iastate.edu/~mespar1/
>
>
> Forwarded Message:
>   
>> To: <plantgdb at iastate.edu>
>> From: "Shannon D Schlueter" <sds at iastate.edu>
>> Subject: FW: [Bioperl-l] parser for GeneSeqer
>> Date: Fri, 30 Jun 2006 13:01:46 -0500
>> -----
>>     
>>> Date: Fri, 30 Jun 2006 10:01:38 -0700
>>> From: Robert Buels <rmb32 at cornell.edu>
>>> User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] parser for GeneSeqer
>>> Sender: bioperl-l-bounces at lists.open-bio.org
>>>
>>> Hi all,
>>>
>>> I find myself needing a parser for GeneSeqer output, so I'm writing one
>>> (which I will submit for your consideration when it's working).  In a
>>> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
>>> ESTs to genomic sequence, then using those alignments to predict where
>>> in the genomic sequence the genes are.  So really what you get from this
>>> is a bunch of hierarchical features.
>>>
>>> I don't really know where I should put it in the bioperl hierarchy
>>> though.  Probably FeatureIO?
>>>
>>> And what's the current fashion for objects it should emit? 
>>> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> SGN Bioinformatics Analyst
>>> 252A Emerson Hall, Cornell University
>>> Ithaca, NY  14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>
>
>
>   

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From mespar1 at iastate.edu  Fri Jun 30 15:20:29 2006
From: mespar1 at iastate.edu (Michael E Sparks)
Date: Fri, 30 Jun 2006 14:20:29 -0500 (CDT)
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
Message-ID: <29201430510651801@webmail.iastate.edu>

Hi Rob,

For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
 You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/

There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
an XML format also used by the GenomeThreader spliced alignment program, whose
schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
0README in the above directory will give you an overview of what tools I've made
available.  Hope you find it useful!

Regards,
Michael

--
Thanks,
Michael E Sparks
Graduate Assistant, Brendel Lab
2128 Molecular Biology Building
Iowa State University
Ames, IA 50011-3260
1-515-294-4063
http://www.public.iastate.edu/~mespar1/


Forwarded Message:
> To: <plantgdb at iastate.edu>
> From: "Shannon D Schlueter" <sds at iastate.edu>
> Subject: FW: [Bioperl-l] parser for GeneSeqer
> Date: Fri, 30 Jun 2006 13:01:46 -0500
> -----
> >Date: Fri, 30 Jun 2006 10:01:38 -0700
> >From: Robert Buels <rmb32 at cornell.edu>
> >User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
> >To: bioperl-l at bioperl.org
> >Subject: [Bioperl-l] parser for GeneSeqer
> >Sender: bioperl-l-bounces at lists.open-bio.org
> >
> >Hi all,
> >
> >I find myself needing a parser for GeneSeqer output, so I'm writing one
> >(which I will submit for your consideration when it's working).  In a
> >nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> >ESTs to genomic sequence, then using those alignments to predict where
> >in the genomic sequence the genes are.  So really what you get from this
> >is a bunch of hierarchical features.
> >
> >I don't really know where I should put it in the bioperl hierarchy
> >though.  Probably FeatureIO?
> >
> >And what's the current fashion for objects it should emit? 
> >Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
> >
> >Rob
> >
> >--
> >Robert Buels
> >SGN Bioinformatics Analyst
> >252A Emerson Hall, Cornell University
> >Ithaca, NY  14853
> >Tel: 503-889-8539
> >rmb32 at cornell.edu
> >http://www.sgn.cornell.edu
> >
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jay at jays.net  Thu Jun  1 00:58:29 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 23:58:29 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000001c68528$d1b6ec10$15327e82@pyrimidine>
References: <000001c68528$d1b6ec10$15327e82@pyrimidine>
Message-ID: <447E73F5.40403@jays.net>

Chris Fields wrote:
>> Is the doc/ tree being abandoned?
> 
> Most docs have been moved over to the wiki, which generates nicely formatted
> docs for printing.

Oh. Well, if we've already jumped off that cliff I say we just go for it. Move everything to the wiki, nuke the empty CVS dirs, and call it good.

I hereby volunteer to strip the code out of bptutorial.pl and put it wherever. Where should I put it when I'm done? (examples/tutorial.pl?)

>> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> I believe the reasoning is along these lines: HOWTO's are focused in on
> specific areas (graphics, trees, BLAST report parsing, etc) and thus usually
> has greater detail. The tutorials are more broadly based (sort of a general
> bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
> has additional information over the tutorial (at least it did the last time
> I looked at the tutorial, which has been a while).

Huh. Sounds like a subtle line. I might suggest picking one name or the other and shuffling everything into one list on the wiki. 

>> It's hard for me to dive into a wiki lifestyle for the huge documentation
>> pillars since it can't ever get back into the distro... (can it?)  Small,
>> throw away stuff is great for the wiki, but huge, established, thoughtful,
>> long documents should be left in the distro? Present (and searchable) on
>> the wiki but static?
> 
> Hence the problem we face now.  It is something we need to really look into
> before adding too much more to the wiki.  IMHO, I think we should have very
> little information directly in the distribution itself since it's already
> quite large.  It's almost as easy to have a bare-bones INSTALL file, which
> would point to the wiki for additional information.  But I may be very much
> alone in that train of thought ; >

If the doc/ tree has already moved then I guess I just joined the all-wiki camp. I assume it stores full revision history and we have backups in case somebody blows something up. Any system is better than multiple systems breeding inconsistencies. Keep the spammers/clueless out and/or quickly remove their nonsense and I'm pro-wiki. Revisions email reviewers?

>> Sick of my endless questions yet? -grin-
> 
> Not really.

Give it a few more posts. It'll come. :)

j
Current toy: http://openlab.jays.net/


From ULNJUJERYDIX at spammotel.com  Thu Jun  1 02:53:46 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 1 Jun 2006 14:53:46 +0800
Subject: [Bioperl-l] **Fwd: Re: SOLVED ver2 Bio::Graphics::Panel make
	ruler have neg values
Message-ID: <5b6410e0605312353l1fbf8256hc8a2b85d0f0ac199@mail.gmail.com>

Thanks Lincoln! Your code worked in ver 1.4 as well.
I think the problem I had was due to me just adapting from the BLAST output
tutorial, so I had something like
my $feature =
Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end,
-source=>$source);

and maybe also because I didn't have the + sign for the numbers

on a side note, I think that the ability to offset the ruler might prove
useful for some applications. Will spend more time to understand the
$relative_coords_offset option in the arrow.pm when i can afford to, and
perhaps help contribute an offset option to arrow.pm

cheers
kevin

>
> Hi Kevin,
>
> Since you are modifying the Panel.pm source code, why don't you just go
> ahead and use the current Bio::Graphics development tree? Since 1.5.1 it
> supports negative coordinates. Here's an illustration:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
>
> my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
> my $feature =
> Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
> my $panel   = Bio::Graphics::Panel->new(-start=> -200,
>                                          -end  => +200,
>                                          -width=>800,
>                                          -pad_left=>10,
>                                          -pad_right=>10);
> $panel->add_track($whole,
>                    -glyph=>'arrow',
>                    -double=>1,
>                    -tick=>2);
> $panel->add_track($feature,
>                   -glyph=>'box',
>                    -stranded=>1);
> print $panel->png;
>
> exit 0;
>
> The resulting image is attached.
>
> Lincoln
>
> On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> > I am so sorry for the truncated email accidentally hit reply.
> > if anyone is interested i have opted to change
> >
> > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> > in linux its
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
> >
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
> >
> > to
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
> >
> > just  for this one-off use.
> >
> >
> >
> > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> > option for coords offset?
> >     my $relative_coords_offset = $self->option('relative_coords_offset');
> >     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> > but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> > didn't do anything...
> >
> >
> >
> > Hi!
> >
> > > oh it was in a slightly different header asking about the create image
> > > map feature.
> > > I am using the stable version 1.4 of bioperl now. In any case I have not
> > > added the sequence as a feature annotated seq. as I already have the bp
> > > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > > graded segments based on the position.
> > > I saw that there is a scale function for the arrow glyph; however, it is
> > > a multiply function. Can it be hacked to take in an offset value (i.e.
> > > minus the scale by 1000)?
> > >
> > > cheers
> > > kevin
> > >
> > >
> > > Hi,
> > >
> > > > For some reason I didn't see the first posting on this. In current
> > > > bioperl live, the ruler can have negative numberings - I use this
> > > > routinely. You need to create a feature that starts in negative
> > > > coordinates. What is happening to you when you try this?
> > > >
> > > > Lincoln
> > > >
> > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > > Hi
> > > > > thanks for the help offered thus far!
> > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq
> > > > > using bioperl. therefore i was asked to make the numberings as such
> > > > > (-1000). is there any way at all to do this in bioperl without
> > > > > changing the .pm file?
> > > > >
> > > > > thanks guys..
> > > > > kevin
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > Lincoln D. Stein
> > > > Cold Spring Harbor Laboratory
> > > > 1 Bungtown Road
> > > > Cold Spring Harbor, NY 11724
> > > > (516) 367-8380 (voice)
> > > > (516) 367-8389 (fax)
> > > > FOR URGENT MESSAGES & SCHEDULING,
> > > > PLEASE CONTACT MY ASSISTANT,
> > > > SANDRA MICHELSEN, AT michelse at cshl.edu
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>
>


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 03:59:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 08:59:21 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <447E9E59.6090709@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
>> Just looking for all return undef;s isn't enough. It's entirely possible
>> to do something like:
>>
>> my $return_value;
>> {
>>    # do something that assigns to return_value on success
>>    # on failure, just do nothing
>> }
>> return $return_value;
> 
> Agreed, though looking for these is obviously much harder.  
> 
> The way to get around those is:
> 
> return $return_value if $return_value;
> return;
> 
> which I've seen used in a number of get/set methods. 

Though if anyone is using that cookie-cutter/macro style, that's much 
worse because now you can't return 0.

return $return_value if defined($return_value);
return;

In any case, it burns the eyes. I share Lincoln's POV. I also fully
understand your point about not being able to trust the docs
(Bio::Map::Marker...). But the solution is to change the code so it
matches the docs when the docs make sense, not change the code so that it
no longer matches the docs[*]. In a massive OO project like bioperl the
users need to be able to rely on the docs. You can't turn around and say
"you've used this method for years, but now I'm changing how it works
because you might have used the method incorrectly". Ideally any code
changes add functionality or improve its working without affecting code
that uses the method correctly according to its old docs.


* though if there isn't time/interest in changing the code, and the 
method never worked as per the docs, then by all means change the docs 
to avoid confusion - just don't change the docs on a method that worked 
according to the docs, because then you can assume people use the method 
and will be affected by the change
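
A minimal sketch of why the defined() form matters, assuming two
hypothetical getters (get_if_true and get_if_defined are names made up
for this illustration, not bioperl methods). The truthiness test drops a
legitimate 0, while the defined() test only treats undef as "no value":

```perl
use strict;
use warnings;

# Hypothetical getters, named only for this illustration.

# Truthiness test: a stored 0 (or '' or '0') is treated like "no value".
sub get_if_true {
    my ($value) = @_;
    return $value if $value;
    return;
}

# defined() test: only undef is treated as "no value", so 0 survives.
sub get_if_defined {
    my ($value) = @_;
    return $value if defined $value;
    return;
}

for my $input (0, 42, undef) {
    my $shown  = defined $input ? $input : 'undef';
    my $truthy = get_if_true($input);       # scalar context
    my $defd   = get_if_defined($input);    # scalar context
    printf "%-5s true-test: %-5s defined-test: %s\n",
        $shown,
        defined $truthy ? $truthy : 'undef',
        defined $defd   ? $defd   : 'undef';
}
```

For an input of 0, the first getter hands back undef and the second
hands back 0; for 42 and undef the two behave the same.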


From lstein at cshl.edu  Thu Jun  1 11:40:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Jun 2006 11:40:38 -0400
Subject: [Bioperl-l] Bio::Graphics::Panel background color
In-Reply-To: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
Message-ID: <200606011140.38726.lstein@cshl.edu>

Hi,

The border is coming from the HTML <img> tag. To get rid of it, set
-border=>0 in the img() call.

Lincoln


On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> Hello everybody,
>
> does anybody know how to remove the background color of the Panel?
> Currently, I am not adding anything to it, so I can troubleshoot the
> problem, and I have tried setting up
> all color attributes I could find to the panel, but no luck. Whatever I do,
> I get the BLUE border of the panel.
>
> Has anybody faced the same problem?
>
> Thanks in advance,
>
> Jelena
>
> And here is the code I am currently using:
>
> ---------------------------------------------------------------------------
> my $panel =
>     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
>                               -width => 800,
>                               -pad_left => 10,
>                               -pad_right => 10,
>                               -key_color => 'white',
>                               -bgcolor => 'white',
>                               -gridcolor=>'black',
>                               -fgcolor => 'black',
>                               -grid => 0,
>                               );
>    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
>      -url  => '/tmpimages');
>    #make clickable image
>    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
>    print $map;
>
> ---------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:13:05 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:13:05 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A3BD0B.8A2C%osborne1@optonline.net>
References: <C0A3BD0B.8A2C%osborne1@optonline.net>
Message-ID: <447F1211.2010705@campus.iztacala.unam.mx>

You're right Brian. I also think that the text/POD part is more 
important than the script. Since we're more into moving everything to 
the Wiki, I believe this would be the right approach.

Moving the script part of the tutorial into the examples/ directory is 
also a nice idea.

Mauricio.

Brian Osborne wrote:
> Mauricio,
> 
> Bernd didn't say he want the _script_ in the package, he said he wanted
> bptutorial.pl in the package, not indicating whether it was the
> documentation or the script that was important. It's my suspicion that the
> documentation is more important than the script, and this is what my last
> letter was asking, in part: is the script important? Or can we focus on the
> text/POD part?
> 
> Brian O.
> 
> 
> On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> <arareko at campus.iztacala.unam.mx> wrote:
> 
>> I agree with what Bernd Web said in another reply. For some people will
>> be nice to still be able to run the script from the codebase and
>> interact with it.
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 12:20:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:20:34 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1211.2010705@campus.iztacala.unam.mx>
Message-ID: <000b01c68597$5026bdf0$15327e82@pyrimidine>

Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
/scripts or /examples then?

Also, what do we do about the similar situation with other docs moved to the
wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
distribution pointing to the wiki docs instead?

Chris

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Thursday, June 01, 2006 11:13 AM
> To: Brian Osborne
> Cc: Chris Fields; bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> You're right Brian. I also think that the text/POD part is more
> important than the script. Since we're more into moving everything to
> the Wiki, I believe this would be the right approach.
> 
> Moving the script part of the tutorial into the examples/ directory is
> also a nice idea.
> 
> Mauricio.
> 
> Brian Osborne wrote:
> > Mauricio,
> >
> > Bernd didn't say he want the _script_ in the package, he said he wanted
> > bptutorial.pl in the package, not indicating whether it was the
> > documentation or the script that was important. It's my suspicion that
> the
> > documentation is more important than the script, and this is what my
> last
> > letter was asking, in part: is the script important? Or can we focus on
> the
> > text/POD part?
> >
> > Brian O.
> >
> >
> > On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> > <arareko at campus.iztacala.unam.mx> wrote:
> >
> >> I agree with what Bernd Web said in another reply. For some people will
> >> be nice to still be able to run the script from the codebase and
> >> interact with it.
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 12:28:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:28:38 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447E9E59.6090709@mrc-dunn.cam.ac.uk>
Message-ID: <000c01c68598$704b15d0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 2:59 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
> 
> Chris Fields wrote:
> >
> > Sendu Bala wrote:
> >> Just looking for all return undef;s isn't enough. It's entirely
> possible
> >> to do something like:
> >>
> >> my $return_value;
> >> {
> >>    # do something that assigns to return_value on success
> >>    # on failure, just do nothing
> >> }
> >> return $return_value;
> >
> > Agreed, though looking for these is obviously much harder.
> >
> > The way to get around those is:
> >
> > return $return_value if $return_value;
> > return;
> >
> > which I've seen used in a number of get/set methods.
> 
> Though if anyone is using that cookie-cutter/macro style, that's much
> worse because now you can't return 0.
> 
> return $return_value if defined($return_value);
> return;

Makes sense.  Really, this all comes down to semantics and the context of
how the method is called and what is expected as a return value.  I suppose
it also depends on what one considers 'best practice,' which can be
subjective.  I don't want us getting into a situation in which we come
across as critiquing someone else's code w/o some valid points, i.e.
Lincoln's point about complaining.  I think that's why this thread is pretty
important, in that we're getting a broad range of opinions on the issue.

> In any case, it burns the eyes. 

Yep, I agree. 

> I share Lincoln's POV. I also fully
> understand your point about not being able to trust the docs
> (Bio::Map::Marker...). But the solution is to change the code so they
> match the docs when the docs make sense, not change the code so that it
> no longer matches the docs[*]. In a massive OO project like bioperl the

So you know, Lincoln and I both support the idea of an audit.  He also notes
(and I agree) that people will likely complain.  

Anyway, changing the code to match the docs makes sense theoretically, but in
practice that doesn't always work.  Any situation where code does not behave
as expected (i.e. as described in the docs) is a bug and can be reported as
such.  The problem arises when the docs are completely wrong, as
Bio::Restriction::IO was before I made changes to it.  In many cases simple
small code changes won't work, such as when modules inherit from an
interface but don't implement all methods (so essentially are incomplete).

Hilmar made the point that we should change the docs to reflect
inconsistencies in particular plugin modules for IO classes (AlignIO has a
few modules with unimplemented write methods, and so on).  When the code
radically varies, such as in the Restriction::IO case (where none of the
write methods worked), the docs should be changed in the IO class to reflect
this.  Of course, you should also add a bit to the TO DO section of POD and
add a bit to the Project Priority List on the wiki to point this out, both
of which I did.  It comes down to 'truth in advertising': does it do what's
expected?

> users need to be able to rely on the docs. You can't turn around and say
> "you've used this method for years, but now I'm changing how it works
> because you might have used the method incorrectly". Ideally any code

Not what I did, BTW.  The API is intact; you can still use the write methods
if you want (they throw errors just fine).  In fact, I didn't change any
methods except in one module (Restriction::IO::bairoch), where I added a
warning to the read method b/c it didn't work as expected, and I filed a bug
report.  Essentially, the only thing I changed was the docs to reflect what
the code currently can accomplish (at least until you read the TO DO).  We
already had one person email the group asking why code in the synopsis
didn't work.

Adding read and write methods to most of these modules (making the code do
what the docs reflect, in your words) is a lot of work, esp. for someone
like me unfamiliar with the class architecture and methods for those
modules.  IMHO, contributions to bioperl should accomplish what is reflected
in their docs once added to the core; if a write method hasn't been written,
then add it to the docs in a TO DO section or add a warning to the synopsis.
Don't put in the docs what you intend the code to accomplish down the road
but what it does currently.  Is that unreasonable?
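
A minimal sketch of "the API is intact, but the write method throws",
assuming hypothetical class names (My::IO::Base and
My::IO::ReadOnlyFormat are made up for this illustration and are not the
actual Bio::Restriction::IO code). The base class declares both methods,
the format module implements only read(), and a caller can trap the
error from write():

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only.
package My::IO::Base;

sub new { my ($class) = @_; return bless {}, $class }

# Interface methods: subclasses are expected to override these.
sub read {
    my ($self) = @_;
    die ref($self) . "::read is not implemented\n";
}

sub write {
    my ($self) = @_;
    die ref($self) . "::write is not implemented\n";
}

package My::IO::ReadOnlyFormat;
our @ISA = ('My::IO::Base');

# Only read() is implemented; write() is inherited and throws.
# The POD for this module would list write() under TO DO.
sub read { return 'parsed record' }

package main;

my $io = My::IO::ReadOnlyFormat->new;
print $io->read, "\n";

# write() still exists, so old calling code compiles, but calling it
# raises an error that the caller can trap with eval.
my $ok = eval { $io->write('record'); 1 };
print "write failed: $@" unless $ok;
```

The point of the sketch is only that the docs for such a module should
describe read() as working and write() as pending, rather than showing
both in the synopsis.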

Anyway, when something doesn't perform as expected (produces invalid output
or contains errors), it's considered a bug.  That includes misrepresenting
what a module does in the docs.  When we try to fix bugs we have to decipher
what the intent of the original author was from the docs and code, then try
to get it to work by modifying the code.  In extreme cases (such as
unimplemented methods) that may mean writing up entire methods from scratch.
The read and write methods for IO modules are normally the longest methods
in a class.  That's a heck of a lot of effort for something that a large
majority of us aren't interested in taking up, esp. when the submitting
author should have had everything up to spec (i.e. what's in the docs) when
adding it to the core.

> changes add functionality or improve it's working without affecting code
>   that uses the method correctly according to its old docs.
> 
> 
> * though if there isn't time/interest in changing the code, and the
> method never worked as per the docs, then by all means change the docs
> to avoid confusion - just don't change the docs on a method that worked
> according to the docs, because then you can assume people use the method
> and will be affected by the change

Again, didn't do that.  The methods in the docs either didn't exist (not
implemented) or didn't work (contained bugs).  The docs were changed b/c
they were misleading.

-chris


From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:36:07 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:36:07 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
References: <C0A31929.89F9%osborne1@optonline.net> <447E48B9.4080503@jays.net>
Message-ID: <447F1777.3070906@campus.iztacala.unam.mx>

Jay Hannah wrote:
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should 
>> point to the Tutorials page instead of the Bptutorial.pl page please let 
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Nice idea, I'll check with Jason if it's possible (in mediawiki) to
create a new Documentation sidebar to hold these 4 sections.

> (What's the conceptual difference between a HOWTO and a tutorial?)

My concept is that Tutorials cover a wider aspect of BioPerl, contrary 
to the HOWTO's which focus on a certain topic.

> Why isn't the short "Current events" just listed on the top of the "News" page?

I don't know, maybe because it was important when Jason started the Wiki 
a couple of months ago. Do you think it should be erased from the sidebar?

> Sick of my endless questions yet? -grin-
> 
> j
> 

Of course not! :)

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:46:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:46:03 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000c01c68598$704b15d0$15327e82@pyrimidine>
References: <000c01c68598$704b15d0$15327e82@pyrimidine>
Message-ID: <447F19CB.4090607@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
[snip]
>> users need to be able to rely on the docs. You can't turn around and say
>> "you've used this method for years, but now I'm changing how it works
>> because you might have used the method incorrectly". Ideally any code
> 
> Not what I did, BTW.
[snip]
>> * though if there isn't time/interest in changing the code, and the
>> method never worked as per the docs, then by all means change the docs
>> to avoid confusion - just don't change the docs on a method that worked
>> according to the docs, because then you can assume people use the method
>> and will be affected by the change
> 
> Again, didn't do that.

I'm very sorry that I allowed the ambiguity, but my comments were 
certainly not directed at your recent changes to Bio::Restriction::IO. 
In fact, I put in the above * comment to exclude your changes from my 
discussion; you changed the docs because the code never did what they
said it did (the docs were bad). That's fine (good!). My comments were
a general point, slightly directed at the idea of changing all the 
return undef;s - changing the code so that it no longer matches the docs 
of a previously working method. That's what I think is bad. Though in 
this particular case it shouldn't make any difference at all.


From osborne1 at optonline.net  Thu Jun  1 12:46:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 12:46:02 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <C0A4920A.8A5B%osborne1@optonline.net>

Chris,

I think the INSTALL* files should be in the package, this is the de facto
convention for 99% of the packages I've ever seen. Then any Wiki page just
links to the file in CVS.

Personally I don't like the idea of maintaining a Wiki page and a file that
both say essentially the same thing (this is what has happened with the
INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
text and removing files that contained these redundancies so it's
unfortunate to see them appear anew, sooner or later they'll get out of sync
despite best intentions. The most likely cause will be someone other than
the person who created the initial duplication (and promised to maintain
both) making a change in one of the two files.

Brian O.


On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:57:27 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:57:27 +0100
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
References: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <447F1C77.5040403@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sounds good to me.  I guess the tutorial (post-stripping)would be moved to
> /scripts or /examples then?
> 
> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?

Imho, something like an installation document should be there in full so 
once you've downloaded you can install without reference to anything 
else. Also, an installation document could be considered specific to the 
release version. Which is to say, it never goes out of date even if new 
versions of bioperl are released with new installation instructions - it 
applies to the installation directory it is found in.

The wiki can have the latest installation instructions, and you don't 
have to worry about keeping things synced.


From cjfields at uiuc.edu  Thu Jun  1 13:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:13:30 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1C77.5040403@mrc-dunn.cam.ac.uk>
Message-ID: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>

So basically have a minimal set of installation instructions in CVS and a
more detailed set of installation instructions on the wiki.  Sounds reasonable
enough but bioperl is a pretty complex distribution (lots of additional
modules required, platform-specific issues, so on).  Maybe we can come up
with a pared-down INSTALL file which combines the basic elements for
installing on UNIX/Windows/Mac/FreeBSD and points out dependencies.  

I still like the idea of just having a simple conversion from wiki->txt
direct from the web page (i.e. best of both worlds).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 11:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris Fields wrote:
> > Sounds good to me.  I guess the tutorial (post-stripping)would be moved
> to
> > /scripts or /examples then?
> >
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in
> the
> > distribution pointing out the wiki docs instead?
> 
> Imho, something like an installation document should be there in full so
> once you've downloaded you can install without reference to anything
> else. Also, an installation document could be considered specific to the
> release version. Which is to say, it never goes out of date even if new
> versions of bioperl are released with new installation instructions - it
> applies to the installation directory it is found in.
> 
> The wiki can have the latest installation instructions, and you don't
> have to worry about keeping things synced.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From s-merchant at northwestern.edu  Thu Jun  1 13:17:32 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Thu, 1 Jun 2006 12:17:32 -0500
Subject: [Bioperl-l] Bio::OntologyIO
Message-ID: <000001c6859f$446f7fd0$c2987ca5@pc13>

Hi Everyone,

    I would like to announce the availability of an obo format parser
which can parse GO, PO, PATO and other ontology files in obo format. The
parser can be used through the Bio::OntologyIO module. Thanks to Hilmar
Lapp and Chris Mungall for their invaluable contributions.

 
Thanks,

Sohel Merchant.

 
Sohel Merchant

dictyBase

Bioinformatics Software Engineer

Center for Genetic Medicine

Northwestern University

676 St. Clair Street, Suite 1206

Chicago IL 60611

 
From cjfields at uiuc.edu  Thu Jun  1 13:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:35 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4920A.8A5B%osborne1@optonline.net>
Message-ID: <001101c685a3$53f4bf70$15327e82@pyrimidine>

I understand your point, though I think the wiki gives us an opportunity to add
helpful links and use markup to help clarify things a bit more.  I have seen
several distributions which don't have INSTALL files, just simple README
with very basic instructions (Bio::ASN1::EntrezGene is one).  

I've been reluctant to mess around with the wiki Install pages too much more
b/c of syncing problems, just as you mentioned.  I will look into things a
bit more to see if there's an easier way to go about converting wiki->text.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 11:46 AM
> To: Chris Fields; 'Mauricio Herrera Cuadra'
> Cc: bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> I think the INSTALL* files should be in the package, this is the de facto
> convention for 99% of the packages I've ever seen. Then any Wiki page just
> links to the file in CVS.
> 
> Personally I don't like the idea of maintaining a Wiki page and a file
> that
> both say essentially the same thing (this is what has happened with the
> INSTALL and INSTALL.WIN files). I've spent plenty of time merging
> redundant
> text and removing files that contained these redundancies so it's
> unfortunate to see them appear anew, sooner or later they'll get out of
> sync
> despite best intentions. The most likely cause will be someone other than
> the person who created the initial duplication (and promised to maintain
> both) making a change in one of the two files.
> 
> Brian O.
> 
> 
> On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in
> the
> > distribution pointing out the wiki docs instead?


From cjfields at uiuc.edu  Thu Jun  1 13:46:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:45 -0500
Subject: [Bioperl-l] For CVS developers -
	potential pitfall with "return undef"
In-Reply-To: <447F19CB.4090607@mrc-dunn.cam.ac.uk>
Message-ID: <001201c685a3$59d78da0$15327e82@pyrimidine>


....

> > Again, didn't do that.
> 
> I'm very sorry that I allowed the ambiguity, but my comments were
> certainly not directed at your recent changes to Bio::Restriction::IO.
> In fact, I put in the above * comment to exclude your changes from my
> discussion; you changed the docs because the code never did what they
> said they did (the docs were bad). That's fine (good!). My comments were
> a general point, slightly directed at the idea of changing all the
> return undef;s - changing the code so that it no longer matches the docs
> of a previously working method. That's what I think is bad. Though in
> this particular case it shouldn't make any difference at all.

Agreed.  In any case, if tests have been properly set up then they should
catch problems.  This is, of course, if they are properly set up.  

Chris


> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From gad14 at cornell.edu  Thu Jun  1 15:10:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Thu, 01 Jun 2006 15:10:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447D5668.7070500@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu>
	<447BFB20.40501@mrc-dunn.cam.ac.uk>	<447C7985.9000404@cornell.edu>
	<447D5668.7070500@mrc-dunn.cam.ac.uk>
Message-ID: <447F3BA7.9030500@cornell.edu>

Problem solved, albeit in a slightly hacky way.

I tried to make seek() work for a good long while with the SearchIO 
blast results object, but I just couldn't get it to work. (Probably b/c 
seek wants to see a genuine file handle-- not a SearchIO filehandle.) I 
used SearchIO's fh() to get the handle and could while(<$fh>) through 
the data but when I used seek($fh,0,0) to reset the cursor position in 
the handle in prep for another loop, I got an error complaining about my 
use of seek() by indicating that "SEEK" could not be found in Seekable.pm.

I concluded that it was not going to be possible and instead made an 
array of SeqFeature objects which contain all the relevant blast output 
data (i.e. the m8/hit table stuff).

It still seems unfortunate that one can't reuse the SearchIO object for 
cases when the SearchIO blast report needs to be accessed multiple times.
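
For illustration, a minimal sketch of that kind of failure in core Perl
only (no BioPerl needed). It assumes the handle returned by fh() is a
tied handle whose class provides READLINE but no SEEK; ReadOnlyStream is
a made-up stand-in for such a class, not a BioPerl module.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical tie class: supports reading item by item, but defines no SEEK.
package ReadOnlyStream;
sub TIEHANDLE { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub READLINE  { my $self = shift; return shift @{ $self->{items} } }

package main;

# Create an anonymous glob and tie it to the class above.
my $fh = gensym();
tie *$fh, 'ReadOnlyStream', "first\n", "second\n";

# Reading works: each <$fh> is dispatched to READLINE.
my $seen = 0;
while (defined(my $line = <$fh>)) {
    $seen++;
}

# seek() on a tied handle is dispatched to the class's SEEK method,
# which this class does not have, so the call dies.
my $ok    = eval { seek($fh, 0, 0); 1 };
my $error = $ok ? '' : $@;

print "read $seen items\n";
print "seek failed: $error" unless $ok;
```

If this matches what SearchIO does, the handle can be read once but
cannot be rewound, which is why storing the returned objects is the
practical workaround.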

Thanks for your help,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>>Thanks for your comment Sendu, it was very helpful. I think this must be 
>>what's going on.. I am using $blast_report->next_result in both 
>>subroutines. It appears that analyzing the blast results first w/ my 
>>sort subroutine empties (?) the $blast_result object so that when I try 
>>to print, there is nothing left to print. (and visa-versa when I print 
>>first then try to sort).
>>So, from the looks of things, using next_result has the effect of 
>>popping the Bio::Search::Result::ResultI objects off of the SearchIO 
>>blast report object??
> 
> 
> Not quite. It's more or less exactly like opening a file and then trying 
> to read it all twice like this:
> open(FILE, "file");
> while (<FILE>) {
>      print # prints each line in the file
> }
> while (<FILE>) {
>      print # never happens, we never enter this while loop
> }
> 
> To get the second while loop to print anything we need to say seek(FILE, 
> 0, 0) before it. Or in the first while loop store each line in an array, 
> and then make the second loop a foreach through that array.
> 
> 
> 
>>It seems I could get around this by making a copy of the blast report by 
>>setting it to another new variable...(not the most elegant solution) but 
>>I'm having trouble with this...
>>
>>If I do:
>>
>>    my $blast_report_copy = $blast_report;
>>
>>I'm just copying the reference to the SearchIO blast result, so it 
>>doesn't help me. How can I make another physical copy of this blast 
>>result object? Seems like a simple thing but how to do it is escaping me.
> 
> 
> Not really a good idea, and it may not work anyway if the object 
> contains a filehandle. But for a simple object you might recursively 
> loop through the data structure and copy each element out into a similar 
> data structure.
> 
> 
> 
>>But better yet, the way to go is to 'reset the counter,' or to find a 
>>way to look at/print/sort the results without removing data from the 
>>blast result object. How is this done though??
> 
> 
> It would be rather nice if this worked:
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while <$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);
> 
> Finally, the most sensible way (assuming bioperl provides no methods of 
> its own for this) of solving the problem is, the first time you go 
> through each next_result, next_hit and next_hsp, just store the returned 
> objects in an array of arrays of arrays. Then the second time get the 
> objects from your array structure instead of with the method calls.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jelenaob at gmail.com  Thu Jun  1 11:45:49 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Thu, 1 Jun 2006 08:45:49 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <200606011140.38726.lstein@cshl.edu>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
	<200606011140.38726.lstein@cshl.edu>
Message-ID: <5042a62b0606010845u79a5d5b3h131c4ed54f90fee3@mail.gmail.com>

Thanks Lincoln.

I figured out the solution just after I posted the question, Murphy's law ... but
my post was left hanging in my email ... :(

The problem is in CGI->img method.

Instead of  print $cgi->img({-src=>$url,-usemap=>"#$mapname"});

I should have used: print $cgi->img({-src=>$url,-usemap=>"#$mapname",
-border=>undef});

Thanks anyways for your help.

Cheers,

Jelena

On 6/1/06, Lincoln Stein <lstein at cshl.edu> wrote:
>
> Hi,
>
> The border is coming from the HTML <img. To get rid of it, set -border=>0
> in
> the img() call.
>
> Lincoln
>
>
>
> On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> > Hello everybody,
> >
> > does anybody know how to remove the background color of the Panel.
> > Currently, I am not adding anything to it, so I can troubleshot the
> > problem, and I have tried setting up
> > all color attributes I could find to the panel, but no luck. Whatever I
> do,
> > I get the BLUE border of the panel.
> >
> > Has anybody faced the same problem?
> >
> > Thanks in advance,
> >
> > Jelena
> >
> > And here is the code I am currently using:
> >
> >
> ---------------------------------------------------------------------------
> >-------------------------------- my $panel =
> >     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
> >                               -width => 800,
> >                               -pad_left => 10,
> >                               -pad_right => 10,
> >                               -key_color => 'white',
> >                               -bgcolor => 'white',
> >                               -gridcolor=>'black',
> >                               -fgcolor => 'black',
> >                               -grid => 0,
> >                               );
> >    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url'
> ,
> >      -url  => '/tmpimages');
> >    #make clickable image
> >    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
> >    print $map;
> >
> >
> ---------------------------------------------------------------------------
> >--------------------------------
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>


From osborne1 at optonline.net  Thu Jun  1 15:36:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:36:27 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>
Message-ID: <C0A4B9FB.8A71%osborne1@optonline.net>

Chris,

Right - how would this be done?

Brian O.


On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> I still like the idea of just having a simple conversion from wiki->txt
> direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 15:44:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:44:13 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <C0A4BBCD.8A74%osborne1@optonline.net>

Jay,

You asked about the doc/ directory. The only directory I see in my
bioperl-live/doc directory is examples/, the reason this remains is that it
contains scripts and images related to the Graphics HOWTO, in theory these
could be moved to the Wiki and the examples/ directory deleted. One
explanation for why you see doc/html and all those other dirs is that you
aren't using the 'cvs -d' option (there are other explanations) when you
update.

If examples/ is removed then presumably the README can be removed and
makedoc.pl moved elsewhere.

Brian O.


On 5/31/06 9:54 PM, "Jay Hannah" <jay at jays.net> wrote:

> Brian Osborne wrote:
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
> 
> We certainly wouldn't want to try to maintain two copies, one POD one in wiki.
> That would be the worst of all options. One option that hasn't been mentioned
> yet is to keep maintenance of that in POD in the distro (leaving the cool
> runability alone), and then flag that document as unchangeable in the wiki
> with a note on top "Maintenance of this document is done in POD in the distro.
> Submit POD patches to bioperl-l and we'll re-post an updated copy to this
> wiki."
> 
> Just a thought.
> 
>> - What do we do with the script part of bptutorial.pl? It certainly could be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?
> 
> doc/faq        (empty?)
> doc/howto      
> doc/howto/examples
> doc/howto/figs (empty?)
> doc/howto/html (empty?)
> doc/howto/pdf  (empty?)
> doc/howto/sgml (empty?)
> doc/howto/txt  (empty?)
> doc/howto/xml  (empty?)
> 
> Does all that stuff officially live in and is being changed in the wiki, never
> to return to the distro?
> 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
>> Jay, looks like there are still some weird formatting issues with the
>> bptutorial wiki page, something which I ran into before when getting the
>> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
>> spaces preceding a line denotes code for some reason).  Not much you can do
>> in these cases except remove the extra spaces in those spots.  Looking good
>> though!  
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know nothing
> about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran
> off to work. :)
> 
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should
>> point to the Tutorials page instead of the Bptutorial.pl page please let
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials
> 
> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on the
> wiki but static?
> 
> Why isn't the short "Current events" just listed on the top of the "News"
> page?
> 
> Sick of my endless questions yet? -grin-
> 
> j
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Jun  1 15:47:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 14:47:40 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4B9FB.8A71%osborne1@optonline.net>
Message-ID: <001301c685b4$3dbfb820$15327e82@pyrimidine>

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 2:36 PM
> To: Chris Fields; 'Sendu Bala'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> Right - how would this be done?

I'll look into a few of the wiki converters, there are a few things that
claim to convert wiki to other formats (and vice versa).  It may not be
direct, though.  I'll post anything if I figure something out.

Chris
 
> Brian O.
> 
> 
> On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > I still like the idea of just having a simple conversion from wiki->txt
> > direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 15:45:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:45:39 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E73F5.40403@jays.net>
Message-ID: <C0A4BC23.8A75%osborne1@optonline.net>

Jay,

Yes, good idea, thank you for volunteering.

Brian O.


On 6/1/06 12:58 AM, "Jay Hannah" <jay at jays.net> wrote:

> I hereby volunteer to strip the code out of bptutorial.pl and put it wherever.
> Where should I put it when I'm done? (examples/tutorial.pl?)


From hubert.prielinger at gmx.at  Thu Jun  1 16:33:45 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 01 Jun 2006 14:33:45 -0600
Subject: [Bioperl-l] remoteblast xml problem
Message-ID: <447F4F29.9070600@gmx.at>

hi,
I have the following program, which worked quite well for retrieving 
remoteblast results as a text file.
Now I have altered it to request XML, and it doesn't work anymore.
It takes all the parameters at the command line and submits the query, but I 
don't get any results file back.

It seems to hang in an endless loop.
The only output I get is "$rc is not a ref!" over and over; it 
never enters the else branch.

Any help is appreciated, thanks in advance


#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use IO::String;
use Bio::SearchIO;


#use lib qw(/usr/local/bioperl/bioperl-1.5.1);

print "Please insert database:\t";
my $db_STD = <STDIN>;
chomp $db_STD;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;

print "Please insert gapcosts:\t";
my $gapcosts_STD = <STDIN>;
chomp $gapcosts_STD;

my $prog   = 'blastp';
my $db     = $db_STD;           
my $e_val  = '20000';
my $matrix = $matrix_STD;               
my $wordSize = '2';


my @data;
my $line_dataArray;
my $rid;
my $count = $count_STD;           
my @params = (
  '-prog'   => $prog,
  '-data'   => $db,
  '-expect' => $e_val,
  '-MATRIX_NAME' => $matrix,
  '-readmethod' => 'xml',
  '-WORD_SIZE' => $wordSize,
);

my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

print "entering blast....";

my $xmlFactory = Bio::Tools::Run::RemoteBlast->new(@params);


$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = $gapcosts_STD;
$Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'ALIGNMENTS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';
   

print "Blast entered successfully \n";

while ( my $query = $seqio_obj->next_seq ) {
  print "submit Sequence...just do it....\n";
 
  my $r = $xmlFactory->submit_blast($query);
  print $query->seq;
  print "\n";
 
 
#    sleep 30;

  # Wait for the reply and save the output file
  print "entering while loop for saving Output.... \n";
 
  while ( my @rids = $xmlFactory->each_rid ) {
      foreach my $rid (@rids) {
           
          my $rc = $xmlFactory->retrieve_blast($rid);
          if ( !ref($rc) ) {
              print '$rc is not a ref!', "\n";
              if ( $rc < 0 ) {
                  print "Remove rid ...\n";
                  $xmlFactory->remove_rid($rid);
              }
              # sleep 5;
          }
          else {

              print "retrieved Results successfully \n";
              print $rid;
              print "\n";
              my $filename = "comp80swiss$count.xml";
              $xmlFactory->save_output($filename);
              print "File saved successfully \n";
              my $checkinput = $xmlFactory->file;
              open(my $fh,"<$checkinput") or die $!;
              while(<$fh>){
                print;
              }
              close $fh;
              $count++;
              $xmlFactory->remove_rid($rid);
          }
      }
      print "\n";
      print "\n";

  }
}


From emmanuel.quevillon at versailles.inra.fr  Thu Jun  1 17:15:42 2006
From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon)
Date: Thu, 01 Jun 2006 23:15:42 +0200
Subject: [Bioperl-l] How to submit new module?
Message-ID: <447F58FE.7020603@versailles.inra.fr>

Hi,

I just created some new parsers for TargetP, TandemRepeatFinder and
RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
like to know the steps of the procedure for submitting them to BioPerl
so they can be integrated in the next release (I hope)?
Is there any documentation about it?

Thanks

-- 
Emmanuel

---------------------------------------------------------------------
Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>

INRA-URGI / Bayer CropScience
523 Place des Terrasses             http://www.infobiogen.fr
91000 EVRY                          http://urgi.infobiogen.fr
Tel : 01 60 87 37 42                http://www.bayercropscience.com

PGP public key server : http://pgp.mit.edu/
Key ID : 0x0B84357F
---------------------------------------------------------------------


From cjfields at uiuc.edu  Thu Jun  1 17:36:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:36:05 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447F3BA7.9030500@cornell.edu>
Message-ID: <001b01c685c3$63840070$15327e82@pyrimidine>

Genevieve, 

seek() won't work here; all the file IO is handled through Bio::Root::IO
methods.  The SearchIO system is set up like an XML SAX parser so if you
want to save objects as they come you'll have to store the object refs in an
array, like so:

my @hsps;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         push @hsps, $hsp;
      }
   }
}

Or similarly with hits: 

my @hits;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      push @hits, $hit;
   }
}

Or you could use more complex data structures (array of arrays) as Sendu
suggested.  You should be able to sort like anything else by calling methods
within the sort:

# total number of hsps
my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;

# if you really like your accessions in alphabetical order
my @sorted = sort {$a->accession cmp $b->accession} @hits;

Then if you wanted to print later you could sort based on something else,
like the score:

my @sort_score = sort {$a->score <=> $b->score} @hits;

So you would end up with something like the following subroutines:

my @hits;   # filled by sort_results, reused by print_blast_results

sub sort_results{
   my $report = shift;
   while(my $result = $report->next_result()){
      while(my $hit = $result->next_hit()){
         push @hits, $hit;
      }
   }
   my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;
   print $_->accession,"\t",$_->num_hsps,"\n" for @sorted;
}

sub print_blast_results{
   my $report = shift;
   my @sort_score = sort {$a->score <=> $b->score} @hits;
   for my $h (@sort_score) {
      while (my $hsp = $h->next_hsp) {
         # might use something else here like hit->name or accession,
         # not sure what you want
         my $q_name = $hsp->seq_id; 
         print join(", ",$q_name,$h->name,$hsp->bits)."\n";
      }
   }
}


Just so you know, I couldn't get display_id or display_name to work when
using the Bio::Search::HSP::GenericHSP object.  Your results may vary.
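
The store-then-reuse idea above can also be tried without BioPerl
installed.  This is only a sketch: OneShot, next_item and drain are
made-up names standing in for a parser object and its next_* method, not
part of any BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a one-shot parser object: each call to
# next_item hands back the next item, then undef once exhausted.
package OneShot;
sub new       { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub next_item { my $self = shift; return shift @{ $self->{items} } }

package main;

# Call $method on $obj until it returns undef; collect everything returned.
sub drain {
    my ($obj, $method) = @_;
    my @all;
    while (defined(my $item = $obj->$method)) {
        push @all, $item;
    }
    return @all;
}

my $parser = OneShot->new(qw(r1 r2 r3));
my @cached = drain($parser, 'next_item');   # the only useful pass over the parser
my @again  = drain($parser, 'next_item');   # parser is exhausted, nothing comes back

print scalar(@cached), " cached, ", scalar(@again), " on second pass\n";

# The cached list can be sorted and walked as often as needed.
my @reversed = sort { $b cmp $a } @cached;
print "$_\n" for @reversed;
```

With a real parser the same helper would presumably be called as
drain($parser, 'next_result'), and again on each result and hit to build
the nested array structure.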

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Genevieve DeClerck
> Sent: Thursday, June 01, 2006 2:11 PM
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Problem solved, albeit, in a slightly hacky way.
> 
> I tried to make seek() work for a good long while with the SearchIO
> blast results object, but I just couldn't get it to work. (Probably b/c
> seek wants to see a genuine file handle-- not a SearchIO filehandle.) I
> used SearchIO's fh() to get the handle and could while(<$fh>) through
> the data but when I used seek($fh,0,0) to reset the cursor position in
> the handle in prep for another loop, i got an error complaining about my
> use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
> 
> I concluded that it was not going to be possible and instead made an
> array if SeqFeature objects which contain all the relevant blast output
> data (i.e. the m8/hit table stuff).
> 
> It still seems unfortunate that one can't reuse the SearchIO object for
> cases when the SearchIO blast report needs to be accessed mltiple times.
> 
> Thanks for your help,
> Genevieve
> 
> 
> 
> Sendu Bala wrote:
> 
> > Genevieve DeClerck wrote:
> >
> >>Thanks for your comment Sendu, it was very helpful. I think this must be
> >>what's going on.. I am using $blast_report->next_result in both
> >>subroutines. It appears that analyzing the blast results first w/ my
> >>sort subroutine empties (?) the $blast_result object so that when I try
> >>to print, there is nothing left to print. (and visa-versa when I print
> >>first then try to sort).
> >>So, from the looks of things, using next_result has the effect of
> >>popping the Bio::Search::Result::ResultI objects off of the SearchIO
> >>blast report object??
> >
> >
> > Not quite. It's more or less exactly like opening a file and then trying
> > to read it all twice like this:
> > open(FILE, "file");
> > while (<FILE>) {
> >      print # prints each line in the file
> > }
> > while (<FILE>) {
> >      print # never happens, we never enter this while loop
> > }
> >
> > To get the second while loop to print anything we need to say seek(FILE,
> > 0, 0) before it. Or in the first while loop store each line in an array,
> > and then make the second loop a foreach through that array.
> >
> >
> >
> >>It seems I could get around this by making a copy of the blast report by
> >>setting it to another new variable...(not the most elegant solution) but
> >>I'm having trouble with this...
> >>
> >>If I do:
> >>
> >>    my $blast_report_copy = $blast_report;
> >>
> >>I'm just copying the reference to the SearchIO blast result, so it
> >>doesn't help me. How can I make another physical copy of this blast
> >>result object? Seems like a simple thing but how to do it is escaping me.
> >
> >
> > Not really a good idea, and it may not work anyway if the object
> > contains a filehandle. But for a simple object you might recursively
> > loop through the data structure and copy each element out into a similar
> > data structure.
> >
> >
> >
> >>But better yet, the way to go is to 'reset the counter,' or to find a
> >>way to look at/print/sort the results without removing data from the
> >>blast result object. How is this done though??
> >
> >
> > It would be rather nice if this worked:
> > my $blast_report = $factory->blastall($ref_seq_objs);
> > my $blast_fh = $blast_report->fh();
> > while (<$blast_fh>) {
> >      # $_ is a ResultI object, use as normal
> > }
> > seek($blast_fh, 0, 0); # this would be great, but does it work?
> > while (<$blast_fh>) {
> >      # go through the results again in your second subroutine
> > }
> >
> > An alternative hacky way of doing it, which may also not work, would be
> > to go through your $blast_report as normal, but then before going
> > through it a second time, say
> > my $fh = $blast_report->_fh;
> > seek($fh, 0, 0);
> >
> > Finally, the most sensible way (assuming bioperl provides no methods of
> > its own for this) of solving the problem is, the first time you go
> > through each next_result, next_hit and next_hsp, just store the returned
> > objects in an array of arrays of arrays. Then the second time get the
> > objects from your array structure instead of with the method calls.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
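
The point about reading a handle twice can be checked with plain Perl, no
BioPerl involved. A minimal sketch, assuming an ordinary file on disk (the
temporary file and its three lines are made up for illustration); whether a
SearchIO handle can be rewound the same way is exactly what was in doubt in
the thread above:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small stand-in "report" to a temporary file (illustrative data).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "result 1\nresult 2\nresult 3\n";
close $out or die "close failed: $!";

open my $fh, '<', $path or die "cannot open $path: $!";

my @first  = <$fh>;    # first pass reads everything
my @second = <$fh>;    # second pass gets nothing: already at end of file

seek($fh, 0, 0) or die "seek failed: $!";    # rewind to the start
my @third = <$fh>;     # the lines are available again

close $fh;

printf "first pass: %d, second pass: %d, after seek: %d\n",
    scalar @first, scalar @second, scalar @third;

# The other option from the thread: keep what the first pass returned
# and loop over that copy as often as needed.
my $reused = 0;
$reused++ for @first;
print "lines reused from the stored array: $reused\n";
```

For objects that wrap a handle and cannot be rewound, storing what the first
pass returns is probably the safer of the two approaches.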


From arareko at campus.iztacala.unam.mx  Thu Jun  1 17:49:30 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 16:49:30 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
References: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <447F60EA.1050608@campus.iztacala.unam.mx>

Hi Emmanuel,

Take a look into the BioPerl FAQ:

http://bioperl.org/wiki/FAQ

It contains some info that will guide you through the appropriate steps 
depending on your situation.

Regards,
Mauricio.

Emmanuel Quevillon wrote:
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers inherit from Bio::Tools. I'd just
> like to know the steps for submitting them to BioPerl so that
> they can be integrated in the next release (I hope).
> Is there any documentation about it?
> 
> Thanks
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 17:47:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:47:11 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <001c01c685c4$f01e7550$15327e82@pyrimidine>

The Bioperl FAQ on the wiki answers this:

http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F

Basically, you've already done the first step, but you might want to
resubmit the email in a different form, with something about "New parsers
for TargetP, TandemRepeatFinder and RepeatMasker" in the Subject line to get
more input about those from the users-at-large.  

BTW, there is already a Bio::Tools::RepeatMasker, so you should check it out
to make sure there isn't any redundancy between your version and the
bioperl-live version.  The developers may be reluctant to replace the
bioperl-live version with yours to prevent API problems with end users,
unless you provide some serious justification (like the current one is
broken, not complete, etc).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Emmanuel Quevillon
> Sent: Thursday, June 01, 2006 4:16 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] How to submit new module?
> 
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers inherit from Bio::Tools. I'd just
> like to know the steps for submitting them to BioPerl so that
> they can be integrated in the next release (I hope).
> Is there any documentation about it?
> 
> Thanks
> 
> --
> Emmanuel
> 
> ---------------------------------------------------------------------
> Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>
> 
> INRA-URGI / Bayer CropScience
> 523 Place des Terrasses             http://www.infobiogen.fr
> 91000 EVRY                          http://urgi.infobiogen.fr
> Tel : 01 60 87 37 42                http://www.bayercropscience.com
> 
> PGP public key server : http://pgp.mit.edu/
> Key ID : 0x0B84357F
> ---------------------------------------------------------------------


From heikki at sanbi.ac.za  Fri Jun  2 03:52:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 2 Jun 2006 09:52:07 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001201c685a3$59d78da0$15327e82@pyrimidine>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
Message-ID: <200606020952.08034.heikki@sanbi.ac.za>

I've started going through the files that have 'return undef' lines.
I'll report back later.

Initial impression is that there are a few cases where the context indicates 
that a list should be returned but failure returns an explicit undef. I'll fix those.

Most of the cases are much more ambiguous. Even when documentation says the 
failure returns undef, it is clearly meant to mean false. In most cases 
documentation does not comment on return value at all. Luckily the context is 
almost always scalar and therefore it does not matter too much.
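
The difference between the two forms only shows up in list context. A minimal
sketch of it, assuming two made-up subroutines (find_explicit_undef and
find_bare_return are hypothetical names, not BioPerl methods) that always
fail:

```perl
use strict;
use warnings;

# Hypothetical routines; only the failure path matters here.
sub find_explicit_undef { return undef }    # fails with an explicit undef
sub find_bare_return    { return }          # fails with a bare return

# In list context, 'return undef' yields a one-element list: (undef).
my @a = find_explicit_undef();
# A bare 'return' yields an empty list in list context.
my @b = find_bare_return();

printf "explicit undef: %d element(s), list is %s\n",
    scalar @a, (@a ? 'true' : 'false');
printf "bare return:    %d element(s), list is %s\n",
    scalar @b, (@b ? 'true' : 'false');

# In scalar context both give undef, so the two forms behave the same.
my $x = find_explicit_undef();
my $y = find_bare_return();
print "scalar context: both undefined\n" if !defined $x && !defined $y;
```

So a caller writing something like "if (my @hits = $obj->method)" would see
the explicit-undef version as a success, which is the pitfall in question.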

I seem to be changing 'return undef' to plain 'return' a bit overzealously, so 
do not take it personally.

	-Heikki

On Thursday 01 June 2006 19:46, Chris Fields wrote:
> ....
>
> > > Again, didn't do that.
> >
> > I'm very sorry that I allowed the ambiguity, but my comments were
> > certainly not directed at your recent changes to Bio::Restriction::IO.
> > In fact, I put in the above * comment to exclude your changes from my
> > discussion; you changed the docs because the code never did what they
> > said they did (the docs were bad). That's fine (good!). My comments were
> > a general point, slightly directed at the idea of changing all the
> > return undef;s - changing the code so that it no longer matches the docs
> > of a previously working method. That's what I think is bad. Though in
> > this particular case it shouldn't make any difference at all.
>
> Agreed.  In any case, if tests have been properly set up then they should
> catch problems.  This is, of course, if they are properly set up.
>
> Chris
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Fri Jun  2 05:04:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 02 Jun 2006 10:04:18 +0100
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447F4F29.9070600@gmx.at>
References: <447F4F29.9070600@gmx.at>
Message-ID: <447FFF12.506@mrc-dunn.cam.ac.uk>

Hubert Prielinger wrote:
> hi,
> I have the following program and it worked quite well for retrieving 
> remoteblast results in a text file.
> Now I have altered it to request XML, and it doesn't work anymore.
> It takes all the parameters at the command line and submits the query, but I 
> don't retrieve any results file anymore.
> 
> It seems that it hangs in an endless loop.
> The only output I get is:  $rc is not a ref! over and over. It 
> doesn't enter the else branch anymore.

There is no problem with your code. The problem is with the NCBI server 
and should be reported to them. You can visit the site and do a blast, 
requesting xml format, and you will typically get one normal 'waiting' 
message and the promise that it will be updated in x seconds, but 
subsequent attempts to get progress information result in an xml error 
page because the NCBI server doesn't actually send any data.

Unfortunately the way that the bioperl code is written, it treats no 
data as 'waiting' instead of an error. I've offered a patch to fix this 
at this bug page:
http://bugzilla.bioperl.org/show_bug.cgi?id=2015
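
Independent of the patch, a caller can protect itself by capping the number
of polls. A minimal sketch, not the RemoteBlast code itself: make_fake_server
and poll_for_result are hypothetical helpers, and the return convention
(reference = results ready, negative number = error, anything else = still
waiting) is an assumption modelled on the "$rc is not a ref" check in the
script discussed above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a remote status check. It replays the given
# replies in order, then answers 0 ("still waiting") forever.
sub make_fake_server {
    my @replies = @_;
    return sub { return @replies ? shift @replies : 0 };
}

# Poll until results arrive, an error is reported, or attempts run out.
sub poll_for_result {
    my ($check, $max_attempts, $delay) = @_;
    for my $attempt (1 .. $max_attempts) {
        my $rc = $check->();
        return ('done',  $rc) if ref $rc;    # got a result object
        return ('error', $rc) if $rc < 0;    # server reported failure
        sleep $delay if $delay;              # still waiting, try again
    }
    return ('timeout', undef);               # give up instead of looping forever
}

# Two "waiting" replies, then a result.
my ($status) = poll_for_result(make_fake_server(0, 0, { hits => 3 }), 10, 0);
print "first job: $status\n";

# A server that never sends data no longer hangs the caller.
my ($stuck) = poll_for_result(make_fake_server(), 5, 0);
print "second job: $stuck\n";
```

With a real server the delay would be a few seconds rather than 0, and the
attempt cap would be chosen to match how long a search may reasonably take.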


From cjfields at uiuc.edu  Fri Jun  2 10:30:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 09:30:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447FFF12.506@mrc-dunn.cam.ac.uk>
Message-ID: <001a01c68651$12925250$15327e82@pyrimidine>

Sendu, Hubert,


Hubert, your code looks fine so Sendu's patch should fix the problem (break
out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
works.  

Chris



From cjfields at uiuc.edu  Fri Jun  2 15:13:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 14:13:31 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
Message-ID: <000301c68678$a3cdaa40$15327e82@pyrimidine>

Heikki,

I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
when running AlignIO.t (I was fixing bug 2000):

http://bugzilla.open-bio.org/show_bug.cgi?id=2016

Not sure what's going on there but using read_aln and write_aln seem to work
normally.  It may have something to do with Bio::SimpleAlign but I'm not
absolutely sure.

Any ideas what may be going on here?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Fri Jun  2 17:11:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 15:11:41 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <001a01c68651$12925250$15327e82@pyrimidine>
References: <001a01c68651$12925250$15327e82@pyrimidine>
Message-ID: <4480A98D.6010501@gmx.at>

hi,
sorry, but I have updated the remoteblast module and I have run several 
attempts with the same results as before. It didn't work.
I didn't get any results.

regards
Hubert




From cjfields at uiuc.edu  Fri Jun  2 17:54:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 16:54:20 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480A98D.6010501@gmx.at>
Message-ID: <000001c6868f$1b68dbe0$15327e82@pyrimidine>

Hubert, 

Could you post this on bugzilla with your script and test data so I can try
to replicate your error?  I may not get to it until Monday.

Chris



From hubert.prielinger at gmx.at  Fri Jun  2 19:19:40 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 17:19:40 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <000001c68691$8c4eeb40$15327e82@pyrimidine>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
Message-ID: <4480C78C.1000701@gmx.at>

hi,
I have submitted the bug -> Bug 2017
with the script and input file, just start it from command line

thank you very much
greetings

Hubert

Chris Fields wrote:
> Hubert,
>
> I have a script that's using blastxml and XML output which seems to work.
> I'll try looking at it to get a better idea this weekend.
>
> Chris
>
>   


From cjfields at uiuc.edu  Fri Jun  2 20:33:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:33:48 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480C78C.1000701@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
Message-ID: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>

You need to add the input conditions as well (you have several  
<STDIN> lines which may play a role; I would like to know what you  
normally enter for those).

How long did you let the script run?  I ran a quick check on your  
sequences; you have almost 1600, so you have to expect that you'll  
run into some problems here!  Most here (including me) would suggest  
you try installing a local blast setup for something like this.

Chris

On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:

> hi,
> I have submitted the bug -> Bug 2017
> with the script and input file, just start it from command line
>
> thank you very much
> greetings
>
> Hubert
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Fri Jun  2 20:49:15 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 18:49:15 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
Message-ID: <4480DC8B.7070005@gmx.at>

hi,
input database: swissprot
         matrix: pam30
         count: 1
         gapcosts: 9 1

I know that there are a lot of sequences, but that doesn't matter: you 
can delete all of them except one. The number of sequences is not 
the problem; the script reads one line and submits it, then the 
second line, and so on. I have also tried it with only one sequence 
and got the same result. That time the script ran for more than 
20 minutes, and that should be enough time to retrieve the 
results for ONE sequence, I guess.

regards
Hubert


Chris Fields wrote:
> You need to add the input conditions as well (you have several <STDIN> 
> lines which may play a role; I would like to know what you normally 
> enter for those).
>
> How long did you let the script run?  I ran a quick check on your 
> sequences; you have almost 1600, so you have to expect that you'll run 
> into some problems here!  Most here (including me) would suggest you 
> try installing a local blast setup for something like this.
>
> Chris
>
> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>
>> hi,
>> I have submitted the bug -> Bug 2017
>> with the script and input file, just start it from command line
>>
>> thank you very much
>> greetings
>>
>> Hubert
>>
>> Chris Fields wrote:
>>> Hubert,
>>>
>>> I have a script that's using blastxml and XML output which seems to 
>>> work.
>>> I'll try looking at it to get a better idea this weekend.
>>>
>>> Chris
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu Bala'
>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>
>>>> hi,
>>>> sorry, but I have updated the remoteblast module and I have run 
>>>> several
>>>> attempts with the same results as before. It didn't work.
>>>> I didn't get any results.
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>> Sendu, Hubert,
>>>>>
>>>>>
>>>>> Hubert, your code looks fine so Sendu's patch should fix the problem
>>>>>
>>>> (break
>>>>
>>>>> out of that infinite loop).  I applied Sendu's patch to 
>>>>> RemoteBlast in
>>>>>
>>>> CVS;
>>>>
>>>>> it passed all tests in RemoteBlast.t.  Try updating from CVS to 
>>>>> see if
>>>>>
>>>> it
>>>>
>>>>> works.
>>>>>
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>
>>>>>> Hubert Prielinger wrote:
>>>>>>
>>>>>>
>>>>>>> hi,
>>>>>>> I have the following program and it worked quite well, for 
>>>>>>> retrieving
>>>>>>> remoteblast results in a textfile,
>>>>>>> now I have altered it to to xml, and it didn't work anymore.....
>>>>>>> it takes all the parameter at the commandline, submits the 
>>>>>>> query, but
>>>>>>>
>>>> I
>>>>
>>>>>>> don't retrieve any results file anymore.....
>>>>>>>
>>>>>>> it seems that it hangs in a endless loop......
>>>>>>> the only output I get is:  $rc is not a ref! over and over..... it
>>>>>>> doesn't enter the else term anymore....
>>>>>>>
>>>>>>>
>>>>>> There is no problem with your code. The problem is with the NCBI 
>>>>>> server
>>>>>> and should be reported to them. You can visit the site and do a 
>>>>>> blast,
>>>>>> requesting xml format, and you will typically get one normal 
>>>>>> 'waiting'
>>>>>> message and the promise that it will be updated in x seconds, but
>>>>>> subsequent attempts to get progress information result in an xml 
>>>>>> error
>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>
>>>>>> Unfortunately the way that the bioperl code is written, it treats no
>>>>>> data as 'waiting' instead of an error. I've offered a patch to 
>>>>>> fix this
>>>>>> at this bug page:
>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Fri Jun  2 20:57:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:57:37 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480DC8B.7070005@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
Message-ID: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>

Yes, I see the same error you do.  But I have a similar script  
(blastp, XML blast report, XML parsing, similar loop structure) that  
works fine.  I'm trying to dissect the problem but I think it may be  
something logically wrong here (something not so obvious) and not a  
bug...

What I'm trying to say is, when you send sequences using remoteblast
like this, you are essentially spamming the NCBI BLAST server with
~1600 requests.  This script wasn't set up with that intent in mind;
you should really try to set up your own local blast database if
possible.  If you can't, try running this script in off-hours
(10pm-6am EST or something like that).


Chris

On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:

> hi,
> input database: swissprot
>         matrix: pam30
>         count: 1
>         gapcosts: 9 1
>
> I know that there are  a lot of sequences, but that doesn't matter,  
> you can delete all of them except one, the amount of the sequences  
> is not the problem, the script reads one line and submits  
> it.....then the second line and so on.....I have tried it with only  
> one sequence either and I got the same result.... the script run at  
> that time for more than 20 minutes!!!!!! .....and that should be  
> enough time to retrieve the results for ONE sequence, I guess
>
> regards
> Hubert
>
>
>
> Chris Fields wrote:
>> You need to add the input conditions as well (you have several  
>> <STDIN> lines which may play a role; I would like to know what you  
>> normally enter for those).
>>
>> How long did you let the script run?  I ran a quick check on your  
>> sequences; you have almost 1600, so you have to expect that you'll  
>> run into some problems here!  Most here (including me) would  
>> suggest you try installing a local blast setup for something like  
>> this.
>>
>> Chris
>>
>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>
>>> hi,
>>> I have submitted the bug -> Bug 2017
>>> with the script and input file, just start it from command line
>>>
>>> thank you very much
>>> greetings
>>>
>>> Hubert
>>>
>>> Chris Fields wrote:
>>>> Hubert,
>>>>
>>>> I have a script that's using blastxml and XML output which seems  
>>>> to work.
>>>> I'll try looking at it to get a better idea this weekend.
>>>>
>>>> Chris
>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Fri Jun  2 21:36:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 19:36:42 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
Message-ID: <4480E7AA.3020603@gmx.at>

hi chris,
thanks, but I never intended to run remoteblast with so many sequences,
only a few of them. actually my goal is to run phiblast with a regular
expression, so that I just won't need that
file anymore.

another question, about parsing the xml output: is there an xml parser
available for blast xml output, and how do I start?
I have looked up Bio::SearchIO::blastxml on the wiki and on cpan, but
I'm not sure how to start... sorry, I guess I'm too stupid.
is there maybe another introduction or an example?

thanks
Hubert


Chris Fields wrote:
> Yes, I see the same error you do.  But I have a similar script  
> (blastp, XML blast report, XML parsing, similar loop structure) that  
> works fine.  I'm trying to dissect the problem but I think it may be  
> something logically wrong here (something not so obvious) and not a  
> bug...
>
> What I'm trying to say is, when you send sequences using remoteblast  
> like, this you are essentially spamming the NCBI BLAST server with  
> ~1600 requests.  This script wasn't set up with that intent in mind;  
> you should really try to set up your own local blast database if  
> possible.  If you can't, try running this script in off-hours  
> (10pm-6am EST or something like that).
>
>
> Chris


From cjfields at uiuc.edu  Sat Jun  3 00:35:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 23:35:21 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>


On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:

> hi chris,
> thanks but I never intended to run the remoteblast with so much,  
> only a few of them, acutally I goal is to run the phiblast with  
> regular expression, so that i just don't need that
> file anymore

Not a problem.  Just to let you know, I did manage to get the script  
working, so I'm marking the bug INVALID.  I think the problem isn't  
that there is an infinite loop so much as setting composition-based  
statistics causes the search to take much much longer; try removing  
that line to see what I mean.

Just so you know, using $result->query_name doesn't get you what you  
would expect (it gives you a part of the RID, which you don't want;  
this is something in the XML output that is beyond our control).  You  
might want to change it to something else or you'll get filenames  
with numerical names.

> another question for parsing the xml output....is there a xml  
> parser available for blast xml output or how to start.....
> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,  
> but I'm not sure how to start....sorry, I guess I'm too stupid....
> is their maybe another introduction or an example.

Bio::SearchIO objects are used to parse BLAST XML output if you have  
it saved to a file.  For instance:

use Bio::SearchIO;

my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');

while (my $result = $factory->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         # do stuff here
      }
   }
}

The only difference between parsing a text BLAST report and an
XML BLAST report is the -format argument (similar to the -readmethod
parameter in RemoteBlast).  You shouldn't need to look up any more
documentation other than these on the wiki:

http://www.bioperl.org/wiki/HOWTO:SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml

Pay attention to the fact you'll need to install XML::SAX (CPAN) and  
that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
up parsing.
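
As a rough sketch of how the parsing loop could be fleshed out (an
illustration, not something run against your data): the helper below walks
a report and collects one row per HSP, labelling each result with a running
counter rather than $result->query_name. The names hsp_rows and "result_N"
are made up for this example; name and evalue are the usual hit and HSP
accessors. The file to parse is assumed to be passed on the command line.

```perl
use strict;
use warnings;

# Collect one [label, hit name, e-value] row per HSP.
# $report only needs next_result / next_hit / next_hsp, i.e. anything
# that behaves like a Bio::SearchIO report object.
sub hsp_rows {
    my ($report) = @_;
    my @rows;
    my $n = 0;    # running counter, used as a label instead of query_name
    while ( my $result = $report->next_result ) {
        $n++;
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                push @rows, [ "result_$n", $hit->name, $hsp->evalue ];
            }
        }
    }
    return @rows;
}

# Only touch BioPerl when a report file is actually given.
if (@ARGV) {
    require Bio::SearchIO;
    my $report = Bio::SearchIO->new( -file   => $ARGV[0],
                                     -format => 'blastxml' );
    print join( "\t", @$_ ), "\n" for hsp_rows($report);
}
```

Switching -format to 'blast' should be the only change needed to run the
same helper over a text report.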

Chris

> thanks
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 11:10:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:10:51 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149084373.447da2d5c5339@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
	<1149084373.447da2d5c5339@128.91.55.38>
Message-ID: <9206E0B2-15DC-4AB2-B71B-5EA9D1D11AEC@duke.edu>

The bootstrap is stored as the node ID because of a limitation
of the newick format: there isn't a formal way to distinguish
internal IDs from bootstraps.  There are several different ways that
programs encode the internal ID and a bootstrap value in that one
slot - we try to parse it out if the bootstrap is stored in
brackets like INTERNALID[BOOTSTRAP].

Formats like nhx explicitly solve this problem, but most programs
only use the simple newick.  If you know your data, it is a simple
procedure to move the internal ID data into the bootstrap slot.
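
A minimal sketch of that procedure, assuming the internal node labels in
your tree really are bootstrap values (the helper name ids_to_bootstrap and
the numeric check are just for illustration). It copies each numeric
internal ID into the bootstrap slot and leaves the ID itself in place:

```perl
use strict;
use warnings;

# Copy numeric internal-node IDs into the bootstrap slot.
# Returns the number of nodes that were updated.
sub ids_to_bootstrap {
    my ($tree) = @_;
    my $copied = 0;
    for my $node ( $tree->get_nodes ) {
        next if $node->is_Leaf;    # leaf IDs are taxon names, skip them
        my $id = $node->id;
        # only treat the label as a bootstrap if it looks like a number
        next unless defined $id && $id =~ /^\d+(?:\.\d+)?$/;
        $node->bootstrap($id);
        $copied++;
    }
    return $copied;
}

# Only touch BioPerl when a tree file is actually given.
if (@ARGV) {
    require Bio::TreeIO;
    my $in = Bio::TreeIO->new( -file => $ARGV[0], -format => 'newick' );
    while ( my $tree = $in->next_tree ) {
        my $n = ids_to_bootstrap($tree);
        print "copied $n internal labels into the bootstrap slot\n";
    }
}
```

After this, a support cutoff can be tested with $node->bootstrap rather
than going through the ID.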

In terms of ignoreoverwrite, you just need to send in a second
parameter which is true:
$node->add_Descendent($childnode, 1);

-jason


On May 31, 2006, at 10:06 AM, Lucia Peixoto wrote:

> Hi
> Thanks
> a couple more questions
> why is the bootstrap value stored as the node id? Is that right?
>
> also, in the add_descendant method, how do you set the  
> $ignoreoverwrite
> parameter to true?
>
> Lucia
>
> Quoting Jason Stajich <jason.stajich at duke.edu>:
>
>> you need to special case the root - it won't have an ancestor.  just
>> protect the my $parent = $node->ancestor with an if statement as I
>> did below
>>
>> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>>
>>> Hi
>>> OK that was silly, but what I have in my code is what you just wrote
>>> But the problem is that if I write
>>>
>>> $parent->add_Descendent($child)
>>>
>>> it tells me that I am calling the method "add_Descendent" on an
>>> undefined value
>>> (but I did define $parent before??)
>>>
>>> So here it goes the code so far:
>>>
>>> use Bio::TreeIO;
>>>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>>>                           -format => 'newick');
>>>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>>>                            -format => 'newick');
>>>  while( my $tree = $in->next_tree ) {
>>>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>>>     my $bootstrap=$node->_creation_id;
>>>
>>>     if ($bootstrap < 70 ){
>>>>>> if(        my $parent = $node->ancestor ) {
>>>               my @children=$node->get_all_Descendents;
>>>               foreach my $child (@children){
>>>                  $parent->add_Descendent($child);
>>>               }
>>          }
>>>
>>> ........
>>>
>>> eventually I'll add (once I assigned the children to the parent
>>> succesfully):
>>> $tree->remove_Node($node);
>>>
>>>         }
>>>     }
>>>     $out->write_tree($tree);
>>> }
>>>
>>> Quoting aaron.j.mackey at gsk.com:
>>>
>>>>> foreach $child (@children){
>>>>>          $parent=add_Descendent->$child;
>>>>> }
>>>>
>>>> I think what you want is $parent->add_Descendent($child)
>>>>
>>>> -Aaron
>>>>
>>>
>>>
>>> Lucia Peixoto
>>> Department of Biology,SAS
>>> University of Pennsylvania
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sat Jun  3 11:29:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:29:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>

You can get all the hits or HSPs at once with the following methods:
my @hits = $result->hits;
my @hsps = $hit->hsps;


You can also reset the counter, since these implementations are
in-memory and already parsed (and not a stream processor per se).
next_XX just iterates through the list stored in the parent object.

$result->rewind;

   and

$hit->rewind;


For example, the rewind needs to be called if you want to use a  
ResultWriter object and filter some of the values for the final  
writing after first inspecting them.
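
To make the rewind idea concrete, here is a small sketch (the helper name
count_and_list_hits is invented for the example, and the report file is
assumed to come from the command line). It goes over the hits of one result
twice, calling rewind in between so the second pass starts from the first
hit again:

```perl
use strict;
use warnings;

# Walk the hits of a single result twice: once to count, once to list.
# $result needs next_hit and rewind, as Bio::Search::Result objects have.
sub count_and_list_hits {
    my ($result) = @_;

    my $count = 0;
    $count++ while $result->next_hit;    # first pass uses up the iterator

    $result->rewind;                     # reset the internal counter

    my @names;
    while ( my $hit = $result->next_hit ) {
        push @names, $hit->name;
    }

    $result->rewind;                     # leave it ready for the next caller
    return ( $count, @names );
}

# Only touch BioPerl when a report file is actually given.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -file => $ARGV[0], -format => 'blast' );
    while ( my $result = $in->next_result ) {
        my ( $count, @names ) = count_and_list_hits($result);
        print "$count hits: @names\n";
    }
}
```

Note this only rewinds within a result; if two subroutines both need every
result in the report, one option is to push the results into an array once
and hand that array to both.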

-jason


On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:

> Thanks for your comment Sendu, it was very helpful. I think this must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I try
> to print, there is nothing left to print. (and vice versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??
>
> It seems I could get around this by making a copy of the blast report by
> setting it to another new variable...(not the most elegant solution) but
> I'm having trouble with this...
>
> If I do:
>
> 	my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is  
> escaping me.
>
> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??
>
> Sendu and Brian, I didn't post the sort_results subroutine because it is
> sprawling, as is a lot of my code. The code I provided was more like an
> aid for my explanation of the problem; it doesn't actually run. Sorry
> for the confusion, I should have been clearer on that. The important
> thing to know perhaps is that both sort_results and print_blast_results
> contain a foreach loop where I am using the 'next_result' method to
> view blast results. (And to clarify for Torsten, the blastall() is
> working just fine; the analysis/viewing of the results object is where
> I am encountering the problem.)
>
>
> Any other ideas would be greatly appreciated...
>
> Thank you,
> Genevieve
>
>
>
>
> Sendu Bala wrote:
>
>> Genevieve DeClerck wrote:
>>
>>> Hi,
>>
>> [snip]
>>
>>> If I've sorted the results the sorted results will print to screen,
>>> however when I try to print the Hit Table results nothing is returned,
>>> as if the blast results have evaporated... and vice versa, if I
>>> comment out the part where I point my sorting subroutine to the blast
>>> results reference, my hit table results suddenly print to screen.
>>
>> [snip]
>>
>>> Here's an abbreviated version of my code:
>>
>> [snip]
>>
>>> #######
>>> ### the following 2 actions seem to be mutually exclusive.
>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>> # SeqFeature objs stored in arrays. arrays are then printed
>>> # to stdout
>>> &sort_results($blast_report);
>>>
>>> # 2) print blast results
>>> &print_blast_results($blast_report);
>>
>>
>>> sub print_blast_results{
>>>    my $report = shift;
>>>    while(my $result = $report->next_result()){
>>
>> [snip]
>>
>> You didn't give us your sort_results subroutine, but is it as simple as
>> they both use $report->next_result (and/or $result->next_hit), but you
>> don't reset the internal counter back to the start, so the second
>> subroutine tries to get the next_result and finds the first subroutine
>> has already looked at the last result and so next_result returns false?
>>
>> From a quick look it wasn't obvious how to reset the counter. Hopefully
>> this can be done and someone else knows how.
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun  3 15:13:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 3 Jun 2006 14:13:22 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
Message-ID: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>

Nice!  Didn't know I could do that.  Maybe we should add some of this  
to the HOWTO (or is it already in there?).

Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 15:31:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 15:31:59 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
Message-ID: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>

hits() and hsps() were already in the HOWTO; I just added rewind to the
table of methods.
If someone wanted to write a little section in the HOWTO about
resetting the iterator, that would be great.

-jason
On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:

> Nice!  Didn't know I could do that.  Maybe we should add some of this
> to the HOWTO (or is it already in there?).
>
> Chris

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Sat Jun  3 19:54:20 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 04 Jun 2006 09:54:20 +1000
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <4482212C.3000908@infotech.monash.edu.au>

Hubert,

> another question for parsing the xml output... is there an xml parser
> available for blast xml output, or how to start...
> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml, but
> I'm not sure how to start... sorry, I guess I'm too stupid...
> is there maybe another introduction or an example?

I think we already answered this question for you on 20 May 2006:

http://bioperl.org/pipermail/bioperl-l/2006-May/021574.html
http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#How_to_parse_BLAST_XML_output

http://www.bioperl.org/wiki/HOWTO:SearchIO (search for "blastxml")

--Torsten Seemann


From cjfields at uiuc.edu  Sun Jun  4 01:17:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 00:17:46 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
Message-ID: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>

There's an interesting addition to this I found while checking this
out; it looks like if you use:

my @hits = $result->hits;

to get all the hits, you don't need to use '$result->rewind'.  The
rewind method resets the iterator for the hit list back to the
beginning, but using the hits method to grab all the hits doesn't use
the iterator at all.  This works either pre- or post-iteration
through the Hit::BlastHit objects.

Another thing: Genevieve was passing the SearchIO report object (i.e.
the parser object which was returned from StandAloneBlast,
$blast_report) to the methods, not the
Bio::Search::Result::BlastResult object.  It looks like there was some
confusion between the two object types, since she refers to the report
as the result object when it's actually the SearchIO parser object.
So, once the parser was passed into the first method, a result object
was generated, then destroyed.  When entering the second method, the
parser had already read and parsed the report and generated the objects,
so it ended with no output.
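
A rough sketch of that situation, using a made-up ToyParser class over
an in-memory filehandle rather than the real SearchIO parser (the class,
its next_record method, and the two consumer subs are all assumptions
for illustration): the parser can be drained only once, so the records
are saved in a list and both consumers work from that list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a streaming report parser: each call to
# next_record consumes one line from the filehandle. Not BioPerl code.
package ToyParser;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

sub next_record {
    my $self = shift;
    my $line = readline( $self->{fh} );
    return unless defined $line;
    chomp $line;
    return $line;
}

package main;

my $report_text = "query1\nquery2\n";
open my $fh, '<', \$report_text or die "cannot open in-memory report: $!";
my $parser = ToyParser->new($fh);

# Drain the parser once and keep what it produced.
my @records;
while ( defined( my $rec = $parser->next_record ) ) { push @records, $rec; }

# A second pass over the same parser finds nothing left to read.
my $leftover = 0;
while ( defined( my $rec = $parser->next_record ) ) { $leftover++; }

# Both consumers take the saved list, not the parser.
sub sort_records  { my @r = @_; return sort { $b cmp $a } @r; }
sub count_records { my @r = @_; return scalar @r; }

my @sorted = sort_records(@records);
my $count  = count_records(@records);

print "leftover=$leftover sorted=@sorted count=$count\n";
```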

Though passing the BlastResult object is better since one should only  
have to parse the report once and use the objects, for curiosity's  
sake, is there a method to rewind the parser itself (in other words,  
read through the report again)?

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sun Jun  4 10:08:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 4 Jun 2006 10:08:29 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
Message-ID: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>

right - you don't need rewind if you aren't going to use the iterator  
(next_XXX) -- we provide two different ways to get access to the data.
you can do
for my $hit ( $result->hits ) {

}
or
while( my $hit = $result->next_hit ) {
}


If you want to rewind the parser then (assuming you are using a
filestream and not a data stream from the web or zcat or something)
just reset the filehandle
seek($searchio->_fh, 0, 0);

but then you'll have to re-parse everything and pay that cost twice -
it makes more sense to me to just save the results and put them in a
list if you are going to deliberately make two passes over all the
results.  You either pay the cost of memory (keeping all the
objects) or time (reparse the results).
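
As a small illustration of the filehandle reset on its own (plain Perl
on an in-memory filehandle, with made-up report text, not a real
SearchIO object): Perl's seek takes three arguments, FILEHANDLE,
POSITION and WHENCE, and a WHENCE of 0 means "measured from the start
of the file".

```perl
use strict;
use warnings;

# Made-up report text standing in for a file on disk.
my $report_text = "result 1\nresult 2\nresult 3\n";
open my $fh, '<', \$report_text or die "cannot open in-memory report: $!";

# First pass: read to the end of the file.
my @pass_one = <$fh>;

# At end of file nothing more comes back.
my @at_eof = <$fh>;

# Rewind: position 0, whence 0 (from the start of the file).
seek( $fh, 0, 0 ) or die "seek failed: $!";

# Second pass: the same lines again, at the cost of reading them twice.
my @pass_two = <$fh>;

chomp @pass_one;
chomp @pass_two;

print "pass one: @pass_one\n";
print "at eof:   ", scalar(@at_eof), " lines\n";
print "pass two: @pass_two\n";
```

This only works on a seekable handle; a pipe or a network stream
cannot be rewound this way.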


-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From cjfields at uiuc.edu  Sun Jun  4 11:51:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 10:51:53 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
Message-ID: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>


On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:

> right - you don't need rewind if you aren't going to use the  
> iterator (next_XXX) -- we provide two different ways to get access  
> to the data.
> you can do
> for my $hit ( $result->hits ) {
>
> }
> or
> while( my $hit = $result->next_hit ) {
> }
>
>
> If you want to rewind the parser then (assuming you are using a  
> filestream and not a data stream from the web or zcat or something)  
> just reset the filehandle
> seek($searchio->_fh, 0, 0);
>
> but then you'll have to re-parse everything and pay that cost twice  
> - it makes more sense to me to just save the results and put them  
> in list if you are going to deliberately make two passes over all  
> the results.    You either pay the cost of memory (keeping all the  
> objects) or time (reparse the results).

I agree there isn't any really good reason to rewind the parser; I  
was mainly just curious how this was accomplished.  Your point about a  
memory or time hit might be a point we want to make in the HOWTO.  I  
already added some example code about rewinding the iterator and  
hits, so I'll add a bit about this.

I think a good deal of confusion here comes from not knowing how  
SearchIO works (i.e. that parsing a report can return several  
results, which in turn can return hits, which in turn return HSPs).  Of  
course that doesn't include iterations in the case of PSI-BLAST.    
The HOWTO, I think, explains this all well so it may be a matter of  
just RTM (I left the 'F' out to be a bit more polite).
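
The memory-versus-time trade-off can be sketched in plain Perl, without
BioPerl. This is only an illustration under stated assumptions:
make_iterator and collect_all are hypothetical helpers, and the closure
merely stands in for a one-shot iterator such as next_result. Draining
the iterator once into a list pays the memory cost but lets two
subroutines each make a full pass.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a one-shot iterator such as $report->next_result:
# returns the next item on each call, then undef once exhausted.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

# Drain the iterator once and keep everything in memory.
sub collect_all {
    my ($next) = @_;
    my @cached;
    while ( defined( my $item = $next->() ) ) {
        push @cached, $item;
    }
    return @cached;
}

my $next    = make_iterator(qw(result1 result2 result3));
my @results = collect_all($next);

# First pass over the cached list.
my @first_pass = map { "sorted:$_" } @results;

# Second pass: the list is still complete, unlike the exhausted iterator.
my @second_pass = map { "printed:$_" } @results;

print scalar(@first_pass), " and ", scalar(@second_pass), " items\n";
```

The alternative, re-parsing the report, keeps memory flat but pays the
parsing time twice.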

Chris

> -jason
> On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:
>
...


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Mon Jun  5 04:16:59 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 05 Jun 2006 16:16:59 +0800
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
Message-ID: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>


Dear Lincoln and experts

Currently I have a CGI application that does this:

1. read an uploaded file
2. check whether the content of the file is FASTA or not
3. print out the content of the file.


Now the problem I'm facing is in step three: the content read from
the filehandle is altered, namely the very first line does not get
printed.

So for example if "test1.fasta" looks like this:

>Seq0
ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC
 
When it was printed it gives only:

ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC

Why is this happening? 

Below is the complete cgi script that 
does the task  I mentioned earlier.

Did I miss anything in my code?


__BEGIN__
#!/usr/bin/perl -w

use CGI qw/:standard :html3/;
use CGI::Carp qw( fatalsToBrowser );
use Data::Dumper;

BEGIN {
    if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {

        # Blindly untaint.  Taintchecking is to protect
        # from Web data;
        # the environment is under our control.
        eval "use lib '$_';" foreach (
            reverse
            split( /:/, $1 )
        );
    }
}


use Bio::Tools::GuessSeqFormat;

print header,
    start_html('file upload'),
    h1('file upload!');
print_form()    unless param;
print_results() if param;
print end_html;

sub print_form {
    print start_multipart_form(),
       filefield(-name=>'upload',-size=>60),br,
       submit(-label=>'Upload File'),
       end_form;
}

sub print_results {
    my $length;
    my $file = param('upload');
    my $fh_upload = upload('upload');

    my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
    my $format_upload  = $guesser_upload->guess;

    if ( !$file ) {
        print "No file uploaded.";
        return;
    }
    print h2('File name'),      $file;
    print h2('Format'), $format_upload;
    print h2('The content is'),br;

    while (<$fh_upload>) {

     # The very first line of the file does not get printed here.
     # Why?

        print;
        print br;
        $length += length($_);
    }
    print h2('File length'), $length;
}


__END__

Hope to hear from you again.

Regards,
Edward WIJAYA
SINGAPORE


------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 05:02:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:02:48 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <4483F338.7090909@mrc-dunn.cam.ac.uk>

Wijaya Edward wrote:
> Dear Lincoln and experts
> 
> Curently I have a CGI application that does this:
> 
> 1.  read and uploaded file 
> 2. check the content of the file whether fasta or not
> 3. print out the content of the file.
> 
> 
> Now the problem I'm facing is that
> on step three. The content of the file handled is altered
> namely the very first line does not get printed. 

The problem is almost certainly that the guessing is done by reading the 
first line of the filehandle, so that your subsequent while loop on that 
same filehandle starts at the second line.
Just seek the filehandle back to the start before trying to print the 
contents out.

...
my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
my $format_upload  = $guesser_upload->guess;
seek($fh_upload, 0, 0);
...
while (<$fh_upload>) {
     ...
}

An alternative might be to pass GuessSeqFormat the filename in which 
case it would make its own filehandle and close it, leaving your own 
filehandle untouched.


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 05:57:52 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:57:52 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
Message-ID: <44840020.4020604@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
> 
>> If you want to rewind the parser then (assuming you are using a 
>> filestream and not a data stream from the web or zcat or something) 
>> just reset the filehandle
>> seek($searchio->_fh, 0, 0);
>>
>> but then you'll have to re-parse everything and pay that cost twice - 
>> it makes more sense to me to just save the results and put them in 
>> list if you are going to deliberately make two passes over all the 
>> results.    You either pay the cost of memory (keeping all the 
>> objects) or time (reparse the results).
> 
> I agree there isn't any really good reason to rewind the parser; I was 
> mainly just curious how this was accomlished.

Didn't you already explain why seeking a SearchIO wouldn't work? And 
indeed, didn't Genevieve already try to do this after I suggested it and 
  found that it didn't work?

Confused...


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 09:19:12 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 14:19:12 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
Message-ID: <44842F50.7090408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> 
> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
> 
>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>> indeed, didn't Genevieve already try to do this after I suggested it and
>> found that it didn't work?
>>
>> Confused...
>>
> There is an internal _rewind if you are using the next_XX methods that 
> resets the internal iterator (all the data has already been parsed).
> 
> You >>can<< reseek the internal filehandle (accessible by calling 
> $object->_fh ), but you can't call seek on the searchio object itsself.

... poor choice of words on my part. Or maybe I'm not understanding 
you... I already suggested to Genevieve that she try:

# in the following, $blast_report is a SearchIO
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);

She reported that neither way of doing it worked. You seem to be saying 
that at least the second way should have. Is that right?
rewind() would of course be preferable, I just wanted to know if my 
assumption about seek working was correct or not.


From jason at bioperl.org  Mon Jun  5 09:45:40 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:45:40 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44842F50.7090408@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
Message-ID: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>

It depends on how you have run StandAloneBlast -- if the stream you  
are dealing with is not a file, but a datastream as in the STDOUT  
from BLAST, then the seek won't work (as it wouldn't work for a zcat  
on gzipped file).  I think the default StandAloneBlast behavior is to  
operate on a STDOUT stream so seeking won't work no matter what.
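
A minimal sketch of the file-versus-stream distinction, using only core
Perl. The helper is_plain_file is a hypothetical name, and the child
process is simply perl printing one line; nothing here is BioPerl API.
A -f test on the handle is one way to decide, before calling seek,
whether rewinding has any chance of working.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Returns true only for plain files; pipes, sockets and terminals
# generally cannot be rewound with seek().
sub is_plain_file {
    my ($fh) = @_;
    return ( -f $fh ) ? 1 : 0;
}

# A regular temporary file: seeking back to the start works.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line one\nline two\n";
seek( $tmp_fh, 0, 0 ) or die "seek failed: $!";
my $first_line = <$tmp_fh>;

# A pipe from a child process (here: perl itself printing one line).
open( my $pipe_fh, '-|', $^X, '-e', 'print "from pipe\n"' )
    or die "cannot start child: $!";
my $file_ok = is_plain_file($tmp_fh);
my $pipe_ok = is_plain_file($pipe_fh);
close $pipe_fh;

print "temp file rewindable: $file_ok, pipe rewindable: $pipe_ok\n";
```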


On Jun 5, 2006, at 9:19 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
>>
>>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>>> indeed, didn't Genevieve already try to do this after I suggested  
>>> it and
>>> found that it didn't work?
>>>
>>> Confused...
>>>
>> There is an internal _rewind if you are using the next_XX methods  
>> that
>> resets the internal iterator (all the data has already been parsed).
>>
>> You >>can<< reseek the internal filehandle (accessible by calling
>> $object->_fh ), but you can't call seek on the searchio object  
>> itsself.
>
> ... poor choice of words on my part. Or maybe I'm not understanding
> you... I already suggested to Genevieve that she try:
>
> # in the following, $blast_report is a SearchIO
>> my $blast_report = $factory->blastall($ref_seq_objs);
>> my $blast_fh = $blast_report->fh();
>> while (<$blast_fh>) {
>>      # $_ is a ResultI object, use as normal
>> }
>> seek($blast_fh, 0, 0); # this would be great, but does it work?
>> while (<$blast_fh>) {
>>      # go through the results again in your second subroutine
>> }
>>
>> An alternative hacky way of doing it, which may also not work,  
>> would be
>> to go through your $blast_report as normal, but then before going
>> through it a second time, say
>> my $fh = $blast_report->_fh;
>> seek($fh, 0, 0);
>
> She reported that neither way of doing it worked. You seem to be  
> saying
> that at least the second way should have. Is that right?
> rewind() would of course be preferable, I just wanted to know if my
> assumption about seek working was correct or not.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 10:13:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 15:13:03 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
	<E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
Message-ID: <44843BEF.6080609@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> It depends on how you have run StandAloneBlast -- if the stream you are 
> dealing with is not a file, but a datastream as in the STDOUT from 
> BLAST, then the seek won't work (as it wouldn't work for a zcat on 
> gzipped file).  I think the default StandAloneBlast behavior is to 
> operate on a STDOUT stream so seeking won't work no matter what.

As far as I can see, when you say blastall() on a StandAloneBlast, it 
eventually does:

if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
     $blast_obj = Bio::SearchIO->new(-file=>$outfile,
			            -format => 'blast' );
}

So seeking should work? Tools like StandAloneBlast creating temp files 
for their results prior to parsing is actually one of things I don't 
like about the bioperl tool system.


From lstein at cshl.edu  Mon Jun  5 10:51:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 5 Jun 2006 10:51:52 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <200606051051.52648.lstein@cshl.edu>

Hi,

From the Synopsis for GuessSeqFormat:

           # To guess the format from an already open filehandle:
           my $guesser = new Bio::Tools::GuessSeqFormat( -fh => $filehandle );
           my $format  = $guesser->guess;
           # If the filehandle is seekable (STDIN isn't), it will be
           # returned to its original position.

The filehandle returned by CGI.pm is not seekable.
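
When a handle cannot be rewound, one workaround is to read it exactly
once and let every later step work on the in-memory copy. The sketch
below uses only core Perl: an in-memory handle stands in for the
upload, and looks_like_fasta is a hypothetical, deliberately crude
stand-in for a real format guesser. The GuessSeqFormat synopsis also
describes a -text option for guessing from a string; treat that as
something to check against your installed version.

```perl
use strict;
use warnings;

# Hypothetical, deliberately crude stand-in for a real format guesser:
# treats the data as FASTA if the first non-blank line starts with '>'.
sub looks_like_fasta {
    my (@lines) = @_;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        return $line =~ /^>/ ? 1 : 0;
    }
    return 0;
}

# An in-memory handle stands in for the uploaded file.
my $data = ">Seq0\nATCGGGGG\n>Seq1\nGGGGGGG\n";
open( my $fh, '<', \$data ) or die "cannot open in-memory handle: $!";

# Read the handle exactly once; every later step works on the copy.
my @lines = <$fh>;
close $fh;

my $is_fasta = looks_like_fasta(@lines);
my $length   = 0;
for my $line (@lines) {
    print $line;                  # the first line is still present
    $length += length $line;
}
print "fasta: $is_fasta, length: $length\n";
```

This trades memory for simplicity, so it suits small uploads better
than very large ones.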

Lincoln

On Monday 05 June 2006 04:16, Wijaya Edward wrote:
> Dear Lincoln and experts
>
> Curently I have a CGI application that does this:
>
> 1.  read and uploaded file
> 2. check the content of the file whether fasta or not
> 3. print out the content of the file.
>
>
> Now the problem I'm facing is that
> on step three. The content of the file handled is altered
> namely the very first line does not get printed.
>
> So for example if "test1.fasta" looks like this:
> >Seq0
>
> ATCGGGGG
>
> >Seq1
>
> GGGGGGG
>
> >Seq2
>
> ATCCCCCC
>
> When it was printed it gives only:
>
> ATCGGGGG
>
> >Seq1
>
> GGGGGGG
>
> >Seq2
>
> ATCCCCCC
>
> Why is this happening?
>
> Below is the complete cgi script that
> does the task  I mentioned earlier.
>
> Did I missed out anything in my code?
>
>
>
> __BEGIN__
> #!/usr/bin/perl -w
>
> use CGI qw/:standard :html3/;
> use CGI::Carp qw( fatalsToBrowser );
> use Data::Dumper;
>
> BEGIN {
>     if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {
>
>         # Blindly untaint.  Taintchecking is to protect
>         # from Web data;
>         # the environment is under our control.
>         eval "use lib '$_';" foreach (
>             reverse
>             split( /:/, $1 )
>         );
>     }
> }
>
>
> use Bio::Tools::GuessSeqFormat;
>
> print header,
>     start_html('file upload'),
>     h1('file upload!');
> print_form()    unless param;
> print_results() if param;
> print end_html;
>
> sub print_form {
>     print start_multipart_form(),
>        filefield(-name=>'upload',-size=>60),br,
>        submit(-label=>'Upload File'),
>        end_form;
> }
>
> sub print_results {
>     my $length;
>     my $file = param('upload');
>     my $fh_upload = upload('upload');
>
>     my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
>     my $format_upload  = $guesser_upload->guess;
>
>     if ( !$file ) {
>         print "No file uploaded.";
>         return;
>     }
>     print h2('File name'),      $file;
>     print h2('Format'), $format_upload;
>     print h2('The content is'),br;
>
>     while (<$fh_upload>) {
>
>      # The very first line of the file is not get printed here
>      # Why?
>
>         print;
>         print br;
>         $length += length($_);
>     }
>     print h2('File length'), $length;
> }
>
>
> __END__
>
> Hope to hear from you again.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From cjfields at uiuc.edu  Mon Jun  5 12:30:41 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 11:30:41 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44843BEF.6080609@mrc-dunn.cam.ac.uk>
Message-ID: <006001c688bd$62d48850$15327e82@pyrimidine>

If you want flexibility or added functionality then you can always
contribute a patch, such as adding an option for filehandles, IO::String,
pipes/forks, or whatever you wish.  Or you could suggest such to the module
maintainer, Torsten, and then it's his choice whether he wants to make it a
priority to implement it.  Simply stating this is 'one of things I don't
like about the bioperl tool system' isn't productive here.   It hasn't been
a top priority to implement something along those lines since the module
works for them as is, so if you want these options you'll have to add them,
and add the appropriate tests.

As for the seek issue, the file handle you get by using '$blast_report->fh()'
isn't the raw input file stream but is a tied filehandle of a stream of
ResultI objects:
==================================
Jason's version:
# seek called on the >>internal<< filehandle (from Bio::Root::IO)
# this is the raw data input stream from a file, so should work
seek($searchio->_fh, 0, 0);
==================================
Your version:
# seek called on SearchIO object filehandle
my $blast_report = $factory->blastall($ref_seq_objs);
# this is a tied filehandle for an output stream of objects from SearchIO,
# NOT the raw input stream
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
	# a stream of Bio::Search::Result::BlastResult objects 
} 
# can't use seek on a tied filehandle, won't work unless 
# SEEK class method is implemented (and it's not)
seek($blast_fh, 0, 0); 
==================================

There's a good deal in Programming Perl about tied filehandles.  You'll
notice that Bio::SearchIO implements TIEHANDLE, READLINE, DESTROY, and PRINT
methods, but not SEEK since we've never needed it.  You can always add one
if you want but I really don't see the point based on reasons Jason and I
outlined before.
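
For the curious, here is a minimal sketch of what a SEEK method on a
tied handle looks like, in core Perl only. ListHandle is a hypothetical
class serving items from an in-memory list; it is not how Bio::SearchIO
is implemented, and it handles only the "rewind to start" case. Without
the SEEK method, the seek call below would die because Perl cannot
locate the method.

```perl
use strict;
use warnings;

# Hypothetical tied-handle class that serves items from an in-memory list.
package ListHandle;

sub TIEHANDLE {
    my ( $class, @items ) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# Called for <$fh> in scalar context: return the next item or undef.
sub READLINE {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

# Called for seek($fh, $offset, $whence); only "rewind to start" is handled.
sub SEEK {
    my ( $self, $offset, $whence ) = @_;
    return 0 unless $offset == 0 && $whence == 0;
    $self->{pos} = 0;
    return 1;
}

package main;
use Symbol qw(gensym);

my $fh = gensym();
tie *$fh, 'ListHandle', qw(result1 result2);

my @first;
while ( defined( my $item = <$fh> ) ) { push @first, $item }

# Dispatches to ListHandle::SEEK because the handle is tied.
seek( $fh, 0, 0 ) or die "rewind not supported";

my @second;
while ( defined( my $item = <$fh> ) ) { push @second, $item }

print "first pass: @first\nsecond pass: @second\n";
```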

Seems there is not much overall documentation on newFh or $blast_report->fh,
but I believe it's analogous to the SeqIO version which is covered a bit in
the bptutorial file, now on the wiki:

http://www.bioperl.org/wiki/Bptutorial.pl#III.2.1_Transforming_sequence_files_.28SeqIO.29

$in  = Bio::SeqIO->newFh(-file => "inputfilename" ,
                          -format => 'fasta');
$out = Bio::SeqIO->newFh(-format => 'embl');
print $out $_ while <$in>;

Wouldn't hurt if someone wants to add a bit more about these to the SearchIO
HOWTO.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, June 05, 2006 9:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Jason Stajich wrote:
> > It depends on how you have run StandAloneBlast -- if the stream you are
> > dealing with is not a file, but a datastream as in the STDOUT from
> > BLAST, then the seek won't work (as it wouldn't work for a zcat on
> > gzipped file).  I think the default StandAloneBlast behavior is to
> > operate on a STDOUT stream so seeking won't work no matter what.
> 
> As far as I can see, when you say blastall() on a StandAloneBlast, it
> eventually does:
> 
> if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
>      $blast_obj = Bio::SearchIO->new(-file=>$outfile,
> 			            -format => 'blast' );
> }
> 
> So seeking should work? Tools like StandAloneBlast creating temp files
> for their results prior to parsing is actually one of things I don't
> like about the bioperl tool system.


From jason at bioperl.org  Mon Jun  5 09:02:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:02:02 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44840020.4020604@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
Message-ID: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>


On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
>>
>>> If you want to rewind the parser then (assuming you are using a
>>> filestream and not a data stream from the web or zcat or something)
>>> just reset the filehandle
>>> seek($searchio->_fh, 0, 0);
>>>
>>> but then you'll have to re-parse everything and pay that cost  
>>> twice -
>>> it makes more sense to me to just save the results and put them in
>>> list if you are going to deliberately make two passes over all the
>>> results.    You either pay the cost of memory (keeping all the
>>> objects) or time (reparse the results).
>>
>> I agree there isn't any really good reason to rewind the parser; I  
>> was
>> mainly just curious how this was accomlished.
>
> Didn't you already explain why seeking a SearchIO wouldn't work? And
> indeed, didn't Genevieve already try to do this after I suggested  
> it and
>   found that it didn't work?
>
> Confused...
>
There is an internal _rewind if you are using the next_XX methods  
that resets the internal iterator (all the data has already been  
parsed).

You >>can<< reseek the internal filehandle (accessible by calling  
$object->_fh ), but you can't call seek on the searchio object itself.


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 13:23:36 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 18:23:36 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006001c688bd$62d48850$15327e82@pyrimidine>
References: <006001c688bd$62d48850$15327e82@pyrimidine>
Message-ID: <44846898.8020001@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> If you want flexibility or added functionality then you can always
> contribute a patch, such as adding an option for filehandles, IO::String,
> pipes/forks, or whatever you wish.

Well, it wouldn't be a new feature per se, but just changing the way the 
modules work under the hood.


> Or you could suggest such to the module
> maintainer, Torsten, and then it's his choice whether he wants to make it a
> priority to implement it.  Simply stating this is 'one of things I don't
> like about the bioperl tool system' isn't productive here.

Yes, I apologise for that. I had thought too much would need to be 
changed and backward compatibility wouldn't be possible, but just 
changing StandAloneBlast should be possible.

I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... is there any point in working 
up a patch (would something of questionable reliability ever be 
committed into bioperl)?


> As for the seek issue, the file handle you get by using '$blast_report->fh()'
> isn't the raw input file stream but is a tied filehandle of a stream of
> ResultI objects:
> ==================================
> Jason's version:
> # seek called on the >>internal<< filehandle (from Bio::Root::IO)
> # this is the raw data input stream from a file, so should work
> seek($searchio->_fh, 0, 0);
> ==================================
> Your version:
> # seek called on SearchIO object filehandle
> my $blast_report = $factory->blastall($ref_seq_objs);
> # this is a tied filehandle for an output stream of objects from SearchIO,
> # NOT the raw input stream
> my $blast_fh = $blast_report->fh();

For academic interest, how do I get the 'raw input stream'? Wasn't that 
what my second version did?

 > my $fh = $blast_report->_fh;
 > seek($fh, 0, 0);


From hubert.prielinger at gmx.at  Mon Jun  5 14:17:53 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 12:17:53 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>	<4480E7AA.3020603@gmx.at>
	<720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
Message-ID: <44847551.7040705@gmx.at>

hi,
you were right, removing the composition-based statistics solved the 
problem. Now I get the result printed to STDOUT, but it doesn't save the 
output in the file.
I have tried reopening the file and writing it to another file, 
but it doesn't work.....
The strange thing is that if I retrieve text instead of xml output it 
works without any problem. Don't know why

Hubert


Chris Fields wrote:
> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>
>   
>> hi chris,
>> thanks but I never intended to run the remoteblast with so much,  
>> only a few of them, acutally I goal is to run the phiblast with  
>> regular expression, so that i just don't need that
>> file anymore
>>     
>
> Not a problem.  Just to let you know, I did manage to get the script  
> working, so I'm marking the bug INVALID.  I think the problem isn't  
> that there is an infinite loop so much as setting composition-based  
> statistics causes the search to take much much longer; try removing  
> that line to see what I mean.
>
> Just so you know, using $result->query_name doesn't get you what you  
> would expect (it gives you a part of the RID, which you don't want;  
> this is something in the XML output that is beyond our control).  You  
> might want to change it to something else or you'll get filenames  
> with numerical names.
>
>   
>> another question for parsing the xml output....is there a xml  
>> parser available for blast xml output or how to start.....
>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,  
>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>> is their maybe another introduction or an example.
>>     
>
> Bio::SearchIO objects are used to parse BLAST XML output if you have  
> it saved to a file.  For instance:
>
> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>
> while (my $result = $factory->next_result) {
>    while (my $hit = $result->next_hit) {
>       while (my $hsp = $hit->next_hsp) {
>          #do stuff here
>        }
>     }
> }
>
> The only thing that changes in parsing a text BLAST report from an  
> XML BLAST report is the -format line (similar to the -readmethod  
> parameter in RemoteBlast).  You shouldn't need to look up any more  
> documentation other than these on the wiki:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>
> Pay attention to the fact you'll need to install XML::SAX (CPAN) and  
> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
> up parsing.
>
> Chris
>
>   
>> thanks
>> Hubert
>>
>>
>> Chris Fields wrote:
>>     
>>> Yes, I see the same error you do.  But I have a similar script
>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>> that works fine.  I'm trying to dissect the problem but I think
>>> it may be something logically wrong here (something not so
>>> obvious) and not a bug...
>>>
>>> What I'm trying to say is, when you send sequences using
>>> remoteblast like this, you are essentially spamming the NCBI
>>> BLAST server with ~1600 requests.  This script wasn't set up with
>>> that intent in mind; you should really try to set up your own
>>> local blast database if possible.  If you can't, try running this
>>> script in off-hours (10pm-6am EST or something like that).
>>>
>>>
>>> Chris
>>>
>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi,
>>>> input database: swissprot
>>>>         matrix: pam30
>>>>         count: 1
>>>>         gapcosts: 9 1
>>>>
>>>> I know that there are a lot of sequences, but that doesn't
>>>> matter, you can delete all of them except one. The number of
>>>> sequences is not the problem: the script reads one line and
>>>> submits it.....then the second line and so on.....I have tried
>>>> it with only one sequence as well and I got the same result....
>>>> the script ran that time for more than 20
>>>> minutes!!!!!! .....and that should be enough time to retrieve
>>>> the results for ONE sequence, I guess
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> You need to add the input conditions as well (you have several
>>>>> <STDIN> lines which may play a role; I would like to know what
>>>>> you normally enter for those).
>>>>>
>>>>> How long did you let the script run?  I ran a quick check on
>>>>> your sequences; you have almost 1600, so you have to expect
>>>>> that you'll run into some problems here!  Most here (including
>>>>> me) would suggest you try installing a local blast setup for
>>>>> something like this.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> I have submitted the bug -> Bug 2017
>>>>>> with the script and input file, just start it from command line
>>>>>>
>>>>>> thank you very much
>>>>>> greetings
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>             
>>>>>>> Hubert,
>>>>>>>
>>>>>>> I have a script that's using blastxml and XML output which  
>>>>>>> seems  to work.
>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;  
>>>>>>>> 'Sendu  Bala'
>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> sorry, but I have updated the remoteblast module and I have run
>>>>>>>> several attempts with the same results as before. It didn't work.
>>>>>>>> I didn't get any results.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Sendu, Hubert,
>>>>>>>>>
>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the
>>>>>>>>> problem (break out of that infinite loop).  I applied Sendu's
>>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
>>>>>>>>> RemoteBlast.t.  Try updating from CVS to see if it works.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> hi,
>>>>>>>>>>> I have the following program and it worked quite well for
>>>>>>>>>>> retrieving remoteblast results in a textfile,
>>>>>>>>>>> now I have altered it to xml, and it didn't work anymore.....
>>>>>>>>>>> it takes all the parameters at the commandline, submits the
>>>>>>>>>>> query, but I don't retrieve any results file anymore.....
>>>>>>>>>>>
>>>>>>>>>>> it seems that it hangs in an endless loop......
>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and
>>>>>>>>>>> over..... it doesn't enter the else term anymore....
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> There is no problem with your code. The problem is with the
>>>>>>>>>> NCBI server and should be reported to them. You can visit the
>>>>>>>>>> site and do a blast, requesting xml format, and you will
>>>>>>>>>> typically get one normal 'waiting' message and the promise
>>>>>>>>>> that it will be updated in x seconds, but subsequent attempts
>>>>>>>>>> to get progress information result in an xml error page
>>>>>>>>>> because the NCBI server doesn't actually send any data.
>>>>>>>>>>
>>>>>>>>>> Unfortunately the way that the bioperl code is written, it
>>>>>>>>>> treats no data as 'waiting' instead of an error. I've offered
>>>>>>>>>> a patch to fix this at this bug page:
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Jun  5 14:32:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:32:47 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44847551.7040705@gmx.at>
Message-ID: <006101c688ce$7185c330$15327e82@pyrimidine>

Hubert, 

Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
option to save XML was committed relatively recently (last month or so).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Monday, June 05, 2006 1:18 PM
> To: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> you were right, removing the composition-based statistics solved the
> problem. Now I get the result viewed on STDIN, but it doesn't save the
> output in the file.
> I have tried it by reopening the file and writing it to another file
> again, but it doesn't work.....
> The strange thing is that if I retrieve text instead of xml output it
> works without any problem. Don't know why
> 
> Hubert
> 
> 
> 
> Chris Fields wrote:
> > On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
> >
> >
> >> hi chris,
> >> thanks, but I never intended to run remoteblast with so many
> >> sequences, only a few of them. Actually my goal is to run phiblast
> >> with a regular expression, so I won't need that file anymore
> >>
> >
> > Not a problem.  Just to let you know, I did manage to get the script
> > working, so I'm marking the bug INVALID.  I think the problem isn't
> > that there is an infinite loop so much as setting composition-based
> > statistics causes the search to take much much longer; try removing
> > that line to see what I mean.
> >
> > Just so you know, using $result->query_name doesn't get you what you
> > would expect (it gives you a part of the RID, which you don't want;
> > this is something in the XML output that is beyond our control).  You
> > might want to change it to something else or you'll get filenames
> > with numerical names.


From cjfields at uiuc.edu  Mon Jun  5 14:56:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:56:18 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44846898.8020001@mrc-dunn.cam.ac.uk>
Message-ID: <006201c688d1$bad2aff0$15327e82@pyrimidine>


> Chris Fields wrote:
> > If you want flexibility or added functionality then you can always
> > contribute a patch, such as adding an option for filehandles,
> > IO::String, pipes/forks, or whatever you wish.
> 
> Well, it wouldn't be a new feature per se, but just changing the way the
> modules work under the hood.

...

> I use IPC::Open3 for blasts and have never run into problems, but it
> pretty much falls into the 'apt to cause deadlock' camp. It may pass
> tests on one machine but fail on others... is there any point in working
> up a patch (would something of questionable reliability ever be
> committed into bioperl)?

The main thing you should avoid is major API changes or issues which break
this module on other OS's.  I'm not sure that StandAloneBlast is 'broken' by
using a tempfile as the location of the BLAST report.  

Any way you go about it, you'll have to capture the BLAST output as a stream
and get it to persist in a SearchIO object somehow.  It can be a pretty
decent memory hit to keep that report hanging around, esp. if it is large.
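
One way to capture a program's output as a stream without a tempfile, and
without juggling the three handles that make IPC::Open3 deadlock-prone, is a
read-only piped open.  What follows is only a minimal sketch in core Perl,
assuming that nothing but the child's STDOUT is needed.  The command is a
stand-in (the perl interpreter itself), not a real BLAST invocation.  As far
as I know Bio::SearchIO accepts an open handle through its -fh argument, so a
handle like $out could presumably be passed there, but that part is untested
here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST command line: the current perl
# interpreter printing two lines of "report" text.
my @cmd = ($^X, '-e', 'print "hit_A\nhit_B\n"');

# List-form piped open: no shell is involved, and only the child's STDOUT
# is attached, so there is no second pipe that can fill up and block.
open(my $out, '-|', @cmd) or die "Cannot start child: $!";

my @lines;
while (my $line = <$out>) {
    chomp $line;
    push @lines, $line;    # consume output as it arrives
}

# close() on a piped handle waits for the child and sets $?.
close($out) or die "Child failed, status $?";

print scalar(@lines), " lines read\n";
```

This only sidesteps the deadlock because STDIN and STDERR of the child are
left alone; a tool that needs input fed to it while it writes output would
still need something like select() or a tempfile.  The list form of piped
open on Windows needs perl 5.22 or later, I believe.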

...

> For academic interest, how do I get the 'raw input stream'? Wasn't that
> what my second version did?
> 
>  > my $fh = $blast_report->_fh;
>  > seek($fh, 0, 0);

That should work, yes.  I didn't see that one in your previous response.  I
can get it to work w/o problems with SearchIO directly but I haven't tried it
with StandAloneBlast.  Below is my script.  If you comment out the seek line,
the file pointer stays at the end of the file, so the second round of parsing
won't happen.

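use strict;
use warnings;
use Bio::SearchIO;

my $parser = Bio::SearchIO->new(  -file => shift,
                                  -format => 'blast');

# _fh is the raw underlying filehandle; reading it yields report lines
my $fh = $parser->_fh;

while (<$fh>) {
     print;
}

# rewind so the parser can read the report again from the top
seek($fh, 0,0);

# fh is the tied handle; each read returns the next result object
$fh = $parser->fh;

print "Second round:\n";
while (<$fh>) {
    while (my $hit = $_->next_hit) {
        print $hit->accession,"\n";
    }
}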
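
The rewind step can also be tried without BioPerl at all.  A minimal sketch in
core Perl, assuming an in-memory filehandle stands in for the report file (the
two-line "report" is made up for illustration):

```perl
use strict;
use warnings;

# In-memory stand-in for a report file (hypothetical contents).
my $report = "line one\nline two\n";
open(my $fh, '<', \$report) or die "Cannot open in-memory handle: $!";

# The first pass consumes the handle up to end-of-file.
my @first = <$fh>;

# Without a rewind, a second read finds nothing left.
my @none = <$fh>;

# seek() back to byte 0 so the same handle can be read again.
seek($fh, 0, 0) or die "Cannot seek: $!";
my @second = <$fh>;

printf "first=%d none=%d second=%d\n",
    scalar @first, scalar @none, scalar @second;
```

The same reasoning applies to the parser: once the raw handle has been read to
the end, anything layered on top of it sees an empty stream until the handle
is moved back to the start.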


Chris


From hubert.prielinger at gmx.at  Mon Jun  5 15:12:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 13:12:37 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <006101c688ce$7185c330$15327e82@pyrimidine>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
Message-ID: <44848225.8080003@gmx.at>

hi chris,
sorry, I have tried it with the latest CVS version:

# $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $

but it still doesn't work.

Hubert

Chris Fields wrote:
> Hubert, 
>
> Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
> option to save XML was committed relatively recently (last month or so).
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Monday, June 05, 2006 1:18 PM
>> To: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>
>> hi,
>> you were right, removing the composition-based statistics solved the
>> problem. Now I get the result viewed on STDIN, but it doesn't save the
>> output in the file.
>> I haved tried it by reopening the file and writing it to an other file
>> again, but it doesn't work.....
>> The strange thing is that if I retrieve text instead of xml output it
>> works without any problem. Don't know why
>>
>> Hubert
>>
>>
>>
>> Chris Fields wrote:
>>     
>>> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi chris,
>>>> thanks but I never intended to run the remoteblast with so much,
>>>> only a few of them, acutally I goal is to run the phiblast with
>>>> regular expression, so that i just don't need that
>>>> file anymore
>>>>
>>>>         
>>> Not a problem.  Just to let you know, I did manage to get the script
>>> working, so I'm marking the bug INVALID.  I think the problem isn't
>>> that there is an infinite loop so much as setting composition-based
>>> statistics causes the search to take much much longer; try removing
>>> that line to see what I mean.
>>>
>>> Just so you know, using $result->query_name doesn't get you what you
>>> would expect (it gives you a part of the RID, which you don't want;
>>> this is something in the XML output that is beyond our control).  You
>>> might want to change it to something else or you'll get filenames
>>> with numerical names.
>>>
>>>
>>>       
>>>> another question for parsing the xml output....is there a xml
>>>> parser available for blast xml output or how to start.....
>>>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,
>>>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>>>> is their maybe another introduction or an example.
>>>>
>>>>         
>>> Bio::SearchIO objects are used to parse BLAST XML output if you have
>>> it saved to a file.  For instance:
>>>
>>> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>>>
>>> while (my $result = $factory->next_result) {
>>>    while (my $hit = $result->next_hit) {
>>>       while (my $hsp = $hit->next_hsp {
>>>          #do stuff here
>>>        }
>>>     }
>>> }
>>>
>>> The only thing that changes in parsing a text BLAST report from an
>>> XML BLAST report is the -format line (similar to the -readmethod
>>> parameter in RemoteBlast).  You shouldn't need to look up any more
>>> documentation other than these on the wiki:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>>>
>>> Pay attention to the fact you'll need to install XML::SAX (CPAN) and
>>> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
>>> up parsing.
>>>
>>> Chris
>>>
>>>
>>>       
>>>> thanks
>>>> Hubert
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> Yes, I see the same error you do.  But I have a similar script
>>>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>>>> that  works fine.  I'm trying to dissect the problem but I think
>>>>> it may be  something logically wrong here (something not so
>>>>> obvious) and not a  bug...
>>>>>
>>>>> What I'm trying to say is, when you send sequences using
>>>>> remoteblast  like, this you are essentially spamming the NCBI
>>>>> BLAST server with  ~1600 requests.  This script wasn't set up with
>>>>> that intent in mind;  you should really try to set up your own
>>>>> local blast database if  possible.  If you can't, try running this
>>>>> script in off-hours  (10pm-6am EST or something like that).
>>>>>
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> input database: swissprot
>>>>>>         matrix: pam30
>>>>>>         count: 1
>>>>>>         gapcosts: 9 1
>>>>>>
>>>>>> I know that there are  a lot of sequences, but that doesn't
>>>>>> matter,  you can delete all of them except one, the amount of the
>>>>>> sequences  is not the problem, the script reads one line and
>>>>>> submits  it.....then the second line and so on.....I have tried
>>>>>> it with only  one sequence either and I got the same result....
>>>>>> the script run at  that time for more than 20
>>>>>> minutes!!!!!! .....and that should be  enough time to retrieve
>>>>>> the results for ONE sequence, I guess
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> You need to add the input conditions as well (you have several
>>>>>>> <STDIN> lines which may play a role; I would like to know what
>>>>>>> you  normally enter for those).
>>>>>>>
>>>>>>> How long did you let the script run?  I ran a quick check on
>>>>>>> your  sequences; you have almost 1600, so you have to expect
>>>>>>> that you'll  run into some problems here!  Most here (including
>>>>>>> me) would  suggest you try installing a local blast setup for
>>>>>>> something like  this.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> hi,
>>>>>>>> I have submitted the bug -> Bug 2017
>>>>>>>> with the script and input file, just start it from command line
>>>>>>>>
>>>>>>>> thank you very much
>>>>>>>> greetings
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Hubert,
>>>>>>>>>
>>>>>>>>> I have a script that's using blastxml and XML output which
>>>>>>>>> seems  to work.
>>>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
>>>>>>>>>> 'Sendu  Bala'
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> sorry, but I have updated the remoteblast module and I have
>>>>>>>>>> run  several
>>>>>>>>>> attempts with the same results as before. It didn't work.
>>>>>>>>>> I didn't get any results.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Hubert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Chris Fields wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Sendu, Hubert,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix
>>>>>>>>>>> the  problem
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> (break
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> out of that infinite loop).  I applied Sendu's patch to
>>>>>>>>>>> RemoteBlast in
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> CVS;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> it passed all tests in RemoteBlast.t.  Try updating from
>>>>>>>>>>> CVS  to see if
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> works.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>>>
>>>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>>>>>>>> hi,
>>>>>>>>>>>>> I have the following program and it worked quite well,
>>>>>>>>>>>>> for  retrieving
>>>>>>>>>>>>> remoteblast results in a textfile,
>>>>>>>>>>>>> now I have altered it to to xml, and it didn't work
>>>>>>>>>>>>> anymore.....
>>>>>>>>>>>>> it takes all the parameter at the commandline, submits
>>>>>>>>>>>>> the  query, but
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>> I
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>>>> don't retrieve any results file anymore.....
>>>>>>>>>>>>>
>>>>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and
>>>>>>>>>>>>> over..... it
>>>>>>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>> There is no problem with your code. The problem is with the NCBI
>>>>>>>>>>>> server and should be reported to them. You can visit the site and do
>>>>>>>>>>>> a blast, requesting xml format, and you will typically get one normal
>>>>>>>>>>>> 'waiting' message and the promise that it will be updated in x
>>>>>>>>>>>> seconds, but subsequent attempts to get progress information result
>>>>>>>>>>>> in an xml error page because the NCBI server doesn't actually send
>>>>>>>>>>>> any data.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the way that the bioperl code is written, it treats no
>>>>>>>>>>>> data as 'waiting' instead of an error. I've offered a patch to fix
>>>>>>>>>>>> this at this bug page:
>>>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher
>>>>>>> Lab of Dr. Robert Switzer
>>>>>>> Dept of Biochemistry
>>>>>>> University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 15:14:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 20:14:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006201c688d1$bad2aff0$15327e82@pyrimidine>
References: <006201c688d1$bad2aff0$15327e82@pyrimidine>
Message-ID: <44848280.1080703@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>>> If you want flexibility or added functionality then you can 
>>> always contribute a patch, such as adding an option for 
>>> filehandles, IO::String, pipes/forks, or whatever you wish.
>> 
>> Well, it wouldn't be a new feature per se, but just changing the 
>> way the modules work under the hood.
> 
> ...
> 
>> I use IPC::Open3 for blasts and have never run into problems, but 
>> it pretty much falls into the 'apt to cause deadlock' camp. It may
>> pass tests on one machine but fail on others... is there any point
>> in working up a patch (would something of questionable reliability
>> ever be committed into bioperl)?
> 
> The main thing you should avoid is major API changes or issues which
> break this module on other OS's.  I'm not sure that StandAloneBlast
> is 'broken' by using a tempfile as the location of the BLAST report.
> 
> 
> 
> Any way you go about it, you'll have to capture the BLAST output as a
> stream and get it to persist in a SearchIO object somehow.  It can
> be a pretty decent memory hit to keep that report hanging around, 
> esp. if it is large.

Well at the moment StandAloneBlast runs the blast program and stores its
output to a temp file, then gives the temp file name as an arg to
SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
of the blast program directly to my parser. The question is, why wasn't
this done in StandAloneBlast? I would get the blast program output
handle and pass it directly to SearchIO with the -fh option of new().
The only difference here is it's faster and more efficient with the
direct pipe, but you can't subsequently seek the SearchIO's internal
filehandle (as we were discussing in this thread). There are no
(additional) issues with memory.

If it isn't done using IPC::Open3 (or similar) because the original
author already knew it wouldn't be reliable enough, or for some other
reason(s), fine. Does anyone know the reasons?
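
For readers following the thread, the direct-pipe idea can be sketched roughly as below. This is only an illustration under stated assumptions, not StandAloneBlast code: open_report_stream() is a hypothetical helper, and a small perl one-liner stands in for the blast program so the sketch runs without BLAST or BioPerl installed. The child's stderr is sent straight to the parent's STDERR so that only one pipe has to be drained, which sidesteps the most common open3 deadlock.

```perl
use strict;
use warnings;
use IPC::Open3;

# Hypothetical helper: start a command and return its pid plus a read
# handle on its standard output.
sub open_report_stream {
    my @cmd = @_;
    # Child stderr goes directly to our STDERR, so only stdout is piped.
    my $pid = open3(my $to_child, my $from_child, '>&STDERR', @cmd);
    close $to_child;    # nothing to send on the child's stdin
    return ($pid, $from_child);
}

# Stand-in for a BLAST run: any command that writes a report to stdout.
my ($pid, $fh) = open_report_stream($^X, '-e', 'print "Query= seq$_\n" for 1..3');

# With BioPerl installed, the handle could be given to the parser instead:
#   my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
# Here a trivial line counter plays the role of the parser.
my $queries = 0;
while (my $line = <$fh>) {
    $queries++ if $line =~ /^Query=/;
}
close $fh;
waitpid($pid, 0);    # reap the child so its exit status lands in $?
print "parsed $queries queries, exit status ", $? >> 8, "\n";
```

As noted above, a handle obtained this way is a pipe, so it cannot be rewound with seek(); a temp file does not have that limitation.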


From cjfields at uiuc.edu  Mon Jun  5 15:43:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 14:43:50 -0500
Subject: [Bioperl-l] StandAloneBlast
In-Reply-To: <44848280.1080703@mrc-dunn.cam.ac.uk>
Message-ID: <006301c688d8$5e4ce910$15327e82@pyrimidine>

> Well at the moment StandAloneBlast runs the blast program and stores its
> output to a temp file, then gives the temp file name as an arg to
> SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
> of the blast program directly to my parser. The question is, why wasn't
> this done in StandAloneBlast? 

Probably for the reasons you outlined before:

'I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... '

Why would we take a chance on using something that works on one OS/machine
and fails to work on another?  

> I would get the blast program output handle and pass it directly to 
> SearchIO with the -fh option of new().
> The only difference here is it's faster and more efficient with the
> direct pipe, but you can't subsequently seek the SearchIO's internal
> filehandle (as we were discussing in this thread). There are no
> (additional) issues with memory.

Like I said before, you can make changes and submit a patch.  The code here
is over five years old, and many many things have changed since then, so you
might find something works now which wasn't available or didn't work then.
It hasn't really been a priority (it certainly hasn't been mine).  Most
people don't care b/c it just works and a vast majority don't worry/care
about the internals.  

The issue at hand is whether any code changes will work on all OS's, not
just yours.  BioPerl is used the world over on just about every OS, so ANY
code changes need to take that into consideration.  I can guarantee that if
you made changes that break or reduce performance on 50% of the OS's, it'll
likely get rolled back.  You need the best cross-platform compatibility
possible.

We've now veered WAY off topic here.  If we intend on continuing this, we
need to switch the thread topic.

Chris

> If it isn't done using IPC::Open3 (or similar) because the original
> author already knew it wouldn't be reliable enough, or for some other
> reason(s), fine. Does anyone know the reasons?


From cjfields at uiuc.edu  Mon Jun  5 16:30:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 15:30:01 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
Message-ID: <006401c688de$d38035b0$15327e82@pyrimidine>

I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
finished yet (BioSQL and Bioperl-guts isn't done yet) and there are probably
some mangld worsd in there so have mercy on me!  It's been a busy month.

http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006

Fling your mud and abuses by responding to this thread per usual

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Jun  5 23:42:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 22:42:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44848225.8080003@gmx.at>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
	<44848225.8080003@gmx.at>
Message-ID: <D7A85F26-1ADD-446E-A5F3-8C3420746364@uiuc.edu>

Hubert,

I had no trouble getting this to work; the script scans through each  
sequence and saves the XML output to a file on both Windows and Mac OS  
X, both using bioperl-live.  The older RemoteBlast would only save  
text; otherwise it saved an empty file.  Using your script I get  
several XML BLAST output files (1.xml, 2.xml, etc) based on a  
counter, each about 1 MB.  All were parseable by SearchIO.

I did notice that if certain parameters weren't entered in correctly  
then you will get no data (such as setting the database to 'swiss'  
instead of 'swissprot').  A warning pops up stating that no data was  
returned when this occurs (it doesn't tell you what was wrong, just  
that no data came back from NCBI).  If you see this then that is  
likely the problem.  Besides that, I don't know what else it can be.

Chris

On Jun 5, 2006, at 2:12 PM, Hubert Prielinger wrote:

> hi chris,
> sorry, I have tried it with the latest CVS version:
>
> # $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $
>
> but it still doesn't work.
>
> Hubert
>
> Chris Fields wrote:
>> Hubert,
>>
>> Make sure you have the latest Bio::Tools::Run::RemoteBlast from  
>> CVS.  The
>> option to save XML was committed relatively recently (last month  
>> or so).
>>
>> Chris
>>

...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From heikki at sanbi.ac.za  Tue Jun  6 03:40:06 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 09:40:06 +0200
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <200606060940.07285.heikki@sanbi.ac.za>

Chris,

I am mystified. I'll try to get the massive 'return undef' change done first 
and then have another look.

	-Heikki

On Friday 02 June 2006 21:13, Chris Fields wrote:
> Heikki,
>
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
>
> Not sure what's going on there but using read_aln and write_aln seem to
> work normally.  It may have something to do with Bio::SimpleAlign but I'm
> not absolutely sure.
>
> Any ideas what may be going on here?
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Tue Jun  6 04:04:00 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 10:04:00 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606020952.08034.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
Message-ID: <200606061004.01193.heikki@sanbi.ac.za>


OK. I've gone through all cases where return and undef are on the same lines.
I've done changes in 185 files.

My aims have been the following:

1. Remove undef from return undef when not necessary.
	This will make it easier to spot cases where undef matters in the future
	Most of the changes fall into this category. The context is clearly scalar.

2. Returning undef when the user expects an empty list is bad

./Bio/Tools/Est2Genome.pm fixed
./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
                               not fixed
./Bio/Matrix/PSM/SiteMatrix.pm  fixed
./Bio/Matrix/PSM/Psm  fixed
./Bio/DB/Taxonomy::entrez.pm fixed

3. If docs say method returns nothing, explicit undef is not the right thing 
to return

4. do not return an explicit undef if the method is supposed to return false 
on failure
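
The list-context pitfall behind point 2 can be sketched as follows. The two subs are hypothetical and are not taken from any of the files listed above; they only contrast an explicit 'return undef' with a bare 'return'.

```perl
use strict;
use warnings;

# Hypothetical lookups that fail: one returns an explicit undef,
# the other uses a bare return.
sub find_explicit { return undef }
sub find_bare     { return }

# In list context 'return undef' yields a one-element list (undef),
# so the array is non-empty and tests as true.
my @explicit = find_explicit();
my @bare     = find_bare();
print "explicit: ", scalar(@explicit), " element(s)\n";
print "bare:     ", scalar(@bare), " element(s)\n";

# In scalar context both give undef, which is why most of the changes
# are harmless.
my $s1 = find_explicit();
my $s2 = find_bare();
print "scalar results defined? ",
    (defined $s1 || defined $s2 ? "yes" : "no"), "\n";
```

A caller that writes 'if (my @hits = find_explicit()) { ... }' would therefore enter the block even though the lookup failed.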


Before I do the commit, I'd like a number of people to run 'make test' on 
bioperl-live, and then report back on any changes they see after the commit. 
There are quite a few tests that fail currently.

I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.

	-Heikki


On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> I've started going through the files that have 'return undef' lines.
> I'll report back later.
>
> Initial impression is that there are a few cases where the context
> indicates list to be returned but failure returns an explicit undef. I'll
> fix those.
>
> Most of the cases are much more ambiguous. Even when documentation says the
> failure returns undef, it is clearly meant to mean false. In most cases
> documentation does not comment on return value at all. Luckily the context
> is almost always scalar and therefore it does not matter too much.
>
> I seem to be changing 'return undef' to plain 'return' a bit overzealously,
> so do not take it personally.
>
> 	-Heikki
>
> On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > ....
> >
> > > > Again, didn't do that.
> > >
> > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > certainly not directed at your recent changes to Bio::Restriction::IO.
> > > In fact, I put in the above * comment to exclude your changes from my
> > > discussion; you changed the docs because the code never did what they
> > > said they did (the docs were bad). That's fine (good!). My comments
> > > were a general point, slightly directed at the idea of changing all the
> > > return undef;s - changing the code so that it no longer matches the
> > > docs of a previously working method. That's what I think is bad. Though
> > > in this particular case it shouldn't make any difference at all.
> >
> > Agreed.  In any case, if tests have been properly set up then they should
> > catch problems.  This is, of course, if they are properly set up.
> >
> > Chris
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 05:17:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 10:17:48 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <4485483C.4080505@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Heikki,
> 
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> 
> Not sure what's going on there but using read_aln and write_aln seem to work
> normally.  It may have something to do with Bio::SimpleAlign but I'm not
> absolutely sure.
> 
> Any ideas what may be going on here?

Yes, see my replies on the bug page. But so more people see the 
question, I'll ask here: can anyone offer examples of metafasta files, 
especially multiple alignments?


From cjfields at uiuc.edu  Tue Jun  6 10:30:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:30:17 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <4485483C.4080505@mrc-dunn.cam.ac.uk>
Message-ID: <000901c68975$bb9968d0$15327e82@pyrimidine>

Sendu,

This is Heikki's original submission for the specs for meta format:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta

So it's really a specialized FASTA format used to store meta information
about sequences.  Seems mainly useful for amino acid sequences, but is
extended to include properties of nucleotides like DNA content, RNA sec.
structure, and so on.  

Chris



From cjfields at uiuc.edu  Tue Jun  6 10:36:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:36:16 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <200606060940.07285.heikki@sanbi.ac.za>
Message-ID: <000a01c68976$9479e300$15327e82@pyrimidine>

Heikki,

I agree it's all a bit weird.  It's not too concerning though, since it
works at the moment, but it might take some tinkering with SimpleAlign to
get it to behave.

This alignment format has some of the same characteristics as Stockholm
alignment format but looks easier to work with.  I work with RNA,
specifically one with a conserved secondary structure so this format appeals
to me quite a bit.  If I get time (probably not for a while) I may tinker
with Bio::AlignIO::stockholm to get a write_aln() method up-and-running and
see if I can convert back-and-forth from the two.

Chris



From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 11:40:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 16:40:05 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000901c68975$bb9968d0$15327e82@pyrimidine>
References: <000901c68975$bb9968d0$15327e82@pyrimidine>
Message-ID: <4485A1D5.5090805@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu,
> 
> This is Heikki's original submission for the specs for meta format:
> 
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta
> 
> So it's really a specialized FASTA format used to store meta information
> about sequences.  Seems mainly useful for amino acid sequences, but is
> extended to include properties of nucleotides like DNA content, RNA sec.
> structure, and so on.  

Thanks. It's not really clear to me if the meta data needs to be 
considered in the context of an alignment. That is, if you have two meta 
sequences with the same primary sequence, will all their meta data 
necessarily be the same? Or could they be different?

If the same, then the test data and test need to be fixed so my patched 
version of Bio::AlignIO::metafasta passes the tests.

If different, how should the meta data be handled? Like the test implies 
with its expected value for the consensus (just treat the primary 
sequence and all meta data as one long string)?
Is it really the intent to include characters from the meta data names 
when considering what symbols we've seen with the symbol_chars() method?
Do we include the meta data name symbols when numbering?

Thoughts anyone?


From cjfields at uiuc.edu  Tue Jun  6 17:07:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 16:07:39 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <006401c688de$d38035b0$15327e82@pyrimidine>
Message-ID: <000601c689ad$3e6aec20$15327e82@pyrimidine>

I hate talking to myself...

I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
(appropriately enough, on 6-6-06).  I am trying out a new script which helps
with all the developer list noise; hope everybody likes it.

Cheers,

Chris   



From cjfields at uiuc.edu  Tue Jun  6 20:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 19:41:08 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <44861D47.7090205@infotech.monash.edu.au>
Message-ID: <000601c689cb$11f568a0$15327e82@pyrimidine>

I could do something like that.  Right now I have a script that just grabs
the text from the web page:

http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html

and uses regexes and hashes to sort everything and make some sense of the
noise.  The resolution for a bug isn't on that page but in the linked
message so I would need to grab the link from HTML, go to that page, then
get the resolution if there is one, so at the moment I just check each one
(thanks for the bug hunt Jason!).  I usually have to do a little touching up
afterwards, such as fix links and such, but the script really saves on time.
As you can tell, it's been a busy month!

I'm (very slowly) updating the script to go through the mail list threads
recursively but haven't really gotten anywhere with that yet.  Benchwork has
intervened yet again!
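
The "regexes and hashes" approach described above might look something like the sketch below. It is not the actual summary script: the sample lines are made up in the general shape of a pipermail index, the message titles are invented for illustration, and the patterns are assumptions about the page layout.

```perl
use strict;
use warnings;

# Made-up sample lines in the general shape of a pipermail date index.
my @lines = (
    '<LI><A HREF="000101.html">[Bioperl-guts-l] [Bug 2015] RemoteBlast XML retrieval</A>',
    '<LI><A HREF="000102.html">[Bioperl-guts-l] [Bug 2016] metafasta test failure</A>',
    '<LI><A HREF="000103.html">[Bioperl-guts-l] [Bug 2015] RemoteBlast XML retrieval</A>',
    '<LI><A HREF="000104.html">[Bioperl-guts-l] bioperl-live/Bio SimpleAlign.pm</A>',
);

my (%bugs, @other);
for my $line (@lines) {
    # Pull out the message link and the subject after the list tag.
    next unless $line =~ m{<A HREF="(\d+\.html)">\[Bioperl-guts-l\]\s*(.*?)</A>}i;
    my ($link, $subject) = ($1, $2);

    if ($subject =~ /^\[Bug (\d+)\]\s*(.*)/) {
        # Group bug mail by bug number, remembering every message link.
        push @{ $bugs{$1}{links} }, $link;
        $bugs{$1}{title} = $2;
    }
    else {
        push @other, $subject;    # CVS commits and everything else
    }
}

for my $id (sort { $a <=> $b } keys %bugs) {
    printf "Bug %d: %s (%d message(s))\n",
        $id, $bugs{$id}{title}, scalar @{ $bugs{$id}{links} };
}
print "Other: $_\n" for @other;
```

Getting each bug's resolution would still require following the stored links, as described above.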

Chris

> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Tuesday, June 06, 2006 7:27 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
> 
> > I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> > (appropriately enough, on 6-6-06).  I am trying out a new script which
> helps
> > with all the developer list noise; hope everybody likes it.
> 
> I like the CVS summaries.
> 
> For the bug summaries, would it make sense to categorise/sort by
> category/status eg. RESOLVED, WORKSFORME etc?
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Tue Jun  6 20:26:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 07 Jun 2006 10:26:47 +1000
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689ad$3e6aec20$15327e82@pyrimidine>
References: <000601c689ad$3e6aec20$15327e82@pyrimidine>
Message-ID: <44861D47.7090205@infotech.monash.edu.au>

> I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> (appropriately enough, on 6-6-06).  I am trying out a new script which helps
> with all the developer list noise; hope everybody likes it.

I like the CVS summaries.

For the bug summaries, would it make sense to categorise/sort by 
category/status eg. RESOLVED, WORKSFORME etc?

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jason at bioperl.org  Wed Jun  7 00:04:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Jun 2006 00:04:02 -0400
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689cb$11f568a0$15327e82@pyrimidine>
References: <000601c689cb$11f568a0$15327e82@pyrimidine>
Message-ID: <8D9B514C-ADB4-409F-A55F-DC0C3DA9354A@bioperl.org>

It is possible some of this can be extracted from the bugzilla as a  
query (all the changes from X to Y) and generate RSS or text that can  
be processed.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From heikki at sanbi.ac.za  Wed Jun  7 05:57:47 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Jun 2006 11:57:47 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606061004.01193.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
	<200606061004.01193.heikki@sanbi.ac.za>
Message-ID: <200606071157.47736.heikki@sanbi.ac.za>

Committed.

Please report any surprising changes in functionality to the list.

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Michael.Muratet at operon.com  Tue Jun  6 14:34:38 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Tue, 6 Jun 2006 13:34:38 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>

Greetings

I am trying to install bioperl-db in preparation for installing a biosql database. I'm running on a Dell PowerEdge with quad dual-core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl 1.5.1.  I have installed mysql v5.0.21 from source with --with-innodb set for the configuration. I installed bioperl-db from cvs. I have the latest DBI and DBD::mysql, installed a few weeks ago from CPAN. The installation has been working well with perl otherwise; for example, the Ensembl core API works OK. SHOW ENGINES indicates that innodb is enabled.  I have attached a snippet from the top of the output below. I searched the web and the bioperl-db list and haven't found anything that appears to be relevant. I've done several of these installs and they've pretty much completed without a single glitch. Does anyone have any ideas how to isolate the problem?

Thanks

Mike

[mmuratet at HSV-PROBE bioperl-db]$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01dbadaptor.....ok 14/19
------------- EXCEPTION  -------------
MSG: failed to open connection: Transactions not supported by database
STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm:518
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK toplevel t/01dbadaptor.t:62


From hlapp at gmx.net  Wed Jun  7 08:52:22 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 08:52:22 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
Message-ID: <4F23D2EA-2218-4023-A3F6-3284912952BE@gmx.net>

Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI
parameter hash. The test you're stumbling over is actually there to
test that the database does support transactions, but apparently in
5.x versions MySQL no longer silently ignores the AutoCommit
parameter if it doesn't support transactions (effectively preempting
the test ...).

Now you say that InnoDB shows as enabled - i.e., can you confirm that
you changed the MySQL configuration parameter that designates the
directory for InnoDB to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one row and the last query should
return zero rows if transactions are supported, and there shouldn't
be any error.
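
The same check can be sketched from Perl. This is only an illustration under stated assumptions: $dbh is assumed to be a DBI handle already opened with AutoCommit => 0 against a BioSQL schema, and rollback_works is a hypothetical helper name. It uses only the standard DBI methods do, selectrow_array and rollback.

```perl
use strict;
use warnings;

# Returns 1 if an inserted row disappears after rollback, 0 if it survives.
# Assumes $dbh was opened with AutoCommit => 0 and that no row named
# '__dummy__' exists beforehand. (Hypothetical helper, not part of bioperl-db.)
sub rollback_works {
    my ($dbh) = @_;
    my $count_sql = "select count(*) from biodatabase where name = ?";
    $dbh->do("insert into biodatabase (name) values (?)", undef, '__dummy__');
    my ($before) = $dbh->selectrow_array($count_sql, undef, '__dummy__');
    $dbh->rollback;
    my ($after) = $dbh->selectrow_array($count_sql, undef, '__dummy__');
    return ($before == 1 && $after == 0) ? 1 : 0;
}
```

If the table is not transactional the dummy row will remain after the call and has to be deleted by hand.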

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD::mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ 
> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ 
> BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ 
> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ 
> DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From nlhepler at umd.edu  Wed Jun  7 09:46:32 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 09:46:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
Message-ID: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>

Hello,

I am having some difficulty here.  I have a list of accessions, which  
are the parameters for a get_Stream_by_acc() function on a  
Bio::DB::GenBank object.  None of the returned GenBank information  
for any of my accessions seems to contain variation data, no matter  
how I try to coax it out with unflattener and typemapper.  This data  
is, however, available via the web interface of NCBI Nucleotide, as  
an optional feature (SNP).  I was wondering if there was some option  
I'm missing in the initialization of the Bio::DB::GenBank object (no  
options currently) that will coax the database into giving me this  
data?  Or something else that I'm missing altogether.  The organism  
of interest is human, taxon:9606.

Nicolaus Lance Hepler
nlhepler at mail dot umd dot edu


From cjfields at uiuc.edu  Wed Jun  7 09:56:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 08:56:16 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfallwith"returnundef"
In-Reply-To: <200606071157.47736.heikki@sanbi.ac.za>
Message-ID: <000601c68a3a$265552a0$15327e82@pyrimidine>

Yikes!  I'll download a tarball from anon CVS and run a comparison (vs my
pre-updated bioperl-live) on WinXP and Mac OS X 10.4 (Intel) and report back
success/fail; may be a bit.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> 	-Heikki
> 
> On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote:
> > OK. I've gone through all cases where return and undef are on the same
> > lines. I've done changes in 185 files.
> >
> > My aims have been the following:
> >
> > 1. Remove undef from return undef when not necessary.
> > 	This will make it easier to spot cases where undef matters in the
> > 	future. Most of the changes fall into this category. The context is
> > 	clearly scalar.
> >
> > 2. Returning undef when user expects an empty list is bad
> >
> > ./Bio/Tools/Est2Genome.pm fixed
> > ./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
> >                                not fixed
> > ./Bio/Matrix/PSM/SiteMatrix.pm  fixed
> > ./Bio/Matrix/PSM/Psm  fixed
> > ./Bio/DB/Taxonomy::entrez.pm fixed
> >
> > 3. If docs say method returns nothing, explicit undef is not the right
> > thing to return
> >
> > 4. do not return an explicit undef if the method is supposed to return
> > false on failure
> >
> >
> > Before I do the commit, I'd like a number of people to run 'make test'
> > on bioperl-live and report back after the commit if they see changes.
> > There are quite a few tests that fail currently.
> >
> > I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.
> >
> > 	-Heikki
> >
> > On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> > > I've started going through the files that have 'return undef' lines.
> > > I'll report back later.
> > >
> > > Initial impression is that there are a few cases where the context
> > > indicates list to be returned but failure returns an explicit undef.
> > > I'll fix those.
> > >
> > > Most of the cases are much more ambiguous. Even when documentation
> > > says the failure returns undef, it is clearly meant to mean false. In
> > > most cases documentation does not comment on return value at all.
> > > Luckily the context is almost always scalar and therefore it does not
> > > matter too much.
> > >
> > > I seem to be changing 'return undef' to plain 'return' a bit
> > > overzealously, so do not take it personally.
> > >
> > > 	-Heikki
> > >
> > > On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > > > ....
> > > >
> > > > > > Again, didn't do that.
> > > > >
> > > > > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > > > > certainly not directed at your recent changes to
> > > > > > Bio::Restriction::IO. In fact, I put in the above * comment to
> > > > > > exclude your changes from my discussion; you changed the docs
> > > > > > because the code never did what they said they did (the docs were
> > > > > > bad). That's fine (good!). My comments were a general point,
> > > > > > slightly directed at the idea of changing all the return undef;s -
> > > > > > changing the code so that it no longer matches the docs of a
> > > > > > previously working method. That's what I think is bad. Though in
> > > > > > this particular case it shouldn't make any difference at all.
> > > >
> > > > Agreed.  In any case, if tests have been properly set up then they
> > > > should catch problems.  This is, of course, if they are properly set
> > > > up.
> > > >
> > > > Chris
> > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Wed Jun  7 11:42:32 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 07 Jun 2006 11:42:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>
Message-ID: <C0AC6C28.8C12%osborne1@optonline.net>

Nicolaus,

The short answer is no, there's no option that will omit or add a particular
feature or annotation to the Sequence object returned by Bio::DB::GenBank.
Can you give some example accessions?

Brian O.


On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:

> Hello,
> 
> I am having some difficulty here.  I have a list of accessions, which
> are the parameters for a get_Stream_by_acc() function on a
> Bio::DB::GenBank object.  None of the returned GenBank information
> for any of my accessions seems to contain variation data, no matter
> how I try to coax it out with unflattener and typemapper.  This data
> is, however, available via the web interface of NCBI Nucleotide, as
> an optional feature (SNP).  I was wondering if there was some option
> I'm missing in the initialization of the Bio::DB::GenBank object (no
> options currently) that will coax the database into giving me this
> data?  Or something else that I'm missing altogether.  The organism
> of interest is human, taxon:9606.
> 
> Nicolaus Lance Hepler
> nlhepler at mail dot umd dot edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From nlhepler at umd.edu  Wed Jun  7 12:26:06 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 12:26:06 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <C0AC6C28.8C12%osborne1@optonline.net>
References: <C0AC6C28.8C12%osborne1@optonline.net>
Message-ID: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>

Brian,

A sample accession is BC000007.  I figured a way around it though.   
Rather than automate the whole process, I just downloaded from Batch  
Entrez a flat .gb file of all my accessions.  It's not flexible, and  
will be inconvenient when we expand the dataset, but it will provide  
me with data to work with for now.

Nicolaus

> Nicolaus,
>
> The short answer is no, there's no option that will omit or add a  
> particular
> feature or annotation to the Sequence object returned by  
> Bio::DB::GenBank.
> Can you give some example accessions?
>
> Brian O.
>
>
> On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
>
>> Hello,
>>
>> I am having some difficulty here.  I have a list of accessions, which
>> are the parameters for a get_Stream_by_acc() function on a
>> Bio::DB::GenBank object.  None of the returned GenBank information
>> for any of my accessions seems to contain variation data, no matter
>> how I try to coax it out with unflattener and typemapper.  This data
>> is, however, available via the web interface of NCBI Nucleotide, as
>> an optional feature (SNP).  I was wondering if there was some option
>> I'm missing in the initialization of the Bio::DB::GenBank object (no
>> options currently) that will coax the database into giving me this
>> data?  Or something else that I'm missing altogether.  The organism
>> of interest is human, taxon:9606.
>>
>> Nicolaus Lance Hepler
>> nlhepler at mail dot umd dot edu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From lstein at cshl.edu  Wed Jun  7 12:50:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Jun 2006 12:50:24 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <4483F338.7090909@mrc-dunn.cam.ac.uk>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
Message-ID: <200606071250.25026.lstein@cshl.edu>

I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
because the CGI upload filehandle is not seekable (for good reasons that I 
won't inflict on you)! You'll have to write to a temporary file, or else read 
the whole sequence into memory. Sorry about this.
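
A minimal sketch of the read-into-memory option, using only core Perl. The helper name slurp_to_seekable is hypothetical, and a pipe stands in for the upload filehandle here because pipes cannot seek either; with a real upload you would pass the handle obtained from CGI.pm instead.

```perl
use strict;
use warnings;

# Copy everything from a non-seekable handle into memory and return a new,
# seekable read handle on that copy. (Hypothetical helper, not part of
# CGI.pm or BioPerl.)
sub slurp_to_seekable {
    my ($in) = @_;
    my $data = do { local $/; <$in> };   # read the whole stream at once
    $data = '' unless defined $data;     # empty upload
    open(my $mem, '<', \$data) or die "cannot open in-memory handle: $!";
    return $mem;
}

# Demo: a pipe stands in for the upload handle.
pipe(my $reader, my $writer) or die "pipe failed: $!";
print {$writer} ">seq1\nACGT\n";
close $writer;

my $fh = slurp_to_seekable($reader);
my $first = <$fh>;          # a format guesser would consume this line
seek($fh, 0, 0);            # rewinding works on the in-memory copy
my @lines = <$fh>;
print "first line seen again: $lines[0]";
```

The returned handle can then be given to a format guesser and rewound afterwards, as in the seek() suggestion quoted below. For large uploads a temporary file is the safer choice.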

Lincoln

On Monday 05 June 2006 05:02, Sendu Bala wrote:
> Wijaya Edward wrote:
> > Dear Lincoln and experts
> >
> > Currently I have a CGI application that does this:
> >
> > 1. read an uploaded file
> > 2. check the content of the file whether fasta or not
> > 3. print out the content of the file.
> >
> >
> > Now the problem I'm facing is in step three: the content read from
> > the filehandle is altered, namely the very first line does not get
> > printed.
>
> The problem is almost certainly that the guessing is done by reading the
> first line of the filehandle, so that your subsequent while loop on that
> same filehandle starts at the second line.
> Just seek the filehandle back to the start before trying to print the
> contents out.
>
> ..
> my $guesser_upload = Bio::Tools::GuessSeqFormat->new(-fh => $fh_upload);
> my $format_upload  = $guesser_upload->guess;
> seek($fh_upload, 0, 0);
> ..
> while (<$fh_upload>) {
>      ...
> }
>
> An alternative might be to pass GuessSeqFormat the filename in which
> case it would make its own filehandle and close it, leaving your own
> filehandle untouched.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From paul.boutros at utoronto.ca  Wed Jun  7 13:03:01 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 13:03:01 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
Message-ID: <1149699781.448706c5e803d@webmail.utoronto.ca>

Hi,

Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7 and I had a few 
failures:

Failed Test         Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                    89    2  79 88
t/Biblio.t                        24    1  2
t/LocusLink.t                     23    1  23
t/PhysicalMap.t                   14    2  11-12
t/RepeatMasker.t                   6    3  1-2 6
t/StandAloneBlast.t               18    4  19-22
t/TaxonTree.t                     17   30  11 18-42
t/alignUtilities.t                 9    1  9
t/psm.t              255 65280    48   35  29 32-48
t/tutorial.t                      21   15  7-21

Not sure if any of these are related to the "return undef" changes, or are known.  I also 
had some warnings running BioGraphics.t

t/BioGraphics................Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureFile.pm line 547, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, <GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 61.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, <GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 62.
t/BioGraphics................ok

I also ran the tests manually and below I've attached what came out. It doesn't always agree
with the results of make test, and in a few cases (e.g. tutorial.t or StandAloneBlast.t)
there were no errors running the tests manually.

Paul

Annotation.t
============
not ok 8
# Test 8 got: '' (t/Annotation.t at line 59)
#   Expected: '0'

not ok 71
# Test 71 got: 'dumpster|test case|Ann:00001' (t/Annotation.t at line 187)
#    Expected: 'dumpster|test case|'

not ok 79
# Failed test 79 in t/Annotation.t at line 217

ok 85
Use of uninitialized value in concatenation (.) or string at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm line 236.

------------- EXCEPTION  -------------
MSG: Bio::AnnotationI implementation Bio::Annotation:: failed to load:
------------- EXCEPTION  -------------
MSG: Failed to load module Bio::Annotation::. Can't locate Bio/Annotation/.pm in @INC (@INC contains: t /db2blast/Paul/perl5.8.7/lib/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl .) at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm line 396.

STACK Bio::Root::Root::_load_module /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm:398
STACK (eval) /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:149
STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:148
STACK toplevel t/Annotation.t:237
--------------------------------------

STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:152
STACK toplevel t/Annotation.t:237
--------------------------------------


PhysicalMap.t
=============
not ok 11
# Test 11 got: <UNDEF> (t/PhysicalMap.t at line 55)
#    Expected: '0' (code holds and returns a string, definition requires a boolean)
not ok 12
# Test 12 got: '3' (t/PhysicalMap.t at line 56)
#    Expected: '1' (code holds and returns a string, definition requires a boolean)

TaxonTree.t
===========
ok 10
Use of uninitialized value in string eq at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 559.
not ok 11
# Test 11 got: <UNDEF> (t/TaxonTree.t at line 35)
#    Expected: 'species'
ok 12 # foo is not a rank, class variable @RANK not initialised
ok 13
ok 14
ok 15
ok 16
ok 17
ok 18
Can't use string ("this could be anything") as a HASH ref while "strict refs" in use at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 452.

alignUtilities.t
================
ok 6

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
ok 7
ok 8
not ok 9
# Test 9 got: '1' (t/alignUtilities.t at line 53)
#   Expected: '3'

RepeatMasker.t
==============
t/RepeatMasker...............FAILED tests 1-2, 6
        Failed 3/6 tests, 50.00% okay

StandAloneBlast.t
=================
t/StandAloneBlast............FAILED tests 19-22
        Failed 4/18 tests, 77.78% okay

psm.t
=====
t/Pseudowise.................ok
t/psm........................NOK 29Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
        Failed 18/48 tests, 62.50% okay
t/QRNA.......................ok

tutorial.t
==========
t/tutorial...................ok 5/21
The following numeric arguments can be passed to run the corresponding demo-script.
1  => sequence_manipulations
2  => seqstats_and_seqwords
3  => restriction_and_sigcleave
4  => other_seq_utilities
5  => run_perl
6  => searchio_parsing
8  => hmmer_parsing
9  => simplealign
10 => gene_prediction_parsing
11 => access_remote_db
12 => index_local_db
13 => fetch_local_db    (NOTE: needs to be run with demo 12)
14 => sequence_annotation
15 => largeseqs
16 => liveseqs
17 => run_struct
18 => demo_variations
19 => demo_xml
20 => run_tree
21 => run_map
22 => run_remoteblast
23 => run_standaloneblast
24 => run_clustalw_tcoffee
25 => run_psw_bl2seq

In addition the argument "100" followed by the name of a single
bioperl object will display a list of all the public methods
available from that object and from what object they are inherited.

Using the parameter "0" will run all the tests that do not require
external programs (i.e. tests 1 to 22).
Using any other argument (or no argument) will run this display.

So typical command lines might be:
To run all core demo scripts:
 > perl -w  bptutorial.pl 0
or to just run the local indexing demos:
 > perl -w  bptutorial.pl 12 13
or to list all the methods available for object Bio::Tools::SeqStats -
 > perl -w  bptutorial.pl 100 Bio::Tools::SeqStats

t/tutorial...................FAILED tests 7-21
        Failed 15/21 tests, 28.57% okay

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> -Heikki
> 


From sb at mrc-dunn.cam.ac.uk  Wed Jun  7 12:54:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 07 Jun 2006 17:54:31 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <200606071250.25026.lstein@cshl.edu>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
	<200606071250.25026.lstein@cshl.edu>
Message-ID: <448704C7.6080201@mrc-dunn.cam.ac.uk>

Lincoln Stein wrote:
> I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
> because the CGI upload filehandle is not seekable (for good reasons that I 
> won't inflict on you)! You'll have to write to a temporary file, or else read 
> the whole sequence into memory. Sorry about this.

The OP already had success with my alternative solution.


>> An alternative might be to pass GuessSeqFormat the filename in which
>> case it would make its own filehandle and close it, leaving your own
>> filehandle untouched.


From hlapp at gmx.net  Wed Jun  7 13:25:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 13:25:25 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
Message-ID: <76434774-51A4-46E7-97AA-1E9227CB7771@gmx.net>

Hi Michael,

Yes, it looks like a problem in DBD if DBD::mysql fails to recognize
that the MySQL instance to which it is connected does support
transactions. You can verify this by writing a simple script that
tries to open a connection with { AutoCommit => 0 } as the parameter
hash:

	use DBI;
	my $dbh = DBI->connect("dbi:mysql:database=<yourdb>;host=<yourhost>",
	                       "username","password",
	                       { AutoCommit => 0, RaiseError => 0 });
	die $DBI::errstr unless $dbh;   # $DBI::errstr holds the connect error
	$dbh->disconnect;

If this succeeds, then something in Biosql may be related to the
problem; if it fails, the problem is not on the Biosql side.

	-hilmar


On Jun 7, 2006, at 12:01 PM, Michael Muratet US-Huntsville wrote:

> Hilmar
>
> Pardon the top post.
>
> I tried the test below and it failed. So, I went back and redid the  
> Innodb configuration (deleted all the index files--they were empty  
> anyway, reinstalled biosql (which was empty,too) and restarted the  
> server. Now, the test below works. I went into the DBD-3.0003 and  
> did a distclean and reinstalled the package, but it fails the one  
> transaction test, too. So, it looks like the problem is in DBD, yes?
>
> We had a RAID 5 drive glitch the day before yesterday and rebuilt  
> it. That's the only thing that's changed that I know of that could  
> have caused the problem with ibxxx files.
>
> I have received a reply on the DBD list. Can you think of anything  
> else I should try from the biosql end?
>
> Thanks a million.
>
> Mike
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, June 07, 2006 7:52 AM
> To: Michael Muratet US-Huntsville
> Cc: Bioperl; BioSQL
> Subject: Re: [Bioperl-l] bioperl-db failing tests
>
>
> Hi Michael,
>
> Bioperl-db will open all connections with AutoCommit => 0 in the DBI
> parameter hash. The test you're stumbling over is actually there to
> test that the database  does support transactions, but apparently in
> 5.x versions MySQL no longer silently ignores the AutoCommit
> parameter if it doesn't support transactions (effectively preempting
> the test ...).
>
> Now you say that innodb shows as enabled - i.e., you can confirm that
> you changed the Mysql configuration parameter that designates the
> directory for innodb to store its files?
>
> You can confirm that transactions are supported by simple tests on
> the sql level. Open a mysql shell and do the following:
>
> 	-- BTW 'start transaction;' will (should) work too
> 	mysql> set autocommit = 0;
> 	mysql> insert into biodatabase (name) values ('__dummy__');
> 	mysql> select name from biodatabase where name = '__dummy__';
> 	mysql> rollback;
> 	mysql> select name from biodatabase where name = '__dummy__';
>
> The first SELECT query should return one and the last query should
> return zero rows if transactions are supported, and there shouldn't
> be any error.
>
> If the above succeeds (which I don't expect it to) then it looks like
> the DBD::mysql driver thinks the database doesn't support
> transactions when in reality it does. Let me know the result.
>
> 	-hilmar
>
> On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:
>
>> Greetings
>>
>> I am trying to install bioperl-db in preparation for installing a
>> biosql database. I'm running on a Dell PowerEdge with quad dual-
>> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl
>> 1.5.1.  I have installed mysql v5.0.21 from source with --with-
>> innodb set for the configuration. I installed bioperl-db from cvs.
>> I have the latest DBI and DBD:mysql installed a few weeks ago from
>> CPAN. The installation has been working well with perl otherwise,
>> for example, the Ensembl core API works OK. SHOW ENGINES indicates
>> that innodb is enabled.  I have attached a snippet from the top of
>> the output below. I searched the web and the bioperl-db list and
>> haven't found anything that appears to be relevant. I've done
>> several of these installs and they've pretty much completed without
>> a single glitch. Does anyone have any ideas how to isolate the
>> problem?
>>
>> Thanks
>>
>> Mike
>>
>> [mmuratet at HSV-PROBE bioperl-db]$ make test
>> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>> t/01dbadaptor.....ok 14/19
>> ------------- EXCEPTION  -------------
>> MSG: failed to open connection: Transactions not supported by  
>> database
>> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 255
>> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 215
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/
>> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/
>> BioSQL/BasePersistenceAdaptor.pm:1477
>> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/
>> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/
>> DB/BioSQL/BaseDriver.pm:518
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
>> STACK toplevel t/01dbadaptor.t:62
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 14:08:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 13:08:19 -0500
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>
Message-ID: <001501c68a5d$5db655a0$15327e82@pyrimidine>

Nicolaus,

Bio::DB::GenBank uses NCBI's efetch mainly; I implemented epost but it's a
hack at best and only works in certain circumstances.  So you could get the
sequence data directly but the links aren't included and are only given
through NCBI's elink.  There is no way I know of to get this information via
bioperl as there isn't an interface to NCBI's elink AFAIK (Brian?).  I'm
working on a rewrite for a general NCBI eutils interface for each tool
(efetch, epost, elink, etc), but it isn't working yet and probably won't be
ready to go until the end of summer-beginning of fall.

Just so you know how complex the situation is when using accessions, you
can't use a sequence accession directly when querying elink (and most
eutils), it has to be the GI number; I believe efetch is the only one that
accepts accessions.  So you would have to run esearch first using the
accessions as a query, grab the GI from the XML, run elink with the GI, grab
the SNP cluster ID, efetch the SNP data, and parse the data to get into
Bio::ClusterIO.  Fun, huh?  You would think NCBI would try making this a
little easier...
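
A rough sketch of that chain of requests, only building the three URLs without fetching anything. The base URL is the one NCBI documents for E-utilities today (it was different in 2006), the [accn] field tag and retmode value are my assumptions about sensible parameters, and eutils_url and escape are made-up helper names. The accession and IDs are the ones from this thread.

```perl
use strict;
use warnings;

# Base URL for NCBI E-utilities as currently documented (an assumption
# for this sketch; adjust it if NCBI moves things again).
my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Minimal percent-encoding for query values, core Perl only.
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build "<base>/<tool>.fcgi?key=value&..." with keys in sorted order.
sub eutils_url {
    my ($tool, %param) = @_;
    my $query = join '&', map { "$_=" . escape($param{$_}) } sort keys %param;
    return "$BASE/$tool.fcgi?$query";
}

# Step 1: accession -> GI (the GI has to be pulled out of the XML reply)
my $esearch = eutils_url('esearch', db => 'nucleotide',
                         term => 'BC000007[accn]');
# Step 2: GI -> SNP cluster IDs
my $elink   = eutils_url('elink', dbfrom => 'nucleotide', db => 'snp',
                         id => 33875090);
# Step 3: SNP cluster ID -> SNP record
my $efetch  = eutils_url('efetch', db => 'snp', id => 4631,
                         retmode => 'xml');

print "$_\n" for $esearch, $elink, $efetch;
```

Each step's reply has to be parsed before the next URL can be built, which is the tedious part described above.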

There used to be a way to parse dbSNP data using Bio::ClusterIO but the XML
schema changed so the parser is likely broken (the tests work but the file
is from the old schema).  I think Allen Day was in charge of it.

I used the eutils test interface () to grab the SNP cluster accessions for
your sequence using elink (note that the format is XML, which one  would
have to parse out to grab the cluster ID's):

<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp_genegenotype</LinkName>
		<Link>
			<Id>28362589</Id>
		</Link>
		<Link>
			<Id>4635949</Id>
		</Link>

		<Link>
			<Id>28362591</Id>
		</Link>
		<Link>
			<Id>11545838</Id>
		</Link>
		<Link>
			<Id>4246814</Id>

		</Link>
		<Link>
			<Id>28670911</Id>
		</Link>
		<Link>
			<Id>4073746</Id>
		</Link>
		<Link>

			<Id>9313754</Id>
		</Link>
		<Link>
			<Id>11545840</Id>
		</Link>
		<Link>
			<Id>17077806</Id>

		</Link>
		<Link>
			<Id>28362590</Id>
		</Link>
		<Link>
			<Id>4076327</Id>
		</Link>
		<Link>

			<Id>9834</Id>
		</Link>
		<Link>
			<Id>4073745</Id>
		</Link>
		<Link>
			<Id>6879874</Id>

		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
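
To pull the cluster IDs out of output like the above, something along these lines should do for a quick look. It is a regex-based sketch using core Perl only, and links_by_name is a made-up name; a real implementation would more likely use a proper XML parser, and this one assumes the element layout shown above.

```perl
use strict;
use warnings;

# Collect the <Id> values of each <LinkSetDb> block, keyed by its
# <LinkName>. Quick-and-dirty regex parsing, assuming the eLinkResult
# layout shown above; links_by_name is a hypothetical helper name.
sub links_by_name {
    my ($xml) = @_;
    my %ids;
    while ($xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}gs) {
        my $block = $1;
        my ($name) = $block =~ m{<LinkName>\s*([^<]+?)\s*</LinkName>}
            or next;                      # skip blocks without a name
        push @{ $ids{$name} }, $block =~ m{<Link>\s*<Id>\s*(\d+)\s*</Id>}g;
    }
    return \%ids;
}

# Shortened copy of the reply above, just to show the call.
my $sample = <<'END_XML';
<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>
		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
	<LinkSetDb>
		<DbTo>snp</DbTo>
		<LinkName>nucleotide_snp_genegenotype</LinkName>
		<Link>
			<Id>28362589</Id>
		</Link>
		<Link>
			<Id>4635949</Id>
		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
END_XML

my $links = links_by_name($sample);
for my $name (sort keys %$links) {
    print "$name: @{ $links->{$name} }\n";
}
```

The <Id> inside <IdList> is the query GI and is deliberately not collected, since it sits outside any <LinkSetDb> block.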


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nicolaus Hepler
> Sent: Wednesday, June 07, 2006 11:26 AM
> To: Brian Osborne; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] GenBank Feature: variation
> 
> Brian,
> 
> A sample accession is BC000007.  I figured a way around it though.
> Rather than automate the whole process, I just downloaded from Batch
> Entrez a flat .gb file of all my accessions.  It's not flexible, and
> will be inconvenient when we expand the dataset, but it will provide
> me with data to work with for now.
> 
> Nicolaus
> 
> > Nicolaus,
> >
> > The short answer is no, there's no option that will omit or add a
> > particular
> > feature or annotation to the Sequence object returned by
> > Bio::DB::GenBank.
> > Can you give some example accessions?
> >
> > Brian O.
> >
> >
> > On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
> >
> >> Hello,
> >>
> >> I am having some difficulty here.  I have a list of accessions, which
> >> are the parameters for a get_Stream_by_acc() function on a
> >> Bio::DB::GenBank object.  None of the returned GenBank information
> >> for any of my accessions seems to contain variation data, no matter
> >> how I try to coax it out with unflattener and typemapper.  This data
> >> is, however, available via the web interface of NCBI Nucleotide, as
> >> an optional feature (SNP).  I was wondering if there was some option
> >> I'm missing in the initialization of the Bio::DB::GenBank object (no
> >> options currently) that will coax the database into giving me this
> >> data?  Or something else that I'm missing altogether.  The organism
> >> of interest is human, taxon:9606.
> >>
> >> Nicolaus Lance Hepler
> >> nlhepler at mail dot umd dot edu
> >
> >
> 


From Michael.Muratet at operon.com  Wed Jun  7 12:01:29 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Wed, 7 Jun 2006 11:01:29 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>

Hilmar

Pardon the top post.

I tried the test below and it failed. So, I went back and redid the InnoDB configuration (deleted all the index files, which were empty anyway, reinstalled biosql, which was empty too, and restarted the server). Now, the test below works. I went into the DBD-3.0003 directory, did a distclean and reinstalled the package, but it still fails the one transaction test. So, it looks like the problem is in DBD, yes?

We had a RAID 5 drive glitch the day before yesterday and rebuilt it. That's the only thing that's changed that I know of that could have caused the problem with ibxxx files. 

I have received a reply on the DBD list. Can you think of anything else I should try from the biosql end?

Thanks a million.

Mike

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: Wednesday, June 07, 2006 7:52 AM
To: Michael Muratet US-Huntsville
Cc: Bioperl; BioSQL
Subject: Re: [Bioperl-l] bioperl-db failing tests


Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI  
parameter hash. The test you're stumbling over is actually there to  
test that the database  does support transactions, but apparently in  
5.x versions MySQL no longer silently ignores the AutoCommit  
parameter if it doesn't support transactions (effectively preempting  
the test ...).

Now you say that innodb shows as enabled - i.e., you can confirm that  
you changed the Mysql configuration parameter that designates the  
directory for innodb to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one and the last query should  
return zero rows if transactions are supported, and there shouldn't  
be any error.

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.
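
The same insert/rollback check can be driven through DBI rather than the mysql shell. This is only a sketch: supports_transactions is a made-up name, and it assumes a handle that was opened with AutoCommit => 0 and RaiseError => 1 and a table and column you are free to write a dummy row into (biodatabase and name in the example above).

```perl
use strict;
use warnings;

# Returns 1 if an uncommitted INSERT disappears after ROLLBACK, else 0.
# $dbh is assumed to be a DBI handle opened with AutoCommit => 0 and
# RaiseError => 1. $table and $column are interpolated into the SQL,
# so they must be trusted identifiers. supports_transactions is a
# hypothetical helper name, not part of DBI or bioperl-db.
sub supports_transactions {
    my ($dbh, $table, $column) = @_;
    my $marker    = '__dummy__';
    my $count_sql = "SELECT COUNT(*) FROM $table WHERE $column = ?";

    $dbh->do("INSERT INTO $table ($column) VALUES (?)", undef, $marker);
    my ($before) = $dbh->selectrow_array($count_sql, undef, $marker);
    $dbh->rollback;
    my ($after)  = $dbh->selectrow_array($count_sql, undef, $marker);

    # If the rollback was silently ignored, remove the dummy row again.
    if ($after) {
        $dbh->do("DELETE FROM $table WHERE $column = ?", undef, $marker);
        $dbh->commit;
    }
    return ($before == 1 && $after == 0) ? 1 : 0;
}

# Usage (not run here; fill in your own DSN and credentials):
#   use DBI;
#   my $dbh = DBI->connect($dsn, $user, $pass,
#                          { AutoCommit => 0, RaiseError => 1 });
#   print supports_transactions($dbh, 'biodatabase', 'name')
#       ? "transactions OK\n" : "rollback had no effect\n";
```

If the connect itself already fails with AutoCommit => 0, the driver is refusing before any SQL is run, which is a different failure from a rollback that has no effect.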

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD:mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 15:38:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 14:38:08 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
Message-ID: <001901c68a69$e7ece8e0$15327e82@pyrimidine>

All,

Don't know how many people use the Bio::ClusterIO module, but it looks like
Bio::ClusterIO::dbsnp is broken unless you are using older XML versions of
the dbSNP database; the schema for ASN.1 and XML format for SNP has changed:

http://www.ncbi.nlm.nih.gov/projects/SNP/

under 'Announcements'.

I actually tried parsing the dbsnp test file and a newer schema XML file to
confirm this; the new version doesn't work (returned object from
next_cluster is undef).  I'm filing a bug as a reminder.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From paul.boutros at utoronto.ca  Wed Jun  7 18:35:46 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 18:35:46 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149719746.448754c2ef4e0@webmail.utoronto.ca>

> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
Yup yup, you're right. I should have mentioned in my original message that I didn't run 
any remote-server tests, and unfortunately can't do so on this box.
Paul

Quoting David Messina <dmessina at wustl.edu>:

> To look for problems related to Heikki's "return undef" sweep, I ran  
> 'make test' on both today's version of bioperl-live and on an older  
> version I had checked out on May 12. This was done on OS X 10.4.6 and  
> perl 5.8.6.
> 
> 
> Here are the results:
> 
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
> need to be examined.
> 
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been  
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
> 
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
> between 5/12 and today, so this appears to be not 'return undef'- 
> related.
> 
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load  
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
> 5/12 and today, so this is not 'return undef'-related.
> 
> 
> 
> All the other test failures appear in both versions of bioperl-live,  
> so presumably they are not affected by the 'return undef' changes.
> 
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
> 
> Also, he saw errors in Biblio.t, Repeatmasker.t, and  
> StandAloneBlast.t that I did not.
> 
> Dave
> 
> 
> Today's bioperl-live test results:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
> 99.84% okay.
> 
> Note that this is including tests requiring a remote server.
> 
> And here's the output from a May 12 checkout of bioperl-live:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
> 99.89% okay.
> 
> 
> 
> 
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
> 
> > Hi,
> >
> > Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> > and I had a few
> > failures:
> >
> > Failed Test         Stat Wstat Total Fail  List of Failed
> > ---------------------------------------------------------------------- 
> > ---------
> > t/Annotation.t                    89    2  79 88
> > t/Biblio.t                        24    1  2
> > t/LocusLink.t                     23    1  23
> > t/PhysicalMap.t                   14    2  11-12
> > t/RepeatMasker.t                   6    3  1-2 6
> > t/StandAloneBlast.t               18    4  19-22
> > t/TaxonTree.t                     17   30  11 18-42
> > t/alignUtilities.t                 9    1  9
> > t/psm.t              255 65280    48   35  29 32-48
> > t/tutorial.t                      21   15  7-21
> 
> 


From dmessina at wustl.edu  Wed Jun  7 18:26:25 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 17:26:25 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <1149699781.448706c5e803d@webmail.utoronto.ca>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
Message-ID: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>

To look for problems related to Heikki's "return undef" sweep, I ran  
'make test' on both today's version of bioperl-live and on an older  
version I had checked out on May 12. This was done on OS X 10.4.6 and  
perl 5.8.6.


Here are the results:

Failures in today's version of bioperl-live but NOT in 5/12 version
===================================================================
- psm.t -
The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
need to be examined.

Here's the error message:
Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
         Failed 18/48 tests, 62.50% okay
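
For anyone following along, a minimal sketch of why swapping "return undef" for a bare "return" can change behaviour at the call site. It is a generic illustration of the list-context difference, not a diagnosis of the psm.t failure; old_style and new_style are made-up subs.

```perl
use strict;
use warnings;

sub old_style { return undef }   # always hands back one value: undef
sub new_style { return }         # empty list in list context, undef in scalar

# In list context the two differ in length.
my @old = old_style();           # one element (undef)
my @new = new_style();           # no elements

# Inside a key/value list the bare return drops out entirely, so the
# pairs after it shift: 'a' ends up paired with 'b'.
my @old_pairs = (a => old_style(), b => 2);   # four elements
my @new_pairs = (a => new_style(), b => 2);   # three elements

# In scalar context both give undef, so plain "my $x = f();" callers
# are unaffected.
my $old_scalar = old_style();
my $new_scalar = new_style();

printf "list context:  old=%d new=%d element(s)\n",
    scalar @old, scalar @new;
printf "pair lists:    old=%d new=%d element(s)\n",
    scalar @old_pairs, scalar @new_pairs;
printf "scalar context: both undef? %s\n",
    (!defined $old_scalar && !defined $new_scalar) ? 'yes' : 'no';
```

So callers that use the result in a list (array assignment, hash construction, argument lists) are the ones worth a second look after the sweep.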


Failures in 5/12 version of bioperl-live but NOT in today's version
===================================================================
- OntologyStore.t -
Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been  
touched between 5/12 and today.

The error looks like a transient network problem to me, but I'm not  
sure:
-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
*checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
500.  retrying...
---------------------------------------------------
[REPEATED 5 times -Dave]

t/OntologyStore..............FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay


- RepeatMasker.t -
Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
between 5/12 and today, so this appears to be not 'return undef'- 
related.

- SeqVersion.t -
The SeqVersion error was due to a failure to find and load  
Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
5/12 and today, so this is not 'return undef'-related.


All the other test failures appear in both versions of bioperl-live,  
so presumably they are not affected by the 'return undef' changes.

Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
that he didn't. Those only pop up when I run the optional remote- 
server tests, however. Perhaps Paul didn't run those and that  
accounts for the discrepancy?

Also, he saw errors in Biblio.t, Repeatmasker.t, and  
StandAloneBlast.t that I did not.

Dave


Today's bioperl-live test results:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/PhysicalMap.t                  14    2  14.29%  11-12
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/psm.t             255 65280    48   35  72.92%  29 32-48
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
99.84% okay.

Note that this is including tests requiring a remote server.

And here's the output from a May 12 checkout of bioperl-live:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/OntologyStore.t                 6    4  66.67%  3-6
t/PhysicalMap.t                  14    2  14.29%  11-12
t/RepeatMasker.t                  6    3  50.00%  1-2 6
t/SeqVersion.t      255 65280     6   10 166.67%  2-6
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
99.89% okay.
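
The comparison of the two summaries can also be done mechanically. The sketch below takes the two "Failed Test" tables as text, collects the script names, and prints the ones unique to each run; failed_scripts and only_in are made-up helper names, and the parsing assumes the summary layout shown above.

```perl
use strict;
use warnings;

# Return a hash ref whose keys are the failing scripts named in a
# Test::Harness summary table (lines starting with "t/Something.t").
sub failed_scripts {
    my ($report) = @_;
    my %failed;
    for my $line (split /\n/, $report) {
        $line =~ s/^(?:>\s?)+//;                 # drop mail quote markers
        $failed{$1} = 1 if $line =~ m{^(t/\S+\.t)\s};
    }
    return \%failed;
}

# Scripts that fail in $this run but not in $other, sorted by name.
sub only_in {
    my ($this, $other) = @_;
    my @only = sort grep { !$other->{$_} } keys %$this;
    return @only;
}

my $today = failed_scripts(<<'END');
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/PhysicalMap.t                  14    2  14.29%  11-12
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/psm.t             255 65280    48   35  72.92%  29 32-48
t/tutorial.t                     21   15  71.43%  7-21
END

my $may12 = failed_scripts(<<'END');
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/OntologyStore.t                 6    4  66.67%  3-6
t/PhysicalMap.t                  14    2  14.29%  11-12
t/RepeatMasker.t                  6    3  50.00%  1-2 6
t/SeqVersion.t      255 65280     6   10 166.67%  2-6
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/tutorial.t                     21   15  71.43%  7-21
END

my @new_failures  = only_in($today, $may12);
my @gone_failures = only_in($may12, $today);
print "Fails today but not on 5/12: @new_failures\n";
print "Failed on 5/12 but not today: @gone_failures\n";
```

On the two tables above this reports psm.t as the only new failure and OntologyStore.t, RepeatMasker.t and SeqVersion.t as the ones that went away, matching the lists earlier in this message.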


On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:

> Hi,
>
> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> and I had a few
> failures:
>
> Failed Test         Stat Wstat Total Fail  List of Failed
> ---------------------------------------------------------------------- 
> ---------
> t/Annotation.t                    89    2  79 88
> t/Biblio.t                        24    1  2
> t/LocusLink.t                     23    1  23
> t/PhysicalMap.t                   14    2  11-12
> t/RepeatMasker.t                   6    3  1-2 6
> t/StandAloneBlast.t               18    4  19-22
> t/TaxonTree.t                     17   30  11 18-42
> t/alignUtilities.t                 9    1  9
> t/psm.t              255 65280    48   35  29 32-48
> t/tutorial.t                      21   15  7-21


From cjfields at uiuc.edu  Wed Jun  7 19:38:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 18:38:10 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>

I saw a ton of activity from Jason on bioperl-guts for test files and  
modules; you may want to check your tests vs. his changes in case  
they were fixed.  I'll be running similar tests on WinXP and Mac OS X;  
would be nice to see how my results compare to Dave's.

Chris

On Jun 7, 2006, at 5:26 PM, David Messina wrote:

> To look for problems related to Heikki's "return undef" sweep, I ran
> 'make test' on both today's version of bioperl-live and on an older
> version I had checked out on May 12. This was done on OS X 10.4.6 and
> perl 5.8.6.
>
>
> Here are the results:
>
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may
> need to be examined.
>
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
>
>
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been
> touched between 5/12 and today.
>
> The error looks like a transient network problem to me, but I'm not
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
>
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
>
>
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm
> between 5/12 and today, so this appears to be not 'return undef'-
> related.
>
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between
> 5/12 and today, so this is not 'return undef'-related.
>
>
>
> All the other test failures appear in both versions of bioperl-live,
> so presumably they are not affected by the 'return undef' changes.
>
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
> that he didn't. Those only pop up when I run the optional remote-
> server tests, however. Perhaps Paul didn't run those and that
> accounts for the discrepancy?
>
> Also, he saw errors in Biblio.t, Repeatmasker.t, and
> StandAloneBlast.t that I did not.
>
> Dave
>
>
> Today's bioperl-live test results:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.
>
> Note that this is including tests requiring a remote server.
>
> And here's the output from a May 12 checkout of bioperl-live:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,
> 99.89% okay.
>
>
>
>
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
>
>> Hi,
>>
>> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7
>> and I had a few
>> failures:
>>
>> Failed Test         Stat Wstat Total Fail  List of Failed
>> -------------------------------------------------------------------------------
>> t/Annotation.t                    89    2  79 88
>> t/Biblio.t                        24    1  2
>> t/LocusLink.t                     23    1  23
>> t/PhysicalMap.t                   14    2  11-12
>> t/RepeatMasker.t                   6    3  1-2 6
>> t/StandAloneBlast.t               18    4  19-22
>> t/TaxonTree.t                     17   30  11 18-42
>> t/alignUtilities.t                 9    1  9
>> t/psm.t              255 65280    48   35  29 32-48
>> t/tutorial.t                      21   15  7-21
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Wed Jun  7 20:50:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 19:50:48 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
Message-ID: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>

Thanks for letting me know, Chris.

Here's a new round of results on bioperl-live checked out moments ago:
[OS X 10.4.6, perl 5.8.6]

Failed Test   Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/DBCUTG.t                  29    5  17.24%  26 30-32
t/LocusLink.t               23    1   4.35%  23
t/PopGen.t                  89    1   1.12%  85
t/psm.t        255 65280    48   35  72.92%  29 32-48
t/tutorial.t                21   15  71.43%  7-21
121 subtests skipped.
Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed, 99.69% okay.

Fixed since earlier today
=========================
Annotation.t
PhysicalMap.t
TaxonTree.t
alignUtilities.t

New since earlier today
=======================
PopGen.t

t/PopGen.....................FAILED test 85
         Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86 okay, 96.63%)

Unchanged
=========
DBCUTG.t
LocusLink.t
psm.t
tutorial.t

Remote-server tests were run like before. I forgot to mention last  
time that I skipped the local DB tests and I don't have bioperl-ext  
installed, so several staden-related tests were also skipped.

Dave


My results from earlier today for reference:
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.


From heikki at sanbi.ac.za  Thu Jun  8 04:49:38 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:49:38 +0200
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
	<4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
Message-ID: <200606081049.40232.heikki@sanbi.ac.za>

Looks like we survived the sweeping change - and fixed a number of existing 
bugs in the process. Thanks to everyone who helped!

	-Heikki 

On Thursday 08 June 2006 02:50, David Messina wrote:
> Thanks for letting me know, Chris.
>
> Here's a new round of results on bioperl-live checked out moments ago:
> [OS X 10.4.6, perl 5.8.6]
>
> Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------
> -------
> t/DBCUTG.t                  29    5  17.24%  26 30-32
> t/LocusLink.t               23    1   4.35%  23
> t/PopGen.t                  89    1   1.12%  85
> t/psm.t        255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                21   15  71.43%  7-21
> 121 subtests skipped.
> Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> 99.69% okay.
>
> Fixed since earlier today
> =========================
> Annotation.t
> PhysicalMap.t
> TaxonTree.t
> alignUtilities.t
>
> New since earlier today
> =======================
> PopGen.t
>
> t/PopGen.....................FAILED test 85
>          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> okay, 96.63%)
>
> Unchanged
> =========
> DBCUTG.t
> LocusLink.t
> psm.t
> tutorial.t
>
> Remote-server tests were run like before. I forgot to mention last
> time that I skipped the local DB tests and I don't have bioperl-ext
> installed, so several staden-related tests were also skipped.
>
> Dave
>
> My results from earlier today for reference:
> > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > -------------------------------------------------------------------------------
> > t/Annotation.t                   89    2   2.25%  79 88
> > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > t/LocusLink.t                    23    1   4.35%  23
> > t/PhysicalMap.t                  14    2  14.29%  11-12
> > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > t/alignUtilities.t                9    1  11.11%  9
> > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                     21   15  71.43%  7-21
> > 114 subtests skipped.
> > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > 99.84% okay.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Jun  8 04:52:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:52:27 +0200
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
Message-ID: <200606081052.27446.heikki@sanbi.ac.za>

I sort of fixed this.

At least the tests pass (I commented out two) when using the new sample XML. 
To be really useful, the code needs much more work, so I left the bug open.

http://bugzilla.open-bio.org/show_bug.cgi?id=2018


	-Heikki


On Wednesday 07 June 2006 21:38, Chris Fields wrote:
> All,
>
> Don't know how many people use Bio::ClusterIO this module, but it looks
> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
> versions of the dbSNP database; the schema for ASN.1 and XML format for SNP
> has changed:
>
> http://www.ncbi.nlm.nih.gov/projects/SNP/
>
> under 'Announcements'.
>
> I actually tried parsing the dbsnp test file and a newer schema XML file to
> confirm this; the new version doesn't work (returned object from
> next_cluster is undef).  I'm filing a bug as a reminder.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 01:55:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 08 Jun 2006 15:55:09 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
Message-ID: <4487BBBD.6060702@infotech.monash.edu.au>

Hi all,

I've just been further auditing the Bioperl code and noticed that
Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
can't locate an example/sample sequence file in "Lasergene" format.

 From the code it looks similar to 'raw' format but has "^^" as
a separator character.

Can anyone provide a real-life example so I can augment the 
t/lasergene.t tests?

Thanks,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jrm62 at cam.ac.uk  Thu Jun  8 07:38:40 2006
From: jrm62 at cam.ac.uk (John Mifsud)
Date: 08 Jun 2006 12:38:40 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
Message-ID: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>

Dear all,

Firstly I hope this is the right email list to write to! 

Secondly, I have a little program that parses the BLAST results I get from 
running searches remotely against the NCBI server, takes out all the hit 
sequences, and converts them to FASTA format.

This works fine with results from BROAD BLAST (tblastn ver 2.2.9). However, 
NCBI have just updated their BLAST server (to 2.2.14), the output is 
different, and the parsing no longer works. I was wondering if anyone knew 
of a new SearchIO module / script that is designed to parse the updated 
NCBI BLAST output?

Thanks for your time,


John


From cjfields at uiuc.edu  Thu Jun  8 08:56:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 07:56:27 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <200606081052.27446.heikki@sanbi.ac.za>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
	<200606081052.27446.heikki@sanbi.ac.za>
Message-ID: <AB8EE4BC-4774-48A6-8F26-2A8356F8E700@uiuc.edu>

Sounds good to me.  If someone wants to use this down the line, they  
might be desperate enough to provide patches; there are a lot of  
commented out tags.

Chris

On Jun 8, 2006, at 3:52 AM, Heikki Lehvaslaiho wrote:

> I sort of fixed this.
>
> At least the tests pass (I commented out two) when using the new  
> sample XML.
> To be really usefull, the code need much more work, so I left the  
> bug open.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2018
>
>
> 	-Heikki
>
>
> On Wednesday 07 June 2006 21:38, Chris Fields wrote:
>> All,
>>
>> Don't know how many people use Bio::ClusterIO this module, but it  
>> looks
>> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
>> versions of the dbSNP database; the schema for ASN.1 and XML  
>> format for SNP
>> has changed:
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/
>>
>> under 'Announcements'.
>>
>> I actually tried parsing the dbsnp test file and a newer schema  
>> XML file to
>> confirm this; the new version doesn't work (returned object from
>> next_cluster is undef).  I'm filing a bug as a reminder.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 09:03:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 14:03:05 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <44882009.1040906@mrc-dunn.cam.ac.uk>

John Mifsud wrote:
> Dear all,
> 
> Firstly I hope this is the right email list to write to! 
> 
> Secondly, I have a little program that parses the BLAST results i have got 
> running remotely to the NCBI server and takes out all the hit sequences and 
> converts them to FASTA format.
> 
> Now when using BROAD BLAST and getting results this works fine (tblastn ver 
> 2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and 
> the output is different and the parsing no longer works. I was wondering if 
> anyone knew of a new SearchIO module / script that is designed to blast the 
> updated NCBI BLAST output?

You'll probably need to get the latest SearchIO blast module from 
bioperl-live.
http://bioperl.org/wiki/Getting_BioPerl

If you're having difficulties with your setup, John, I can just send you 
the relevant file(s). Mail me (or Alan) privately for that.


From cjfields at uiuc.edu  Thu Jun  8 09:12:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 08:12:23 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>

I would say, based on previous responses, update to the latest CVS  
(bioperl-live).  You could also try updating  
Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you  
don't want to update the entire toolkit.  Running these with BLAST  
2.2.14 output seems to work fine.

Though this is the likely fix, if you have additional problems next  
time please make sure to include more information.  We have no idea  
what OS, bioperl version, perl version you are running.  And a code  
snippet and bug description would be nice (i.e. "it doesn't work" -  
not a good description; "the script freezes" is a little more  
informative).

Chris

On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:

> Dear all,
>
> Firstly I hope this is the right email list to write to!
>
> Secondly, I have a little program that parses the BLAST results i  
> have got
> running remotely to the NCBI server and takes out all the hit  
> sequences and
> converts them to FASTA format.
>
> Now when using BROAD BLAST and getting results this works fine  
> (tblastn ver
> 2.2.9). However, NCBI have just updated their BLAST server (to  
> 2.2.14) and
> the output is different and the parsing no longer works. I was  
> wondering if
> anyone knew of a new SearchIO module / script that is designed to  
> blast the
> updated NCBI BLAST output?
>
> Thanks for your time,
>
>
> John
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 12:03:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 17:03:21 +0100
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>	<200605311255.19166.heikki@sanbi.ac.za>
	<447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <44884A49.6060805@mrc-dunn.cam.ac.uk>

Sendu Bala wrote:
> Heikki Lehvaslaiho wrote:
>> In my opinion the sooner the bugs get exposed the better. It is much more 
>> likely that there is a well hidden bug caused by assigning accidentally undef 
>> into an one element array that someone intentionally writing code that 
>> expects that behaviour!
>>
>> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
>> and could not see any differences in the test output. 
>>
>> Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible 
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Looks like Heikki's work went well. If there is any further interest in 
getting rid of all the remaining undef returns, this also needs to be fixed:

sub x {
   # return (...) on success
   # do nothing on failure
}

Needs to be changed to:

sub x {
   # return (...) on success
   return;
}
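
To make the pitfall concrete, here is a minimal sketch of how the two 
return styles differ between scalar and list context. The lookup routines 
and the %length_of table are made up for illustration and are not taken 
from any BioPerl module:

```perl
use strict;
use warnings;

# Hypothetical data and routines, for illustration only.
my %length_of = (seq1 => 120);

# Explicit "return undef": in list context the caller receives (undef),
# a one-element list, so an array assigned from it is not empty.
sub lookup_undef {
    my ($id) = @_;
    return undef unless exists $length_of{$id};
    return $length_of{$id};
}

# Bare "return": undef in scalar context, empty list in list context.
sub lookup_bare {
    my ($id) = @_;
    return unless exists $length_of{$id};
    return $length_of{$id};
}

my @from_undef = lookup_undef('missing');
my @from_bare  = lookup_bare('missing');

printf "return undef: %d element(s)\n", scalar @from_undef;
printf "bare return:  %d element(s)\n", scalar @from_bare;

# In scalar context both styles hand back undef.
my $scalar = lookup_bare('missing');
print "scalar context: ", (defined $scalar ? 'defined' : 'undef'), "\n";
```

A caller that writes "if (my @hits = lookup_undef(...))" sees a true value 
on failure, because the array holds one element. With the bare return the 
same test is false.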


From roy at colibase.bham.ac.uk  Thu Jun  8 12:31:10 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 08 Jun 2006 17:31:10 +0100
Subject: [Bioperl-l] Truncate sequence with features
Message-ID: <448850CE.1040105@colibase.bham.ac.uk>

Hi all.

I've been playing around with a subroutine to truncate a sequence and 
adjust the coordinates of any features that overlap the specified 
region- something that according to the comments in 
Bio::Location::Simple has been abortively worked on in the past.

I've submitted the subroutine as an enhancement in Bugzilla. It's a bit 
hacky but works for what I needed it for. However I'm a bit unsure on 
the best way to deal with split locations where one of the sublocations 
is entirely outside the truncated region. My current method results in 
locations like:
join(1..500, >1000..>1000)

which is quite ugly and possibly invalid, but kind of makes sense. Does 
anyone know what would be the correct behaviour for this situation?
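
For comparison, here is a rough sketch of one alternative convention: drop 
sublocations that fall entirely outside the truncated region, clip the ones 
that overlap it, and re-base the coordinates. This is only an assumption 
about what might be acceptable, not the behaviour of any BioPerl module, and 
clip_sublocations is a made-up helper working on plain [start, end] pairs 
rather than location objects:

```perl
use strict;
use warnings;

# Hypothetical helper: clip [start, end] pairs (1-based, inclusive) to a
# truncated region and re-base them so that the first base of the region
# becomes position 1. Sublocations entirely outside the region are dropped.
sub clip_sublocations {
    my ($region_start, $region_end, @sublocs) = @_;
    my @kept;
    for my $loc (@sublocs) {
        my ($start, $end) = @$loc;
        next if $end < $region_start || $start > $region_end;   # entirely outside
        my $new_start = $start < $region_start ? $region_start : $start;
        my $new_end   = $end   > $region_end   ? $region_end   : $end;
        push @kept, [ $new_start - $region_start + 1,
                      $new_end   - $region_start + 1 ];
    }
    return @kept;
}

# A feature join(1..500, 1200..1500), truncated to bases 1..1000.
my @clipped = clip_sublocations(1, 1000, [1, 500], [1200, 1500]);
print join(',', map { "$_->[0]..$_->[1]" } @clipped), "\n";
```

The trade-off is that dropping the outside sublocation loses the hint that 
the feature continued past the end of the truncated sequence, which the 
>1000..>1000 form keeps.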

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From cjfields at uiuc.edu  Thu Jun  8 14:47:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 13:47:19 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <000701c68b2b$f8cc21e0$15327e82@pyrimidine>

Thomas;

That error isn't related to BioPerl.  This is the standard HTML response
NCBI gives as a web page; the error embedded in the HTML you received as a
warning reads:

ERROR: Cannot accept request, error code: 1Number of unfinished requests
(151) from your IP address reached the HARD limit 150.

So you may have too many requests in the BLAST queue.  
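
If it helps with debugging, here is a small sketch for digging that message 
out of the HTML instead of reading the whole page by eye. It assumes the 
error sits inside a <font color="red"> element, as it does in the page you 
pasted; NCBI could change that markup at any time, and extract_blast_error 
is a made-up helper, not part of RemoteBlast:

```perl
use strict;
use warnings;

# Hypothetical helper: return the "ERROR: ..." text from an HTML page,
# or nothing if no such message is present.
sub extract_blast_error {
    my ($html) = @_;
    return unless defined $html;
    if ($html =~ m{<font\s+color="red">\s*(ERROR:.*?)</font>}is) {
        my $msg = $1;
        $msg =~ s/\s+/ /g;    # collapse line wraps and repeated spaces
        $msg =~ s/\s+$//;     # trim trailing whitespace
        return $msg;
    }
    return;
}

# Fragment copied from the warning output in this thread.
my $page = <<'HTML';
<hr><font
color="red">ERROR: Cannot accept request, error code: 1Number of
unfinished requests (151)  from your IP address reached the HARD
limit 150.</font><hr></form>   </body></html>
HTML

my $error = extract_blast_error($page);
print defined $error ? "$error\n" : "no error found\n";
```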

Chris

> -----Original Message-----
> From: Thomas J Keller [mailto:kellert at ohsu.edu]
> Sent: Thursday, June 08, 2006 1:39 PM
> To: Chris Fields
> Cc: John Mifsud; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] NCBI BLAST results parsing
> 
> I'm having the same problem bp_remote_blast.pl worked yesterday,
> today it's busted. Incidently, I got the following email from NCBI
> this morning:
> The new version of the NCBI SOAP E-Utilities, which includes recent
> changes to the NCBI sequence databases schema, was released today.
> 
> Thank you.
> NCBI E-Utilities Team
> 
> I wouldn't have thought that that would affect
> Bio::Tools::RemoteBlast but something has changed.
> 
> Here's a snippet of the output after $ bp_remote_blast.pl -p blastn -
> d nr -e 1e-3 -i nm_008540.fasta
> 
> -------------------- WARNING ---------------------
> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
> Content-Length: 267
> Content-Type: application/x-www-form-urlencoded
> 
> DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+%
> 25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C
> +mRNA.%
> 0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm
> ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn
> 
> 
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content-
> Type" content="text/html; charset=utf-8"/><link rel="stylesheet"
> href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body
> bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"
> onload="StartBlastCgi();"><!--  the header   --> <table border="0"
> width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"
> colspan=4>    <map name="head_img_map">    <area shape="rect"
> coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home
> page">       <area shape="rect" coords="301,0,600,40" href="http://
> www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>
> <IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"
> NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"
> HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr
> align="center">    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI=
> yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN
> KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes
> &HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT
> O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td
> width="150" bgcolor="#003366">        <a href="http://
> www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI
> =yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI
> NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye
> s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</
> FONT></a></td></tr></table><br><!--  the contents   --> <form
> action="Blast.cgi" enctype="application/x-www-form-urlencoded"
> method="POST"><script src="blastcgi.js"></script><SCRIPT
> LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =
> 'html/head_formating.gif';// --></SCRIPT><br><hr><font
> color="red">ERROR: Cannot accept request, error code: 1Number of
> unfinished requests (151)  from your IP address reached the HARD
> limit 150.</font><hr></form>   </body></html>
> ---------------------------------------------------
> 
> On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:
> 
> > I would say, based on previous responses, update to the latest CVS
> > (bioperl-live).  You could also try updating
> > Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> > don't want to update the entire toolkit.  Running these with BLAST
> > 2.2.14 output seems to work fine.
> >
> > Though this is the likely fix, if you have additional problems next
> > time please make sure to include more information.  We have no idea
> > what OS, bioperl version, perl version you are running.  And a code
> > snippet and bug description would be nice (i.e. "it doesn't work" -
> > not a good description; "the script freezes" is a little more
> > informative).
> >
> > Chris
> >
> > On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
> >
> >> Dear all,
> >>
> >> Firstly I hope this is the right email list to write to!
> >>
> >> Secondly, I have a little program that parses the BLAST results i
> >> have got
> >> running remotely to the NCBI server and takes out all the hit
> >> sequences and
> >> converts them to FASTA format.
> >>
> >> Now when using BROAD BLAST and getting results this works fine
> >> (tblastn ver
> >> 2.2.9). However, NCBI have just updated their BLAST server (to
> >> 2.2.14) and
> >> the output is different and the parsing no longer works. I was
> >> wondering if
> >> anyone knew of a new SearchIO module / script that is designed to
> >> blast the
> >> updated NCBI BLAST output?
> >>
> >> Thanks for your time,
> >>
> >>
> >> John
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >


From kellert at ohsu.edu  Thu Jun  8 14:39:04 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 11:39:04 -0700
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
Message-ID: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>

I'm having the same problem: bp_remote_blast.pl worked yesterday, but  
today it's busted. Incidentally, I got the following email from NCBI  
this morning:
The new version of the NCBI SOAP E-Utilities, which includes recent
changes to the NCBI sequence databases schema, was released today.

Thank you.
NCBI E-Utilities Team

I wouldn't have thought that that would affect  
Bio::Tools::Run::RemoteBlast, but something has changed.

Here's a snippet of the output after
$ bp_remote_blast.pl -p blastn -d nr -e 1e-3 -i nm_008540.fasta

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
Content-Length: 267
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+% 
25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C 
+mRNA.% 
0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm 
ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn


---------------------------------------------------

-------------------- WARNING ---------------------
MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content- 
Type" content="text/html; charset=utf-8"/><link rel="stylesheet"  
href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body  
bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"  
onload="StartBlastCgi();"><!--  the header   --> <table border="0"  
width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"  
colspan=4>    <map name="head_img_map">    <area shape="rect"  
coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home  
page">       <area shape="rect" coords="301,0,600,40" href="http:// 
www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>     
<IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"  
NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"  
HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr  
align="center">    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI= 
yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN 
KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes 
&HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT 
O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td  
width="150" bgcolor="#003366">        <a href="http:// 
www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI 
=yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI 
NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye 
s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</ 
FONT></a></td></tr></table><br><!--  the contents   --> <form  
action="Blast.cgi" enctype="application/x-www-form-urlencoded"  
method="POST"><script src="blastcgi.js"></script><SCRIPT  
LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =  
'html/head_formating.gif';// --></SCRIPT><br><hr><font  
color="red">ERROR: Cannot accept request, error code: 1Number of  
unfinished requests (151)  from your IP address reached the HARD  
limit 150.</font><hr></form>   </body></html>
---------------------------------------------------
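
The actual complaint is buried near the end of that HTML: the server
says the number of unfinished requests from this IP address (151) has
reached its hard limit of 150. Below is a minimal sketch for pulling that
line out of such a page, assuming the error is always wrapped in a
<font color="red"> element as it is above. The sub name is made up for
illustration and is not part of RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical helper: return the red "ERROR: ..." text from a BLAST CGI
# page, or nothing if the page has none. Assumes the markup seen above.
sub blast_error_text {
    my ($html) = @_;
    my ($msg) = $html =~ m{<font\s+color="red">\s*(ERROR:.*?)</font>}is
        or return;
    $msg =~ s/\s+/ /g;      # undo the line wrapping inside the message
    $msg =~ s/\s+$//;
    return $msg;
}

# Tail of the page from the warning above, line breaks included.
my $page = <<'HTML';
<br><hr><font
color="red">ERROR: Cannot accept request, error code: 1Number of
unfinished requests (151)  from your IP address reached the HARD
limit 150.</font><hr></form>
HTML

print blast_error_text($page), "\n";
```

If that reading is right, the failure would be a request quota on the
NCBI side rather than a change in the output format.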

On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:

> I would say, based on previous responses, update to the latest CVS
> (bioperl-live).  You could also try updating
> Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> don't want to update the entire toolkit.  Running these with BLAST
> 2.2.14 output seems to work fine.
>
> Though this is the likely fix, if you have additional problems next
> time please make sure to include more information.  We have no idea
> what OS, bioperl version, perl version you are running.  And a code
> snippet and bug description would be nice (i.e. "it doesn't work" -
> not a good description; "the script freezes" is a little more
> informative).
>
> Chris
>
> On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
>
>> Dear all,
>>
>> Firstly I hope this is the right email list to write to!
>>
>> Secondly, I have a little program that parses the BLAST results i
>> have got
>> running remotely to the NCBI server and takes out all the hit
>> sequences and
>> converts them to FASTA format.
>>
>> Now when using BROAD BLAST and getting results this works fine
>> (tblastn ver
>> 2.2.9). However, NCBI have just updated their BLAST server (to
>> 2.2.14) and
>> the output is different and the parsing no longer works. I was
>> wondering if
>> anyone knew of a new SearchIO module / script that is designed to
>> blast the
>> updated NCBI BLAST output?
>>
>> Thanks for your time,
>>
>>
>> John
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Thu Jun  8 15:28:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 14:28:18 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"returnundef"
In-Reply-To: <200606081049.40232.heikki@sanbi.ac.za>
Message-ID: <000001c68b31$b5320390$15327e82@pyrimidine>

Here are tests run from WinXP, ActivePerl 5.8.817; almost everything passes.
Not sure what's going on with StandAloneBlast or the protgraph tests, so
I'll check into it.  The psm.t tests that failed are the same as the ones
mentioned previously on other systems.
As an aside, I hate that using '-w' flag with ActivePerl gives a thousand
useless 'subroutines redefined' warnings; only way I found to turn it off is
to not use the flag.  Anyway, I pulled out the relevant chunks of code here;
I'll submit the Mac results separately to not confuse the two.  

...
t/StandAloneBlast............FAILED tests 19-22
	Failed 4/18 tests, 77.78% okay
...
t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
36-37, 45, 48-56, 59-60, 65-66
	Failed 22/66 tests, 66.67% okay
...
t/psm........................Illegal division by zero at t/psm.t line 147,
<GEN1> line 36.
dubious
	Test returned status 9 (wstat 2304, 0x900)
DIED. FAILED tests 29, 32-48
Failed 18/48 tests, 62.50% okay
...
Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/StandAloneBlast.t               18    4  22.22%  19-22
t/protgraph.t                     66   22  33.33%  11 13 20-21 26 33 36-37 45
                                                   48-56 59-60 65-66
t/psm.t                9  2304    48   35  72.92%  29 32-48
39 subtests skipped.
Failed 3/233 test scripts, 98.71% okay. 36/11100 subtests failed, 99.68%
okay.
NMAKE :  U1077: 
Stop.


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Thursday, June 08, 2006 3:50 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; BioPerl Mailing List; Chris Fields
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall with
> "returnundef"
> 
> Looks like we survived the sweeping change - and fixed a number of
> existing
> bugs in the process. Thanks for everyone who helped!
> 
> 	-Heikki
> 
> On Thursday 08 June 2006 02:50, David Messina wrote:
> > Thanks for letting me know, Chris.
> >
> > Here's a new round of results on bioperl-live checked out moments ago:
> > [OS X 10.4.6, perl 5.8.6]
> >
> > Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> > ------------------------------------------------------------------------
> > -------
> > t/DBCUTG.t                  29    5  17.24%  26 30-32
> > t/LocusLink.t               23    1   4.35%  23
> > t/PopGen.t                  89    1   1.12%  85
> > t/psm.t        255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                21   15  71.43%  7-21
> > 121 subtests skipped.
> > Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> > 99.69% okay.
> >
> > Fixed since earlier today
> > =========================
> > Annotation.t
> > PhysicalMap.t
> > TaxonTree.t
> > alignUtilities.t
> >
> > New since earlier today
> > =======================
> > PopGen.t
> >
> > t/PopGen.....................FAILED test 85
> >          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> > okay, 96.63%)
> >
> > Unchanged
> > =========
> > DBCUTG.t
> > LocusLink.t
> > psm.t
> > tutorial.t
> >
> > Remote-server tests were run like before. I forgot to mention last
> > time that I skipped the local DB tests and I don't have bioperl-ext
> > installed, so several staden-related tests were also skipped.
> >
> > Dave
> >
> > My results from earlier today for reference:
> > > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > > ----------------------------------------------------------------------
> > > --
> > > -------
> > > t/Annotation.t                   89    2   2.25%  79 88
> > > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > > t/LocusLink.t                    23    1   4.35%  23
> > > t/PhysicalMap.t                  14    2  14.29%  11-12
> > > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > > t/alignUtilities.t                9    1  11.11%  9
> > > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > > t/tutorial.t                     21   15  71.43%  7-21
> > > 114 subtests skipped.
> > > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > > 99.84% okay.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fernan at iib.unsam.edu.ar  Thu Jun  8 13:02:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 8 Jun 2006 14:02:27 -0300
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <4487BBBD.6060702@infotech.monash.edu.au>
References: <4487BBBD.6060702@infotech.monash.edu.au>
Message-ID: <20060608170227.GF3334@iib.unsam.edu.ar>

+----[ Torsten Seemann <torsten.seemann at infotech.monash.edu.au> (08.Jun.2006 13:47):
|
| Hi all,
| 
| I've just been further auditing the Bioperl code and noticed that
| Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
| can't locate an example/sample sequence file in "Lasergene" format.
| 
|  From the code it looks similar to 'raw' format but has "^^" as
| a separator character.
| 
| Can anyone provide a real-life example so I can augment the 
| t/lasergene.t tests?
|
+----]

See the attached file. 

The format seems to be plain text, beginning with a free
text description that goes from the beginning of the file
until the "^^" delimiter, and after that the sequence.
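
A minimal sketch of that layout in plain Perl, assuming the first "^^"
in the file is the only delimiter and that whitespace in the sequence
part can be discarded. The sub name is made up, the sample text is
modelled on the attached file, and this is not the
Bio::SeqIO::lasergene code.

```perl
use strict;
use warnings;

# Hypothetical helper: split the text of a Lasergene-style file into its
# free-text description and its sequence at the first "^^".
sub split_lasergene_text {
    my ($text) = @_;
    my ($desc, $seq) = split /\^\^/, $text, 2;
    die "no ^^ delimiter found\n" unless defined $seq;
    $desc =~ s/^\s+//;
    $desc =~ s/\s+$//;
    $seq  =~ s/\s+//g;      # drop line breaks and any stray whitespace
    return ($desc, $seq);
}

my $example = <<'END';
Created: Thursday, 08 June 2006 01:56 p.m.

This is a test sequence created with EditSeq

^^
ATCGATCGATCG
END

my ($desc, $seq) = split_lasergene_text($example);
print "description: $desc\n";
print "sequence:    $seq\n";
```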

Fernan
-------------- next part --------------
Created: Jueves, 08 de Junio de 2006 01:56 p.m.

This is a test sequence created with EditSeq (Lasergene's DNAStar)

^^
ATCGATCGATCG

From freimuth at pathology.wustl.edu  Thu Jun  8 13:12:36 2006
From: freimuth at pathology.wustl.edu (Freimuth, Robert)
Date: Thu, 8 Jun 2006 12:12:36 -0500
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
Message-ID: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>

Hi,

I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
of hits from blast, then get some information about the tiled result.  I
thought I'd use the num_unaligned_query and num_unaligned_hit methods to
get the number of unaligned bases in the tiled result, then subtract
that from the length of the query/subject sequence to get the number of
aligned bases in the region spanned by the hit(s).  My code is below,
followed by the error message.


while( my $result_obj = $blast_obj->next_result() )
{
    while( my $hit_obj = $result_obj->next_hit() )
    {
        my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
=> $hit_obj->name() );
        $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
this number of bp

        while( my $hsp_obj = $hit_obj->next_hsp() )
        {
            # add all HSPs to a GenericHit object so they can be tiled
together
            $generic_hit_obj->add_hsp( $hsp_obj );
        }

        my $num_unaligned_query =
$generic_hit_obj->num_unaligned_query();
        my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();


------------- EXCEPTION  -------------
MSG: Must have defined query_len
STACK Bio::Search::Hit::GenericHit::logical_length
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
STACK Bio::Search::Hit::GenericHit::num_unaligned_query
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
STACK toplevel blast_needle_timetrials_1.pl:94
 
--------------------------------------


I looked through the docs to try to find an explanation or some mention
of how to set query_len, but I didn't find anything.  Could someone
please point out what I'm doing wrong?  Additionally, if I'm making this
harder than it needs to be, please give me a gentle whack with the clue
stick.

Thanks,
Bob


From osborne1 at optonline.net  Thu Jun  8 15:42:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 08 Jun 2006 15:42:07 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"returnundef"
In-Reply-To: <000001c68b31$b5320390$15327e82@pyrimidine>
Message-ID: <C0ADF5CF.8C8F%osborne1@optonline.net>

Chris,

Odd. protgraph.t passes all of its tests on my computer. Do you have the
Clone module installed?

Brian O.


On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> 36-37, 45, 48-56, 59-60, 65-66
> Failed 22/66 tests, 66.67% okay


From jason at bioperl.org  Thu Jun  8 16:15:47 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 8 Jun 2006 16:15:47 -0400
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <84AC010A-25E6-48C7-A723-CE4688ECA926@bioperl.org>

Why are you trying to create new Hit objects?
  $hit_obj is already a GenericHit object...


-jason
On Jun 8, 2006, at 1:12 PM, Freimuth, Robert wrote:

> Hi,
>
> I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
> of hits from blast, then get some information about the tiled  
> result.  I
> thought I'd use the num_unaligned_query and num_unaligned_hit  
> methods to
> get the number of unaligned bases in the tiled result, then subtract
> that from the length of the query/subject sequence to get the  
> number of
> aligned bases in the region spanned by the hit(s).  My code is below,
> followed by the error message.
>
>
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
> => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
> this number of bp
>
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled
> together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
>
>         my $num_unaligned_query =
> $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
>
>
>
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
> STACK Bio::Search::Hit::GenericHit::num_unaligned_query
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
> STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
> STACK toplevel blast_needle_timetrials_1.pl:94
>
> --------------------------------------
>
>
> I looked through the docs to try to find an explanation or some  
> mention
> of how to set query_len, but I didn't find anything.  Could someone
> please point out what I'm doing wrong?  Additionally, if I'm making  
> this
> harder than it needs to be, please give me a gentle whack with the  
> clue
> stick.
>
> Thanks,
> Bob
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 18:36:00 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 09 Jun 2006 08:36:00 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <20060608170227.GF3334@iib.unsam.edu.ar>
References: <4487BBBD.6060702@infotech.monash.edu.au>
	<20060608170227.GF3334@iib.unsam.edu.ar>
Message-ID: <4488A650.2050803@infotech.monash.edu.au>

> I've just been further auditing the Bioperl code and noticed that
> Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
> can't locate an example/sample sequence file in "Lasergene" format.

Thanks to Fernan, Todd and Senthil who sent me example Lasergene files.
Those will be enough examples to write some tests.

--Torsten


From kellert at ohsu.edu  Thu Jun  8 20:29:10 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 17:29:10 -0700
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>

Greetings,
Is fink still a reasonable way to install and maintain bioperl?
(There have been some emails about instability.) How 'bout upgrades: the
way I have fink installed, its path is first when perl reads @INC. So
if I put a newer Bio::something in /usr/local/wherever it won't be
seen if an older module is in the fink path.  Can I upgrade in the
fink "space" without messing up fink's database? Other options?

Thanks,
Tom K


Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From hlapp at gmx.net  Thu Jun  8 21:19:28 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Jun 2006 21:19:28 -0400
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
	<1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
Message-ID: <060FC8CE-FD89-436E-B79C-135BB4F324CD@gmx.net>

Why don't you remove the fink bioperl package if you want to install  
a newer version locally?

BTW unless you use a custom-compiled perl your packages will end up
in /Library/Perl/5.8.6/ (or /System/Library/Perl/5.8.6/), not
/usr/local, when you issue 'make install'.
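
If the fink package has to stay, another option is to put the directory
holding the newer modules ahead of fink's in @INC with "use lib". A
small sketch: the directory name is only a placeholder, and File::Spec
(a core module) stands in for a Bio:: module so that the script runs
without BioPerl installed.

```perl
use strict;
use warnings;

# Placeholder directory holding the newer BioPerl modules; adjust as needed.
use lib '/usr/local/bioperl-live';

# "use lib" prepends to @INC, so this directory is searched first.
print "first \@INC entry: $INC[0]\n";

# %INC records the file each loaded module actually came from, which
# shows whether the old or the new copy won.
require File::Spec;
print "File::Spec loaded from: $INC{'File/Spec.pm'}\n";
```

Checking %INC for a Bio:: module after loading it would show which
installation perl really picked up.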

	-hilmar

On Jun 8, 2006, at 8:29 PM, Thomas J Keller wrote:

> Greetings,
> Is fink still a reasonable way to install and maintain bioperl?
> (There have been some emails about instability.) How 'bout upgrades: the
> way I have fink installed, its path is first when perl reads @INC. So
> if I put a newer Bio::something in /usr/local/wherever it won't be
> seen if an older module is in the fink path.  Can I upgrade in the
> fink "space" without messing up fink's database? Other options?
>
> Thanks,
> Tom K
>
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Jun  8 22:30:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 21:30:20 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall
	with"returnundef"
In-Reply-To: <C0ADF5CF.8C8F%osborne1@optonline.net>
Message-ID: <000c01c68b6c$a8184710$15327e82@pyrimidine>

Yes; using ActiveState's PPM:

ppm> query CLone
Querying target 1 (ActivePerl 5.8.7.815)
  1. Clone [0.20] recursively copy Perl datatypes
ppm>

v. 0.20 is the latest in CPAN.

I can try some additional tests with the relevant modules to see what the
problem is.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Thursday, June 08, 2006 2:42 PM
> To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; bioperl-l
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> with"returnundef"
> 
> Chris,
> 
> Odd. protgraph.t passes all of its tests on my computer. Do you have the
> Clone module installed?
> 
> Brian O.
> 
> 
> On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> > 36-37, 45, 48-56, 59-60, 65-66
> > Failed 22/66 tests, 66.67% okay
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From heikki at sanbi.ac.za  Fri Jun  9 03:35:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 09:35:12 +0200
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <448850CE.1040105@colibase.bham.ac.uk>
References: <448850CE.1040105@colibase.bham.ac.uk>
Message-ID: <200606090935.12758.heikki@sanbi.ac.za>

Roy,

The definitive document describing the locations is the feature table 
definition:

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#3.5

but you probably know that already.


Two questions come to mind:

1. Can you parse your join() location using bioperl without errors?

2. Is there a practical advantage in including a location which has no 
relevance to the sequence in hand?

I notice that the /partial qualifier is deprecated and the docs suggest using
< and > signs to indicate that the sequence is partial, so I guess what you are
doing is correct.
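
On question 2, one alternative is to drop any sublocation that falls
entirely outside the truncated region and to mark clipped ends with
< and >. Below is a rough sketch in plain Perl that does not use the
Bio::Location classes; the sub name and the [start, end] pairs are made
up for illustration, and strand and remote locations are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate sublocations [start, end] (1-based,
# inclusive) to the window $from..$to and renumber them relative to the
# truncated sequence. Returns a feature-table style location string, or
# undef if nothing overlaps the window.
sub truncate_sublocations {
    my ($from, $to, @sublocs) = @_;
    my @kept;
    for my $loc (@sublocs) {
        my ($s, $e) = @$loc;
        next if $e < $from || $s > $to;     # entirely outside: drop it
        my $start = $s < $from ? '<1' : $s - $from + 1;
        my $end   = $e > $to   ? '>' . ($to - $from + 1) : $e - $from + 1;
        push @kept, "$start..$end";
    }
    return @kept == 0 ? undef
         : @kept == 1 ? $kept[0]
         :              'join(' . join(',', @kept) . ')';
}

# Second exon lies wholly beyond the kept region 1..1000.
print truncate_sublocations(1, 1000, [1, 500], [1500, 2000]), "\n";
# Both exons are clipped by the kept region 101..1000.
print truncate_sublocations(101, 1000, [1, 500], [900, 2000]), "\n";
```

The trade-off is that a dropped sublocation leaves no hint that the
feature once continued past the end, which is what the >1000..>1000
form preserves.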

	-Heikki

On Thursday 08 June 2006 18:31, Roy Chaudhuri wrote:
> Hi all.
>
> I've been playing around with a subroutine to truncate a sequence and
> adjust the coordinates of any features that overlap the specified
> region- something that according to the comments in
> Bio::Location::Simple has been abortively worked on in the past.
>
> I've submitted the subroutine as an enhancement in Bugzilla. It's a bit
> hacky but works for what I needed it for. However I'm a bit unsure on
> the best way to deal with split locations where one of the sublocations
> is entirely outside the truncated region. My current method results in
> locations like:
> join(1..500, >1000..>1000)
>
> which is quite ugly and possibly invalid, but kind of makes sense. Does
> anyone know what would be the correct behaviour for this situation?
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Jun  9 04:06:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 10:06:30 +0200
Subject: [Bioperl-l] For CVS developers-potentialpitfall
	with"returnundef"
In-Reply-To: <000c01c68b6c$a8184710$15327e82@pyrimidine>
References: <000c01c68b6c$a8184710$15327e82@pyrimidine>
Message-ID: <200606091006.30893.heikki@sanbi.ac.za>

I am using:
   This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
and I have Clone installed, but 24 of the 66 tests fail.

Something is badly wrong.


	-Heikki
bala ~/src/bioperl/core> perl -w t/protgraph.t
1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Failed test 10 in t/protgraph.t at line 85
not ok 11
# Test 11 got: '5' (t/protgraph.t at line 86)
#    Expected: '13'
not ok 12
# Failed test 12 in t/protgraph.t at line 94
not ok 13
# Test 13 got: '5' (t/protgraph.t at line 95)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t/protgraph.t at line 113)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t/protgraph.t at line 114)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t/protgraph.t at line 122)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t/protgraph.t at line 150)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t/protgraph.t at line 158)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t/protgraph.t at line 163)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t/protgraph.t at line 187
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t/protgraph.t at line 212)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t/protgraph.t at line 228)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t/protgraph.t at line 229)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t/protgraph.t at line 233)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t/protgraph.t at line 234)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t/protgraph.t at line 248)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t/protgraph.t at line 249)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t/protgraph.t at line 253)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t/protgraph.t at line 254)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t/protgraph.t at line 263)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t/protgraph.t at line 264)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t/protgraph.t at line 275)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t/protgraph.t at line 276)
#    Expected: '5'


On Friday 09 June 2006 04:30, Chris Fields wrote:
> Yes; using ActiveState's PPM:
>
> ppm> query CLone
> Querying target 1 (ActivePerl 5.8.7.815)
>   1. Clone [0.20] recursively copy Perl datatypes
> ppm>
>
> v. 0.20 is the latest in CPAN.
>
> I can try some additional tests with the relevant modules to see what the
> problem is.
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > Sent: Thursday, June 08, 2006 2:42 PM
> > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> > with"returnundef"
> >
> > Chris,
> >
> > Odd. protgraph.t passes all of its tests on my computer. Do you have the
> > Clone module installed?
> >
> > Brian O.
> >
> > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > Failed 22/66 tests, 66.67% okay
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 04:08:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 09:08:18 +0100
Subject: [Bioperl-l] undef query_len error
	with	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <44892C72.2040605@mrc-dunn.cam.ac.uk>

Freimuth, Robert wrote:
> Hi,
> 
> I'm trying to use the Bio::Search::Hit::GenericHit
[snip]
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
> => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
> this number of bp
> 
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled
> together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
> 
>         my $num_unaligned_query =
> $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
> 
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
[snip]
> I looked through the docs to try to find an explanation or some mention
> of how to set query_len, but I didn't find anything.

As Jason asked, why are you essentially recreating the hit object?
The problem you are seeing is that the query length is normally set by
the SearchIO stream (via ResultI) when it internally creates a new hit
object. When you created your own hit object you didn't supply -query_len
as an option to new(), nor did you later use the query_length() method to
set it.

If you really do need your $generic_hit_obj (instead of just using
$hit_obj), call $generic_hit_obj->query_length($hit_obj->query_length);
(or, if you know the length of your query sequence, supply that directly).
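
For the simpler route of using $hit_obj directly, here is a minimal
sketch. It relies only on methods that appear in this thread (overlap,
num_unaligned_query, num_unaligned_hit, and logical_length from the
stack trace). The sub name is made up, and the 'query' / 'hit' arguments
to logical_length are an assumption based on my reading of the
GenericHit docs, so please check them against your version.

```perl
use strict;
use warnings;

# Hypothetical helper: number of aligned positions on the query and on
# the hit, computed from the hit object that SearchIO already built.
sub aligned_counts {
    my ($hit_obj) = @_;

    # Tile any HSPs that overlap by more than this number of residues.
    $hit_obj->overlap(0);

    my $aligned_query = $hit_obj->logical_length('query')
                      - $hit_obj->num_unaligned_query;
    my $aligned_hit   = $hit_obj->logical_length('hit')
                      - $hit_obj->num_unaligned_hit;
    return ($aligned_query, $aligned_hit);
}
```

Inside the next_hit() loop this would be called as
my ($aligned_query, $aligned_hit) = aligned_counts($hit_obj);
with no second GenericHit object involved.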


From zhangchnxp at gmail.com  Fri Jun  9 05:05:36 2006
From: zhangchnxp at gmail.com (Zhang chnxp)
Date: Fri, 9 Jun 2006 17:05:36 +0800
Subject: [Bioperl-l] Are there any modules handling the HLA Typing (Sequence
	Based Typing) ?
Message-ID: <4d1768a60606090205m6e360413paf172fa4e731ef2e@mail.gmail.com>

Hi there,
  I have some .abi trace files from an ABI3100 Genetic Analyzer. Are
there any packages handling the typing work of HLA-A, -B, -C, -DRB1,
etc.? Or is there any free software for resolving the ambiguity through
SBT?


From cain at cshl.edu  Wed Jun  7 19:02:43 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 07 Jun 2006 19:02:43 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"return	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149721363.12513.96.camel@localhost.localdomain>

On Wed, 2006-06-07 at 17:26 -0500, David Messina wrote:
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been  
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
That is a problem with the cvs server at SourceForge (where the Sequence
Ontology is hosted).  I changed the module that tries to get that file
(I don't remember off hand what it was).  

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From oldham at ucla.edu  Thu Jun  8 22:07:34 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Thu, 8 Jun 2006 19:07:34 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
Message-ID: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>

Dear all,

I am a total Bioperl newbie struggling to accomplish a conceptually simple
task.  I have a single large fasta file containing about 200,000 probe
sequences (from an Affymetrix microarray), each of which looks like this:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC

What I would like to do is extract from this file a subset of ~130,800
probes (both the header and the sequence) and output this subset into a new
fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
("1138_at" is the probe set ID in the header listed above); I have these
8,175 IDs listed in a separate file.  I *think* that I managed to create an
index of all 200,000 probes in the original fasta file using the following
script:

#!/usr/bin/perl -w

 # script 1: create the index

 use Bio::Index::Fasta;
 use strict;
 my $Index_File_Name = shift;
 my $inx = Bio::Index::Fasta->new(
     -filename => $Index_File_Name,
     -write_flag => 1);
 $inx->make_index(@ARGV);

I'm not sure if this is the most sensible approach, and even if it is, I'm
not sure what to do next.  Any help would be greatly appreciated!

Many thanks,
Mike O.




From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 10:52:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 15:52:59 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <44898B4B.8080901@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes
[snip]
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
[snip]
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!

I'd say you're on the right lines. Next, you should continue reading the 
  rest of the synopsis and description in the docs for Bio::Index::Fasta.

Perhaps it's not clear, but you don't need to say 
$inx->make_index(@ARGV); if you've already provided -file to new() and 
are only dealing with one file. You also can't supply -file to new() if 
you want to change the id_parser (which you do, since you need to tell 
it how to detect your probe set ID).
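
A minimal sketch of what such an ID parser could look like, assuming every
header follows the ">probe:ARRAY:PROBE_SET_ID:X:Y; ..." layout of the example
above. The names parse_probe_header and probe_key are hypothetical, and the
hookup in the final comment follows the Bio::Index::Fasta synopsis but has
not been run here. The key includes the X and Y fields on the assumption that
index keys need to be unique, since many probes share one probe set ID.

```perl
use strict;
use warnings;

# Hypothetical helper: split an Affymetrix-style probe header into its parts.
# Assumed layout: >probe:ARRAY:PROBE_SET_ID:X:Y; ...
sub parse_probe_header {
    my ($header) = @_;
    return unless $header =~ /^>?probe:([^:]+):([^:]+):(\d+):(\d+);/;
    return { array => $1, probe_set => $2, x => $3, y => $4 };
}

# Candidate ID parser: returns a key meant to be unique per probe
# (probe set ID plus the two numeric fields), or nothing if the
# header does not match the assumed layout.
sub probe_key {
    my $p = parse_probe_header($_[0]) or return;
    return join ':', @{$p}{qw(probe_set x y)};
}

# Assumed hookup, following the Bio::Index::Fasta synopsis (not run here):
#   $inx->id_parser(\&probe_key);
#   $inx->make_index(@ARGV);
```

With a key like this, the probe set ID is still recoverable from each key
(everything before the first ':'), so the subset can be selected afterwards.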

Having indexed your file you can then output the desired sequences, just 
like the foreach loop suggested in the synopsis. (You could have that in 
the same script.)


One thing I'm not clear on is why it needs -write_flag => 1. Why can't 
it index a read-only database? Even when you set -write_flag allowing it 
to work, it doesn't write anything...


From simon.andrews at bbsrc.ac.uk  Fri Jun  9 11:01:05 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 9 Jun 2006 16:01:05 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>

 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Michael Oldham
> Sent: 09 June 2006 03:08
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a 
> single large file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a 
> conceptually simple task.  I have a single large fasta file 
> containing about 200,000 probe sequences (from an Affymetrix 
> microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC

Unfortunately that's not Fasta format (which only has a single header
line starting with a '>').  I'd imagine that most programs which deal
with fasta which read that entry would see it as two sequences, the
first of which is empty.


> What I would like to do is extract from this file a subset of 
> ~130,800 probes (both the header and the sequence) and output 
> this subset into a new fasta file.  These 130,800 probes 
> correspond to 8,175 probe set IDs ("1138_at" is the probe set 
> ID in the header listed above)

If you're only having to do this once then it should be fairly quick to
knock up a one off script to do this.  Since you've only got 8000ish
probeset ids then you can probably just read those into a hash to start
with then parse through your big sequence file with something like;


#!perl
use warnings;
use strict;

my %probe_ids;

# Add real code here to populate your hash
$probe_ids{'1138_at'} = 1;   # key must be quoted: 1138_at is not a valid bareword
##########################################


open (IN,'your_affy_file.txt') or die "Can't read affy file: $!";

open (OUT,'>','probe_list.txt') or die "Can't write output: $!";

while (<IN>) {

  if (/^>probe/) {
    # This assumes there are always 3 lines per probe entry
    if (exists $probe_ids{(split(/:/))[2]}) {
      print OUT;
      print OUT scalar <IN>;
      print OUT scalar <IN>;
    }
  }
}


From MEC at stowers-institute.org  Fri Jun  9 10:58:22 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 9 Jun 2006 09:58:22 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work: 

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
<idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
mybigfile.fa

good luck

--Malcolm Cook 
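
For readers who find the one-liner hard to follow, the same idea can be
sketched as a short script. This is an illustration under assumptions, not a
tested replacement: the function names read_ids and filter_fasta are made up
here, the header layout is assumed to be ">probe:ARRAY:PROBE_SET_ID:...", and
trailing whitespace (including a stray carriage return from a Windows-edited
ID file) is stripped from each ID.

```perl
use strict;
use warnings;

# Read one probe set ID per line from a filehandle into a lookup hash.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;          # drop newline and any trailing \r or spaces
        next unless length $line;
        $ids{$line} = 1;
    }
    return \%ids;
}

# Copy every FASTA record whose header carries a wanted probe set ID.
# Returns the number of records written.
sub filter_fasta {
    my ($ids, $in, $out) = @_;
    my ($keep, $kept) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # A new record starts here: decide whether to keep it.
            $keep = ($line =~ /^>probe:[^:]+:([^:]+):/ && exists $ids->{$1}) ? 1 : 0;
            $kept++ if $keep;
        }
        print {$out} $line if $keep;
    }
    return $kept;
}

# Command-line use: perl filter_probes.pl id.dat mybigfile.fa > subset.fa
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh, '<', $id_file    or die "Can't read $id_file: $!";
    open my $fa_fh, '<', $fasta_file or die "Can't read $fasta_file: $!";
    my $kept = filter_fasta(read_ids($id_fh), $fa_fh, \*STDOUT);
    warn "kept $kept records\n";
}
```

As in the one-liner, the keep/skip decision is made once per header and then
applied to every following line until the next header, so multi-line
sequences are handled as well.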



From senthil at cdfd.org.in  Fri Jun  9 18:21:11 2006
From: senthil at cdfd.org.in (M Senthil Kumar)
Date: Fri, 9 Jun 2006 15:21:11 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>


On Fri, 9 Jun 2006, simon andrews (BI) wrote:
|
|
|> -----Original Message-----
|> From: bioperl-l-bounces at lists.open-bio.org
|> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
|> Michael Oldham
|> Sent: 09 June 2006 03:08
|> To: bioperl-l at lists.open-bio.org
|> Subject: [Bioperl-l] Output a subset of FASTA data from a
|> single large file
|>
|> Dear all,
|>
|> I am a total Bioperl newbie struggling to accomplish a
|> conceptually simple task.  I have a single large fasta file
|> containing about 200,000 probe sequences (from an Affymetrix
|> microarray), each of which looks like this:
|>
|> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
|> >Antisense;
|> TGGCTCCTGCTGAGGTCCCCTTTCC
|
|Unfortunately that's not Fasta format (which only has a single header
|line starting with a '>'.  I'd imagine that most programs which deal
|with fasta which read that entry would see it as two sequences, the
|first of which is empty.
|

[snipped]

hi,

I think the file is in fasta format and probably you might have seen it
differently because of your mail transport agent.

Senthil


From cjfields at uiuc.edu  Fri Jun  9 13:59:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:18 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>
Message-ID: <002b01c68bee$6e3237e0$15327e82@pyrimidine>

No; I saw the same thing here.  It's not FASTA in the traditional sense:

http://www.bioperl.org/wiki/FASTA_sequence_format

though he did get it to build a database successfully.  Well, 'success' in
the sense that no errors were thrown.  I've learned the absence of error
messages does not necessarily mean that everything went as planned; it
depends on how much error handling has been added to the module by the
submitting author.  

It's possible that the second annotation line was ignored completely.  I
suppose it's also possible that two sequences are entered into the database,
an empty sequence for the first '>' line and the full sequence for the
second.  It's all dependent on how the parser handles this.

Chris



From cjfields at uiuc.edu  Fri Jun  9 13:59:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:31 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfallwith"returnundef"
In-Reply-To: <200606091006.30893.heikki@sanbi.ac.za>
Message-ID: <002c01c68bee$76219ef0$15327e82@pyrimidine>

I ran tests this morning on protgraph.t using bioperl-live, Mac OS X (Intel)
running perl 5.8.6 and all tests passed, but I haven't updated from CVS
since June 7th.  The test results are almost exactly alike; most failed
tests are from unexpected results (with exactly the same results for both
OS's).  A few look more serious: test 45 failed on both and tests 10 and 12
failed on linux (the only noticeable difference between the two) 
...

ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
...

The line numbers seem to also be off by one (linux tests seem to have one
extra line); not sure if that means anything.
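
A comparison like the one above can be done mechanically. The following is
only a sketch: failed_tests and only_in_first are hypothetical helper names,
and it assumes the "not ok N" lines (optionally preceded by progress dots)
seen in the output in this thread.

```perl
use strict;
use warnings;

# Collect the numbers of failed tests from test output ("not ok N" lines,
# possibly preceded by progress dots as in ".not ok 21").
sub failed_tests {
    my ($output) = @_;
    my %failed;
    $failed{$1} = 1 while $output =~ /^\.*not ok (\d+)/mg;
    return \%failed;
}

# Test numbers that failed in the first run but not in the second.
sub only_in_first {
    my ($first, $second) = @_;
    return sort { $a <=> $b } grep { !$second->{$_} } keys %$first;
}
```

Calling only_in_first() in both directions on two saved outputs lists the
tests that fail on one platform only.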

Here's the full WinXP protgraph.t results:

1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t\protgraph.t at line 112)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t\protgraph.t at line 113)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t\protgraph.t at line 121)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t\protgraph.t at line 149)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t\protgraph.t at line 157)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t\protgraph.t at line 162)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t\protgraph.t at line 186
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t\protgraph.t at line 211)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t\protgraph.t at line 227)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t\protgraph.t at line 228)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t\protgraph.t at line 232)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t\protgraph.t at line 233)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t\protgraph.t at line 247)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t\protgraph.t at line 248)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t\protgraph.t at line 252)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t\protgraph.t at line 253)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t\protgraph.t at line 262)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t\protgraph.t at line 263)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t\protgraph.t at line 274)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t\protgraph.t at line 275)
#    Expected: '5'  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Friday, June 09, 2006 3:07 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields; 'Brian Osborne'
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> I am using:
>    This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
> and I have Clone installed, but more than half the tests fail.
> 
> Something is badly wrong.
> 
> 
> 	-Heikki
> bala ~/src/bioperl/core> perl -w t/protgraph.t
> 1..66
> ok 1
> ok 2
> ok 3
> ok 4
> ok 5
> ok 6
> ok 7
> ok 8
> ok 9
> not ok 10
> # Failed test 10 in t/protgraph.t at line 85
> not ok 11
> # Test 11 got: '5' (t/protgraph.t at line 86)
> #    Expected: '13'
> not ok 12
> # Failed test 12 in t/protgraph.t at line 94
> not ok 13
> # Test 13 got: '5' (t/protgraph.t at line 95)
> #    Expected: '13'
> ok 14
> ok 15
> ok 16
> ok 17
> ok 18
> ok 19
> not ok 20
> # Test 20 got: '0.013' (t/protgraph.t at line 113)
> #    Expected: '0.027'
> .not ok 21
> # Test 21 got: '1' (t/protgraph.t at line 114)
> #    Expected: ''
> ..ok 22
> .ok 23
> ok 24
> ..ok 25
> .not ok 26
> # Test 26 got: '1' (t/protgraph.t at line 122)
> #    Expected: '5'
> ok 27
> ok 28
> ok 29
> ok 30
> ok 31
> ok 32
> not ok 33
> # Test 33 got: '139' (t/protgraph.t at line 150)
> #    Expected: '71'
> ok 34
> ok 35
> not ok 36
> # Test 36 got: '126' (t/protgraph.t at line 158)
> #    Expected: '58'
> .not ok 37
> # Test 37 got: '1' (t/protgraph.t at line 163)
> #    Expected: '15'
> ok 38
> ok 39
> ok 40
> ok 41
> ok 42
> ok 43
> ok 44
> not ok 45
> # Failed test 45 in t/protgraph.t at line 187
> ok 46
> ok 47
> not ok 48
> # Test 48 got: '75' (t/protgraph.t at line 212)
> #    Expected: '72'
> not ok 49
> # Test 49 got: '343' (t/protgraph.t at line 228)
> #    Expected: '72'
> not ok 50
> # Test 50 got: '368' (t/protgraph.t at line 229)
> #    Expected: '74'
> not ok 51
> # Test 51 got: '344' (t/protgraph.t at line 233)
> #    Expected: '73'
> not ok 52
> # Test 52 got: '368' (t/protgraph.t at line 234)
> #    Expected: '74'
> not ok 53
> # Test 53 got: '432' (t/protgraph.t at line 248)
> #    Expected: '72'
> not ok 54
> # Test 54 got: '461' (t/protgraph.t at line 249)
> #    Expected: '74'
> not ok 55
> # Test 55 got: '434' (t/protgraph.t at line 253)
> #    Expected: '74'
> not ok 56
> # Test 56 got: '463' (t/protgraph.t at line 254)
> #    Expected: '76'
> ok 57
> ok 58
> not ok 59
> # Test 59 got: '437' (t/protgraph.t at line 263)
> #    Expected: '3'
> not ok 60
> # Test 60 got: '467' (t/protgraph.t at line 264)
> #    Expected: '4'
> ok 61
> ok 62
> ok 63
> ok 64
> not ok 65
> # Test 65 got: '440' (t/protgraph.t at line 275)
> #    Expected: '3'
> not ok 66
> # Test 66 got: '472' (t/protgraph.t at line 276)
> #    Expected: '5'
> 
> 
> On Friday 09 June 2006 04:30, Chris Fields wrote:
> > Yes; using ActiveState's PPM:
> >
> > ppm> query CLone
> > Querying target 1 (ActivePerl 5.8.7.815)
> >   1. Clone [0.20] recursively copy Perl datatypes
> > ppm>
> >
> > v. 0.20 is the latest in CPAN.
> >
> > I can try some additional tests with the relevant modules to see what
> the
> > problem is.
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > > Sent: Thursday, June 08, 2006 2:42 PM
> > > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> > > with"returnundef"
> > >
> > > Chris,
> > >
> > > Odd. protgraph.t passes all of its tests on my computer. Do you have
> the
> > > Clone module installed?
> > >
> > > Brian O.
> > >
> > > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > > Failed 22/66 tests, 66.67% okay
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sdavis2 at mail.nih.gov  Fri Jun  9 14:29:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 09 Jun 2006 14:29:53 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002b01c68bee$6e3237e0$15327e82@pyrimidine>
Message-ID: <C0AF3661.CD0A%sdavis2@mail.nih.gov>


On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> No; I saw the same thing here.  It's not FASTA in the traditional sense:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> though he did get it to build a database successfully.  Well, 'success' in
> the sense that no errors were thrown.  I've learned the absence of error
> messages does not necessarily mean that everything went as planned; it
> depends on how much error handling has been added to the module by the
> submitting author.
> 
> It's possible that the second annotation line was ignored completely.  I
> suppose it's also possible that two sequences are entered into the database,
> an empty sequence for the first '>' line and the full sequence for the
> second.  It's all dependent on how the parser handles this.

I think that Senthil was pointing out that even though >Antisense looks to
be on its own line, it isn't, but is simply a continuation of the FASTA
header.  Judging from the context, that is the only interpretation that
makes sense.  

Sean

>> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>> |> >Antisense;
>> |> TGGCTCCTGCTGAGGTCCCCTTTCC
>> |
>> |Unfortunately that's not Fasta format (which only has a single header
>> |line starting with a '>'.  I'd imagine that most programs which deal
>> |with fasta which read that entry would see it as two sequences, the
>> |first of which is empty.
>> |
>> 
>> [snipped]
>> 
>> hi,
>> 
>> I think the file is in fasta format and probably you might have seen it
>> differently because of your mail transport agent.


From cjfields at uiuc.edu  Fri Jun  9 15:05:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:05:44 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <002e01c68bf7$b594d210$15327e82@pyrimidine>

There's information in the HOWTOs:

http://www.bioperl.org/wiki/HOWTO:Flat_databases

http://www.bioperl.org/wiki/HOWTO:OBDA

Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
('fasta' format I/O) and this is what I got as output:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;


i.e. an empty sequence, which is what I guessed might happen, though I
thought it might pick up the second '>' and the full sequence there.  Since
the sequence is tossed you'll have to prescreen your sequence input stream
by either concatenating the two '>' lines together or screening for the
relevant information you want to retain.  You can try maybe getting this
info into Bio::Seq objects and writing to a Bio::SeqIO stream (to file or
file handle).
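
A minimal sketch of the "concatenate the two '>' lines" prescreen, in plain
Perl. It only applies if the file really does contain two consecutive '>'
lines per record; merge_wrapped_headers is a made-up name, and the rule that
a '>' line directly following another '>' line is a continuation of the
header is an assumption, not something the FASTA format defines.

```perl
use strict;
use warnings;

# Copy FASTA text from $in to $out, joining any run of consecutive '>' lines
# into a single header line.
sub merge_wrapped_headers {
    my ($in, $out) = @_;
    my $pending;    # header held back until we see what follows it
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;                      # drop newline and trailing spaces
        if ($line =~ /^>/) {
            if (defined $pending) {
                # Second '>' line in a row: treat it as a continuation.
                (my $rest = $line) =~ s/^>\s*//;
                $pending .= " $rest";
            }
            else {
                $pending = $line;
            }
            next;
        }
        if (defined $pending) {
            print {$out} "$pending\n";
            undef $pending;
        }
        print {$out} "$line\n";
    }
    print {$out} "$pending\n" if defined $pending;   # header at end of input
    return;
}
```

The cleaned-up stream could then be handed to whichever FASTA reader or
indexer is used for the next step.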

Once you have that set up, the HOWTO tells you how to set up custom or
secondary namespaces, so you can use a regex to parse out the information
for primary or secondary keys:

http://www.bioperl.org/wiki/HOWTO:Flat_databases#Secondary_or_custom_namespaces

then you could select specific sequences this way (per the HOWTO):

$db->secondary_namespaces("GI");
my $acc_seq = $db->get_Seq_by_id("P84139");
my $gi_seq = $db->get_Seq_by_secondary("GI",443893);

or for multiple sequences (judging from the POD):

my $acc_seqio = $db->get_Stream_by_id(@ids);

Chris



From cjfields at uiuc.edu  Fri Jun  9 15:49:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:49:51 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <C0AF3661.CD0A%sdavis2@mail.nih.gov>
Message-ID: <002f01c68bfd$e1111e20$15327e82@pyrimidine>

> On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > No; I saw the same thing here.  It's not FASTA in the traditional sense:
> >
> > http://www.bioperl.org/wiki/FASTA_sequence_format
> >
> > though he did get it to build a database successfully.  Well, 'success'
> in
> > the sense that no errors were thrown.  I've learned the absence of error
> > messages does not necessarily mean that everything went as planned; it
> > depends on how much error handling has been added to the module by the
> > submitting author.
> >
> > It's possible that the second annotation line was ignored completely.  I
> > suppose it's also possible that two sequences are entered into the
> database,
> > an empty sequence for the first '>' line and the full sequence for the
> > second.  It's all dependent on how the parser handles this.
> 
> I think that Senthil was pointing out that even though >Antisense looks to
> be on its own line, it isn't, but is simply a continuation of the FASTA
> header.  Judging from the context, that is the only interpretation that
> makes sense.
> 
> Sean

Sorry.  Just checked through another mail client and you're right.  That's
what I get for trusting Mr. Gates (stupid Outlook).  I have seen a few funky
FASTA derivations, so I thought that's what was going on here.  My bad!

My point, though erroneous, was that the fasta format parser may not parse
this data correctly if he did have two description lines, but may not
indicate there are problems by throwing an exception.  I demonstrated that
using Bio::SeqIO as an example (you get empty sequences).  Bio::Index::Fasta
parses the file itself using this loop to index:

	# Main indexing loop
	while (<FASTA>) {
		if (/^>/) {
			# $begin is the position of the first character after the '>'
			my $begin = tell(FASTA) - length( $_ ) + 1;

			foreach my $id (&$id_parser($_)) {
				$self->add_record($id, $i, $begin);
			}
		}
	}

Which simply looks for '>'.  That's fine for the vast majority of sequences.
I thought it would be nice to have something that's a little more rigorous
in verifying the format rather than trusting it implicitly, maybe by using
an eval{} block to make sure the format is FASTA-like and looks like
DNA/RNA/protein.  

Chris
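
A rough sketch of the kind of check meant here, written as a standalone
function rather than as a patch to Bio::Index::Fasta. The name check_fasta
and the rules it applies (every header must be followed by at least one
residue; sequence lines may contain only letters, '*' and '-') are
assumptions for illustration, not what any BioPerl module does.

```perl
use strict;
use warnings;

# Return a list of human-readable problems found in FASTA text read from $fh.
sub check_fasta {
    my ($fh) = @_;
    my (@problems, $header, $residues);

    # Called whenever a record ends, to report headers with no sequence.
    my $finish = sub {
        return unless defined $header;
        push @problems, "empty sequence for '$header'" if $residues == 0;
    };

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        next unless length $line;
        if (my ($h) = $line =~ /^>(.*)/) {
            $finish->();
            ($header, $residues) = ($h, 0);
        }
        elsif (!defined $header) {
            push @problems, "line $.: sequence data before first header";
        }
        elsif ($line =~ /([^A-Za-z*-])/) {
            push @problems, "line $.: unexpected character '$1' in '$header'";
        }
        else {
            $residues += length $line;
        }
    }
    $finish->();
    return @problems;
}
```

Run on a record with two '>' lines in a row, this reports the first header
as having an empty sequence instead of passing silently.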


> >> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> >> |> >Antisense;
> >> |> TGGCTCCTGCTGAGGTCCCCTTTCC
> >> |
> >> |Unfortunately that's not Fasta format (which only has a single header
> >> |line starting with a '>'.  I'd imagine that most programs which deal
> >> |with fasta which read that entry would see it as two sequences, the
> >> |first of which is empty.
> >> |
> >>
> >> [snipped]
> >>
> >> hi,
> >>
> >> I think the file is in fasta format and probably you might have seen it
> >> differently because of your mail transport agent.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Fri Jun  9 09:23:21 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 9 Jun 2006 15:23:21 +0200
Subject: [Bioperl-l] SimpleAlign
Message-ID: <716af09c0606090623v37c72bc5r1ddbcb2b8355a4a0@mail.gmail.com>

Hi,

Two queries with respect to SimpleAlign. I am using the following code
based on the POD.

my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
print $out $_ while <$in>;

1) is it possible to set set_displayname_flat() globally without doing
$_->set_displayname_flat() per alignment.

2) My input files have an ID and description line for each seq in the
alignment. When the file is converted I lose the description line. I
know I can get the description of the sequences (e.g.
$aln->get_seq_by_pos(2)->description()).
How could I export the complete fasta defline including the
description (I realize that general clustal format has a limit on the
number of characters, but still).

Regards,
Bernd


From oldham at ucla.edu  Fri Jun  9 21:39:45 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 9 Jun 2006 18:39:45 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>

Thanks to everyone for their helpful advice.  I think I am getting closer,
but no cigar quite yet.  The script below runs quickly with no errors--but
the output file is empty.  It seems that the problem must lie somewhere in
the 'while' loop, and I'm sure it's quite obvious to a more experienced
eye--but not to mine!  Any suggestions?  Thanks again for your help.

--Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
		}
	}
exit;


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Friday, June 09, 2006 7:58 AM
To: Michael Oldham; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single large
file


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work:

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID = <idfile>;
  chomp @ID; %ID = map {($_, 1)} @ID;}
  $inmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
  print if $inmatch' id.dat mybigfile.fa

good luck

--Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Michael Oldham
>Sent: Thursday, June 08, 2006 9:08 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Output a subset of FASTA data from a
>single large file
>
>Dear all,
>
>I am a total Bioperl newbie struggling to accomplish a
>conceptually simple
>task.  I have a single large fasta file containing about 200,000 probe
>sequences (from an Affymetrix microarray), each of which looks
>like this:
>
>>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>Antisense;
>TGGCTCCTGCTGAGGTCCCCTTTCC
>
>What I would like to do is extract from this file a subset of ~130,800
>probes (both the header and the sequence) and output this
>subset into a new
>fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>("1138_at" is the probe set ID in the header listed above); I
>have these
>8,175 IDs listed in a separate file.  I *think* that I managed
>to create an
>index of all 200,000 probes in the original fasta file using
>the following
>script:
>
>#!/usr/bin/perl -w
>
> # script 1: create the index
>
> use Bio::Index::Fasta;
> use strict;
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fasta->new(
>     -filename => $Index_File_Name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
>I'm not sure if this is the most sensible approach, and even
>if it is, I'm
>not sure what to do next.  Any help would be greatly appreciated!
>
>Many thanks,
>Mike O.
>
>
>
>
>--
>No virus found in this outgoing message.
>Checked by AVG Free Edition.
>Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Sun Jun 11 00:32:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Jun 2006 23:32:04 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <F4E1042A-CE2D-4E51-B711-BDBB6E052FEB@uiuc.edu>

What happens if you just print $idmatch or $1 (i.e. check to see if  
the regex matches anything)?  If there is nothing printed then either  
the regex isn't working as expected or there is something logically  
wrong.  The problem may be that the captured string must match the id
exactly, the id being the key to the %ID hash; if the regex picks up
any extra characters beyond your id key, you will not get
anything.  Looking at Malcolm's regex it should work just fine, but
we only had one example sequence to try here.

If your while loop is set up like this, won't it print only the
matched description lines to the outfile (no sequence) even if there
is a match?  Or is this what you wanted?  If you want the sequence
you should add 'print OUT scalar <PROBES>;' after the 'print OUT;' line
(without 'scalar', print reads <PROBES> in list context and dumps the
rest of the file).
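
Another option, as a rough sketch only: keep a flag that is declared
outside the loop and re-decided on header lines, so the header and every
sequence line after it get printed.  The helper name
print_matching_records and the second record (9999_at) are made up for
illustration; the regex is the one already used in this thread.

```perl
use strict;
use warnings;

# Copies every FASTA record whose probe set ID is a key of %$wanted from
# $in to $out.  Hypothetical helper, not taken from any module.
sub print_matching_records {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;                    # declared outside the loop so it persists
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted->{$1};   # re-decided only on header lines
        }
        print {$out} $line if $keep;        # header and its sequence lines
    }
    return;
}

# Usage with in-memory data; the second record is invented for the example.
my $fasta = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
          . "TGGCTCCTGCTGAGGTCCCCTTTCC\n"
          . ">probe:HG_U95Av2:9999_at:1:2; Interrogation_Position=1; Antisense;\n"
          . "AAAAAAAAAAAAAAAAAAAAAAAAA\n";
my %wanted = ('1138_at' => 1);
open my $in, '<', \$fasta or die "Cannot open in-memory file: $!";
print_matching_records($in, \*STDOUT, \%wanted);
close $in;
```

Note that a '>' line that does not match the regex leaves the flag
unchanged, so a malformed header inherits the decision made for the
previous record.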

Chris

On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:

> Thanks to everyone for their helpful advice.  I think I am getting  
> closer,
> but no cigar quite yet.  The script below runs quickly with no  
> errors--but
> the output file is empty.  It seems that the problem must lie  
> somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more  
> experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>
> --Mike O.
>
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID.dat.txt';
>
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
>
> my $probes = 'HG_U95Av2_probe_fasta.txt';
>
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
> keys=PSIDs and
> all values=1.
>
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 04:21:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:21:31 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002e01c68bf7$b594d210$15327e82@pyrimidine>
References: <002e01c68bf7$b594d210$15327e82@pyrimidine>
Message-ID: <448D240B.6040508@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> There's information in the HOWTOs:
> 
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
> 
> http://www.bioperl.org/wiki/HOWTO:OBDA
> 
> Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
> ('fasta' format I/O) and this is what I got as output:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> 
> 
> i.e. an empty sequence, which is what I guessed might happen
[snip]

As you later discovered, that was an Outlook problem. Just to make this 
thread relevant to bioperl, the bioperl solution is:

use Bio::SeqIO;
use Bio::Index::Fasta;
my $inx = Bio::Index::Fasta->new(-write_flag => 1);
$inx->id_parser(\&get_id);
$inx->make_index(shift);

my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
my $wanted_ids_file = shift;
open(IDS, $wanted_ids_file) or die "Can't open $wanted_ids_file: $!";
while (<IDS>) {
   chomp;
   my $seq = $inx->fetch($_);
   $out->write_seq($seq);
}

sub get_id {
   my $line = shift;
   $line =~ /^>probe:\S+?:(\S+?):/;
   $1;
}

It works for me on the sample sequence given by the OP.


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 04:49:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:49:49 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <448D2AAD.3030601@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Thanks to everyone for their helpful advice.  I think I am getting closer,
> but no cigar quite yet.  The script below runs quickly with no errors--but
> the output file is empty.  It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
> 
> --Mike O.
> 
> 
> #!/usr/bin/perl -w
> 
> use strict;
> 
> my $IDs = 'ID.dat.txt';
> 
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
> 
> my $probes = 'HG_U95Av2_probe_fasta.txt';
> 
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
> 
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
> 
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and
> all values=1.
> 
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;

Not sure why it would print nothing (are the ids in IDFILE the same case 
as the ids in the fasta file, do they only contain word characters?), 
but even if it did you would only be printing out the fasta headers and 
not the sequences. Doing it the bioperl way gives you more flexibility 
in the future; you may want to do something with the sequences after 
printing them out, in which case do it in bioperl using Seq objects and 
skip the intermediate step of printing them.
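
To check the id file itself, here is a small sketch in plain Perl.  It is
only a guess at one possible cause (stray whitespace or DOS line endings
left behind by chomp), which nobody has confirmed in this thread, and the
helper name load_ids is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: reads one ID per line and returns a hash ref of IDs,
# trimming surrounding whitespace (including a trailing "\r" that chomp
# leaves on DOS-formatted files) and skipping blank lines.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next unless length $line;
        warn "ID '$line' contains non-word characters\n" if $line =~ /\W/;
        $ids{$line} = 1;
    }
    return \%ids;
}

# Usage with in-memory data that mimics a file saved with DOS line endings.
my $id_text = "1138_at\r\n1134_at \r\n\r\n";
open my $fh, '<', \$id_text or die "Cannot open in-memory file: $!";
my $ids = load_ids($fh);
close $fh;
print join(',', sort keys %$ids), "\n";
```

If the keys printed this way differ from what chomp alone gives you, the
exists() test in the original loop can never succeed.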


From MEC at stowers-institute.org  Mon Jun 12 11:28:41 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:28:41 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F98D@exchkc02.stowers-institute.org>

Michael,

I don't think you can call perl's `print` on just a filehandle as you
are doing.  This is probably your problem.

If you call `select OUT` after opening it, print will print $_ to it.
And, every line in the fasta record whose header matches on of the IDS
will get printed, not just the fasta header lines.  Read the code again
nothing that $idmatch is only getting reset when a correctly formatted
fasta header line is matched.
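
A quick way to check both forms, sketched with an in-memory file so
nothing on disk is touched.  As far as I can tell from perldoc -f print, a
bareword filehandle with no list prints $_ to that handle, so both forms
should end up in the same place; treat this as a test to run rather than
as a statement about the original script:

```perl
use strict;
use warnings;

my $buffer = '';
open(OUT, '>', \$buffer) or die "Cannot open in-memory file: $!";

$_ = "first\n";
print OUT;                  # bareword filehandle, no list: prints $_ to OUT

my $previous = select(OUT); # make OUT the default handle for print
$_ = "second\n";
print;                      # no filehandle, no list: prints $_ to OUT
select($previous);          # restore the earlier default (normally STDOUT)

close(OUT) or die "Cannot close in-memory file: $!";
print $buffer;              # shows what actually reached OUT
```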

--Malcolm


>-----Original Message-----
>From: Chris Fields [mailto:cjfields at uiuc.edu] 
>Sent: Saturday, June 10, 2006 11:32 PM
>To: Michael Oldham
>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single large file
>
>What happens if you just print $idmatch or $1 (i.e. check to see if  
>the regex matches anything)?  If there is nothing printed then either  
>the regex isn't working as expected or there is something logically  
>wrong.  The problem may be that the captured string must match the id  
>exactly, the id being the key to the %ID hash; any extra characters  
>picked up by the regex outside of your id key and you will not get  
>anything.  Looking at Malcolm's regex it should work just fine, but  
>we only had one example sequence to try here.
>
>If your while loop is set up like this won't it only print only the  
>matched description lines to the outfile (no sequence) even if there  
>is a match?  Or is this what you wanted?   If you want the sequence  
>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>
>Chris
>
>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>
>> Thanks to everyone for their helpful advice.  I think I am getting  
>> closer,
>> but no cigar quite yet.  The script below runs quickly with no  
>> errors--but
>> the output file is empty.  It seems that the problem must lie  
>> somewhere in
>> the 'while' loop, and I'm sure it's quite obvious to a more  
>> experienced
>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>
>> --Mike O.
>>
>>
>> #!/usr/bin/perl -w
>>
>> use strict;
>>
>> my $IDs = 'ID.dat.txt';
>>
>> unless (open(IDFILE, $IDs)) {
>> 	print "Could not open file $IDs!\n";
>> 	}
>>
>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>
>> unless (open(PROBES, $probes)) {
>> 	print "Could not open file $probes!\n";
>> 	}
>>
>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>
>> my @ID = <IDFILE>;
>> chomp @ID;
>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
>> keys=PSIDs and
>> all values=1.
>>
>> 	while (<PROBES>) {
>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>> 		if ($idmatch){
>> 			print OUT;
>> 		}
>> 	}
>> exit;


From MEC at stowers-institute.org  Mon Jun 12 11:47:09 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:47:09 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F991@exchkc02.stowers-institute.org>

ooops, in my message 


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm


From MEC at stowers-institute.org  Mon Jun 12 11:48:02 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:48:02 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>

oops,

s/matches on of/matches one of/
s/nothing that/noting that/ 

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu] 
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if  
>>the regex matches anything)?  If there is nothing printed 
>then either  
>>the regex isn't working as expected or there is something logically  
>>wrong.  The problem may be that the captured string must 
>match the id  
>>exactly, the id being the key to the %ID hash; any extra characters  
>>picked up by the regex outside of your id key and you will not get  
>>anything.  Looking at Malcolm's regex it should work just fine, but  
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this won't it only print only the  
>>matched description lines to the outfile (no sequence) even if there  
>>is a match?  Or is this what you wanted?   If you want the sequence  
>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting  
>>> closer,
>>> but no cigar quite yet.  The script below runs quickly with no  
>>> errors--but
>>> the output file is empty.  It seems that the problem must lie  
>>> somewhere in
>>> the 'while' loop, and I'm sure it's quite obvious to a more  
>>> experienced
>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
>>> keys=PSIDs and
>>> all values=1.
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>> -----Original Message-----
>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>> Sent: Friday, June 09, 2006 7:58 AM
>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a  
>>> single large
>>> file
>>>
>>>
>>>
>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>> fine and probably be faster.
>>>
>>> Assuming your ids are one per line in a file named id.dat looking like
>>> this
>>>
>>> 1138_at
>>> 1134_at
>>> etc..
>>>
>>> this should work:
>>>
>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>> mybigfile.fa
>>>
>>> good luck
>>>
>>> --Malcolm Cook
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>> Michael Oldham
>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>> Dear all,
>>>>
>>>> I am a total Bioperl newbie struggling to accomplish a
>>>> conceptually simple
>>>> task.  I have a single large fasta file containing about 200,000  
>>>> probe
>>>> sequences (from an Affymetrix microarray), each of which looks
>>>> like this:
>>>>
>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>> Antisense;
>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>
>>>> What I would like to do is extract from this file a subset of  
>>>> ~130,800
>>>> probes (both the header and the sequence) and output this
>>>> subset into a new
>>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>> have these
>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>> to create an
>>>> index of all 200,000 probes in the original fasta file using
>>>> the following
>>>> script:
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> # script 1: create the index
>>>>
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> my $Index_File_Name = shift;
>>>> my $inx = Bio::Index::Fasta->new(
>>>>     -filename => $Index_File_Name,
>>>>     -write_flag => 1);
>>>> $inx->make_index(@ARGV);
>>>>
>>>> I'm not sure if this is the most sensible approach, and even
>>>> if it is, I'm
>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>
>>>> Many thanks,
>>>> Mike O.
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From hubert.prielinger at gmx.at  Mon Jun 12 14:29:19 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 12:29:19 -0600
Subject: [Bioperl-l] How to use gi2taxonid
Message-ID: <448DB27F.6090107@gmx.at>

hi,
I have downloaded the gi2taxonid file to get the taxonid for a GI number 
taken from a report as recommended here, but I don't know how to use the 
gi2taxonid file.
Jason wrote in a previous post that you have to make a DB_File out of 
it, but I don't know how....and finally tie it to a hash....
Can anybody give me a hint how to use it..... my final goal is to get 
the taxonomy.

thanks
Hubert


From cjfields at uiuc.edu  Mon Jun 12 15:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 14:13:30 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <000f01c68e54$4d155ac0$15327e82@pyrimidine>

Michael, Malcolm et al,

I ran Michael's code (not Malcolm's one-liner), with and w/o adding the file
handle line that I suggested.  My suggestion works b/c I'm calling the file
handle in scalar context, which reads the next line, just like '$foo =
<FILE>' or 'while(<FILE>) {}' advances to the next line (with $/ = "\n")
each time the file handle is called.  You could use:

$_ = <PROBES>;
print OUT;

I just chopped it down to one line.
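
Whether a read from a file handle returns one line or everything that is left depends on the context it is evaluated in. A minimal sketch of the difference, using an in-memory file and a made-up helper (count_args) that stands in for print, since any argument list is evaluated in list context:

```perl
use strict;
use warnings;

# Three lines of made-up data, read through an in-memory file handle.
my $data = "line1\nline2\nline3\n";

# Hypothetical helper: returns how many arguments it was given.
# Like print, it evaluates its argument list in list context.
sub count_args { return scalar @_ }

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# scalar() forces a single-line read: only line1 is consumed here.
my $n_scalar = count_args(scalar <$fh>);

# No scalar(): list context, so line2 and line3 are both consumed.
my $n_list = count_args(<$fh>);

close $fh;
print "scalar: $n_scalar line, list: $n_list lines\n";   # scalar: 1 line, list: 2 lines
```

So 'print OUT scalar(<PROBES>);' is the form that reliably writes exactly one more line.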

Without the extra line I suggested I get only the description line (I used
this as a test file based on the original sequence and Michael's description
of the ID):

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;

Which I don't think Michael wants (he mentioned sequence and description, I
think).  

Modifying the loop in Michael's code to:
...

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT <PROBES>; # grabs next line and prints
	}
}

Gets:

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
AGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTATCC
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
TGGATCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTACTGCTGAGGTCCCCTTTCC

Which matches the ID's in the ID file (there are 10 sequences in the probes
file).  

I did notice one odd thing; I tried the above code on Mac OS X and it worked
fine (i.e. printed only the descriptions and sequences for the ID's in the
ID hash).  If I used Windows, I needed to use this version:

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT scalar(<PROBES>);		
	}
}

Otherwise 'print OUT <PROBES>;' prints all remaining lines (print evaluates
its arguments in list context, so scalar() forces a single-line read).
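
An alternative that does not depend on reading exactly one sequence line per header is to keep a flag outside the loop and print every line while the flag is set. This is only a sketch: filter_fasta is a made-up name, the demo sequences are invented, and the header pattern is the one from the thread. Declaring the flag outside the loop also avoids 'my $x = ... if ...;', whose behaviour Perl leaves undefined when the condition is false.

```perl
use strict;
use warnings;

# Copy every FASTA record whose probe set ID is a key of %$wanted
# from handle $in to handle $out. Returns the number of records copied.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep    = 0;    # declared outside the loop so it persists across lines
    my $records = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            # Header line: decide whether this whole record is wanted.
            $keep = exists $wanted->{$1};
            $records++ if $keep;
        }
        elsif ($line =~ /^>/) {
            $keep = 0;  # header in an unexpected format: skip the record
        }
        print {$out} $line if $keep;    # header and all of its sequence lines
    }
    return $records;
}

# Small in-memory demonstration (sequences are made up).
my $fasta = <<'END';
>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:9999_at:395:301; Interrogation_Position=2631; Antisense;
AAAAAAAAAAAAAAAAAAAAAAAAA
END
my %wanted = ('1138_at' => 1);

open my $in, '<', \$fasta or die "Cannot read demo data: $!";
my $subset = '';
open my $out, '>', \$subset or die "Cannot write demo data: $!";
my $n = filter_fasta($in, $out, \%wanted);
close $out;
close $in;

print $subset;
```

With real files, the two handles would simply be opened on the probe file and the output file instead of on strings.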

Like I said, I haven't tried Malcolm's one-liner.  It's possible that it
works just as well as what I suggested.  I'm just responding to Michael's
code request.

Chris


> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 10:48 AM
> To: Cook, Malcolm; Chris Fields; Michael Oldham
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
> 
> oops,
> 
> s/matches on of/matches one of/
> s/nothing that/noting that/
> 
> --Malcolm
> 
> 
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> >Cook, Malcolm
> >Sent: Monday, June 12, 2006 10:29 AM
> >To: Chris Fields; Michael Oldham
> >Cc: bioperl-l at lists.open-bio.org
> >Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
> >single largefile
> >
> >Michael,
> >
> >I don't think you can call perl's `print` on just a filehandle as you
> >are doing.  This is probably your problem.
> >
> >If you call `select OUT` after opening it, print will print $_ to it.
> >And, every line in the fasta record whose header matches on of the IDS
> >will get printed, not just the fasta header lines.  Read the code again
> >nothing that $idmatch is only getting reset when a correctly formatted
> >fasta header line is matched.
> >
> >--Malcolm
> >
> >
> >>-----Original Message-----
> >>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>Sent: Saturday, June 10, 2006 11:32 PM
> >>To: Michael Oldham
> >>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
> >>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
> >>single large file
> >>
> >>What happens if you just print $idmatch or $1 (i.e. check to see if
> >>the regex matches anything)?  If there is nothing printed
> >then either
> >>the regex isn't working as expected or there is something logically
> >>wrong.  The problem may be that the captured string must
> >match the id
> >>exactly, the id being the key to the %ID hash; any extra characters
> >>picked up by the regex outside of your id key and you will not get
> >>anything.  Looking at Malcolm's regex it should work just fine, but
> >>we only had one example sequence to try here.
> >>
> >>If your while loop is set up like this won't it only print only the
> >>matched description lines to the outfile (no sequence) even if there
> >>is a match?  Or is this what you wanted?   If you want the sequence
> >>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
> >>
> >>Chris
> >>


From hlapp at gmx.net  Mon Jun 12 16:06:23 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 12 Jun 2006 16:06:23 -0400
Subject: [Bioperl-l] How to use gi2taxonid
In-Reply-To: <448DB27F.6090107@gmx.at>
References: <448DB27F.6090107@gmx.at>
Message-ID: <878FB829-AD31-457D-957E-210448D7F6F5@gmx.net>

Thought about typing

	$ perldoc DB_File

at the command line?

Hubert, are you trying to outsource what should be your own work to  
the bioperl list, or what motivates you to waste everybody's time? If  
you google 'how to ask good questions' this (indeed frequently cited,  
also on the bioperl list if you had paid attention) comes up as the  
first link:

http://www.catb.org/~esr/faqs/smart-questions.html

There's nothing I can add, except to read it in full before your next  
posting or you may reach the point fast at which nobody will bother  
to respond to you and do your homework for you.
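
For the archive, a minimal sketch of what the DB_File documentation describes: tie a hash to an on-disk file, fill it once, and reuse it for lookups later. The sub names are made up, and the input is assumed to have two whitespace-separated columns (GI number, then taxon ID); check the actual layout of your file first.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT, O_RDONLY
use DB_File;    # exports $DB_HASH

# Build an on-disk hash (GI => taxon ID) from a two-column text file.
# Assumption: one "GI <whitespace> taxon ID" pair per line.
sub build_gi_index {
    my ($dump_file, $db_file) = @_;
    tie my %gi2tax, 'DB_File', $db_file, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot create $db_file: $!";
    open my $in, '<', $dump_file or die "Cannot open $dump_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($gi, $taxid) = split /\s+/, $line;
        next unless defined $taxid;     # skip blank or malformed lines
        $gi2tax{$gi} = $taxid;          # written through to the file
    }
    close $in;
    untie %gi2tax;
    return;
}

# Look up one GI in a previously built index; returns undef if absent.
sub lookup_taxid {
    my ($db_file, $gi) = @_;
    tie my %gi2tax, 'DB_File', $db_file, O_RDONLY, 0644, $DB_HASH
        or die "Cannot open $db_file: $!";
    my $taxid = $gi2tax{$gi};
    untie %gi2tax;
    return $taxid;
}

# Usage (file names are placeholders):
#   build_gi_index('gi2taxonid.txt', 'gi2taxonid.db');   # once
#   my $taxid = lookup_taxid('gi2taxonid.db', 12345);    # any time later
```

The taxon ID can then be resolved against the NCBI taxonomy, for example through Bio::DB::Taxonomy.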

On Jun 12, 2006, at 2:29 PM, Hubert Prielinger wrote:

> hi,
> I have downloaded the gi2taxonid file to get the taxonid for a GI  
> number
> taken from a report as recommended here, but I don't know how to  
> use the
> gi2taxonid file.
> Jason wrote in a previous post that you have to make a DB_File out of
> it, but I don't know how....and finally tie it to a hash....
> Can anybody give me a hint how to use it..... my final goal is to get
> the taxonomy.
>
> thanks
> Hubert
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon Jun 12 16:35:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 15:35:10 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <448D240B.6040508@mrc-dunn.cam.ac.uk>
Message-ID: <001201c68e5f$b34ec8c0$15327e82@pyrimidine>

...
> Chris Fields wrote:
> > There's information in the HOWTOs:
> >
> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
> >
> > http://www.bioperl.org/wiki/HOWTO:OBDA
> >
...
> As you later discovered, that was an Outlook problem. Just to make this
> thread relevant to bioperl, the bioperl solution is:

Agreed (stupid Outlook).  It might be much faster to use non-Bioperl-ish
ways, but it is easier to further manipulate sequences (convert format,
analyze sequences, etc) using Bioperl directly.  I haven't used flat
databases much but it should move very quickly, even in an OO environment.

The one problem with the proposed non-bioperl method is that, if you wanted
100,000 sequences (based on ID's) from a FASTA database file containing
200,000 sequences, all ID's would need to be stored (1) in an array (which
gulps the data from the ID file) and then (2) mapped into a hash;
that may be a pretty big memory footprint depending on your system.
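
The intermediate array, at least, is easy to avoid by filling the hash while reading the ID file one line at a time. A minimal sketch (read_id_set is a made-up name; stripping trailing whitespace also removes the carriage returns that files edited on Windows tend to carry):

```perl
use strict;
use warnings;

# Read one ID per line from a file handle straight into a hash,
# without holding a second copy of the list in an array.
sub read_id_set {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        $id =~ s/\s+\z//;       # strip newline and any trailing CR or spaces
        next if $id eq '';      # ignore blank lines
        $ids{$id} = 1;
    }
    return \%ids;
}

# In-memory demonstration with the two example IDs from the thread.
my $list = "1138_at\r\n1134_at\n\n";
open my $fh, '<', \$list or die "Cannot open in-memory file: $!";
my $ids = read_id_set($fh);
close $fh;

print join(',', sort keys %$ids), "\n";   # prints 1134_at,1138_at
```

The hash itself still has to fit in memory, so this only removes the duplicate copy, not the footprint of the ID set.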

Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
reads the ID's in one at a time from the file, (2) retrieves the data, then
(3) prints it out.   The advantage of this approach is that the built index
can be used in other bioperl scripts as well w/o having to rebuild it again,
so if you wanted a different set of ID's later on you can access the
database using the prebuilt index.  More can be found in the
Bio::Index::Fasta POD.  

You can also use the ideas and code in the HOWTO (Flat Databases) I
mentioned, which focuses on the Bio::DB::Flat system and OBDA.  The
advantage of these is that you can use Sleepycat's Berkeley Database through
the Perl BerkeleyDB module (more functionality than DB_File) which is faster
than a standard flat database.  In the HOWTO, specifically look under
'Secondary or custom namespaces' for ideas on how to use your ID as a
primary or secondary key.

Chris

> use Bio::SeqIO;
> use Bio::Index::Fasta;
> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> $inx->id_parser(\&get_id);
> $inx->make_index(shift);
> 
> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> my $wanted_ids_file = shift;
> open(IDS, $wanted_ids_file);
> while (<IDS>) {
>    chomp;
>    my $seq = $inx->fetch($_);
>    $out->write_seq($seq);
> }
> 
> sub get_id {
>    my $line = shift;
>    $line =~ /^>probe:\S+?:(\S+?):/;
>    $1;
> }
> 
> It works for me on the sample sequence given by the OP.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From golharam at umdnj.edu  Mon Jun 12 16:23:45 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 12 Jun 2006 16:23:45 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>

I'm trying to install the bioperl-run package and am getting errors from
make test regarding PAML:

t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an
undefined value at t/PAML.t line 85, <GEN2> line 85.
t/PAML....................dubious
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Is this a legitimate error or am I missing something?

Ryan


From MEC at stowers-institute.org  Mon Jun 12 17:15:35 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 16:15:35 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>

Yeah, good points...

... my recommendation of the one-liner was motivated based on a small
number of IDs and no other applications needing to index the entire
fasta database.


--Malcolm [At which point he bowed out of this fray]



From cjfields at uiuc.edu  Mon Jun 12 17:20:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 16:20:55 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>
Message-ID: <001601c68e66$17b760a0$15327e82@pyrimidine>

Sorry Malcolm.  I didn't want to imply that your way or the bioperl way was
best, just point out advantages/disadvantages.  

Oops, didn't point out the possible Bioperl disadvantage (too many objects
generated = slow slow slow).  

Chris



From roy at colibase.bham.ac.uk  Mon Jun 12 11:46:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 12 Jun 2006 16:46:49 +0100
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <200606090935.12758.heikki@sanbi.ac.za>
References: <448850CE.1040105@colibase.bham.ac.uk>
	<200606090935.12758.heikki@sanbi.ac.za>
Message-ID: <448D8C69.4030005@colibase.bham.ac.uk>

Hi Heikki.

> Two questions come to mind:
> 
> 1. Can you parse your joint location using bioperl without errors?
Seems to work fine as far as I can tell (no errors, and to_FTstring 
reproduces the location as expected).

> 2. Is there a practical advantage in including a location which has no 
> relevance to the sequence in hand?
I think it would be misleading to imply that a location was complete 
when it is only a part of the originally annotated feature. From the FT 
definition the other possibility would be to include the missing parts 
of the feature as remote locations, I guess that may be more satisfactory.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From colin.erdman at du.edu  Mon Jun 12 15:52:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Mon, 12 Jun 2006 13:52:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
Message-ID: <1150141965.2992.17.camel@localhost.localdomain>

Hello all,

I am doing a project relating to some forensic analysis of mitochondrial
DNA. 

I would like to write a script that takes a reference sequence (in this
case the Anderson sequence, the standard mitochondrial sequence against
which sample sequences are compared) and compares it to an unknown
sequence.

I have been using this script:

use Bio::SearchIO;
use strict;
my $fh;
my @nomatches;
open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |") || die $!;

my $parser = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

if( my $result = $parser->next_result ) { 
     if( my $hit = $result->next_hit ) {   
     if( my $hsp = $hit->next_hsp ) { 
         my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
	 my ( @hitbases) = $hsp->hit_string;
	 my ( @querybases) = $hsp->query_string;
	 my $seq_string = join("", @querybases);
	 my $seq_string1 = join("", @hitbases);
         for my $base (  @qmismatches ) {
            print "base $base of the hit sequence is a mismatch: ";
	    print substr $seq_string, $base-1, 1;
	    print "->";
            print substr $seq_string1, $base-1, 1;
            print "\n";
        }
	
     }
     }
}


The problem is that some individuals' mitochondrial sequences have
insertions, deletions, etc. that offset them from the reference
sequence, which in turn offsets the numbering system.

To provide an example:

>Anderson Reference Sequence|HV2
ATTTGGT...
1234567

>Sample|HV2....
ATTTG|C|GT
12345,5.1,67

The |C| denotes an insertion; traditionally in the forensics community
this would be called position 5.1C, but the program reads it as position 6.

So basically I need to figure out how to modify a Perl script to recognize
that 5.1C is an insertion and not position 6; position 6 is actually
the G to the right of it, followed by position 7-T.

Any ideas and suggestions would be very helpful. I know this could be very tricky
or very easy; I have just come to the point where the idea flow has stopped and would
love to gather some outside input.
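
One possible approach, sketched here under assumptions: walk the two gapped alignment strings column by column, advance the position counter only when the reference has a base, and number inserted bases as position.1, position.2 and so on. forensic_diffs is a made-up name, the "del" label for deletions is just a convention chosen for this sketch, and where the aligner places a gap inside a run of identical bases may not follow forensic practice.

```perl
use strict;
use warnings;

# Compare two gapped alignment strings of equal length and report
# differences using reference-based numbering. $ref_start is the
# reference coordinate of the first alignment column (default 1).
sub forensic_diffs {
    my ($ref_aln, $sample_aln, $ref_start) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($sample_aln);

    my $pos = $ref_start - 1;   # last reference position consumed so far
    my $ins = 0;                # inserted bases seen since that position
    my @diffs;
    for my $i (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln,    $i, 1);
        my $s = uc substr($sample_aln, $i, 1);
        if ($r eq '-') {
            # Gap in the reference: insertion in the sample.
            # The reference position does not advance.
            $ins++;
            push @diffs, "$pos.$ins$s";
            next;
        }
        $pos++;
        $ins = 0;
        if    ($s eq '-') { push @diffs, "${pos}del" }  # deletion in the sample
        elsif ($s ne $r)  { push @diffs, "$pos$s" }     # substitution
    }
    return @diffs;
}

# The example from this message: reference ATTTGGT, sample ATTTG|C|GT.
print join(' ', forensic_diffs('ATTTG-GT', 'ATTTGCGT')), "\n";   # prints 5.1C
```

In the script above, the two strings would come from $hsp->query_string and $hsp->hit_string (with the reference as the query), and $hsp->start('query') would supply the starting coordinate.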

Thanks
Colin Erdman
colin.erdman at du.edu
Undergraduate Research Associate
Institute For Forensic Genetic
University of Denver 


From jason at bioperl.org  Tue Jun 13 10:19:04 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 10:19:04 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
Message-ID: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>

The latest version of YN00 (3.15) doesn't work with the current code,
as the output has changed substantially now that Yang provides several
different methods' simple Ka and Ks calculations.  Downgrade to PAML
3.14 or roll up your sleeves and figure out what is breaking: the regexp
at about line 363 that detects when to start parsing the pairwise data,
as well as the function parse_YN_Pairwise.

I just don't have much time anymore to follow changes to the software
packages, so I am hopeful that other developers who use our software and
do molecular evolutionary studies will get involved to help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason at bioperl.org  Tue Jun 13 11:45:27 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 11:45:27 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
	<B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <F802F582-28E4-4761-873C-2A49A60B3593@bioperl.org>

And just to say: codeml 3.15 parsing does work; yn00 parsing just
hasn't been updated.  I agree that it is bad that the test is failing,
but it depends on the version that is installed, and we should put some
sort of detect-version-and-skip code in the test so it doesn't cause
the tests to fail.  We just need more hands on deck tracking these sorts
of things.
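
A rough sketch of what such a version-dependent skip might look like with
Test::More. The helper names parse_version and version_at_most, and the
banner string, are hypothetical; how the installed yn00 actually reports
its version would need to be checked against the real program.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: pull "major.minor" out of a program banner.
# The banner format matched here is an assumption, not verified PAML output.
sub parse_version {
    my ($banner) = @_;
    return unless defined $banner && $banner =~ /version\s+(\d+)\.(\d+)/i;
    return ($1, $2);
}

# True when the banner's version is at most $max ("major.minor").
# The minor part is compared as an integer, so 3.2 counts as older than 3.14.
sub version_at_most {
    my ($banner, $max) = @_;
    my ($maj, $min) = parse_version($banner) or return 0;
    my ($max_maj, $max_min) = split /\./, $max;
    return $maj < $max_maj || ($maj == $max_maj && $min <= $max_min);
}

# Made-up example string standing in for whatever the program prints.
my $banner = 'YN00 in paml version 3.15';

SKIP: {
    skip 'yn00 output newer than the parser supports', 2
        unless version_at_most($banner, '3.14');
    ok(1, 'would parse yn00 output here');
    ok(1, 'would check the Ka/Ks values here');
}
```

An unrecognized banner is treated as unsupported, so the tests are
skipped rather than failed when the version cannot be determined.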

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Tue Jun 13 12:04:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 12:04:46 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001001c68f03$17429070$e6028a0a@GOLHARMOBILE1>

I'll take a look at it and see what I can do.  While I'm at it,
bioperl-run tests a module called Coil, but I don't have that installed.
The documentation doesn't specify where I can get this application.
Does anyone know where Coil comes from?




From Kevin.M.Brown at asu.edu  Tue Jun 13 13:42:40 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 13 Jun 2006 10:42:40 -0700
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>

I've been tasked with writing a "small" application for the lab I work
in that basically starts with an NCBI file for an organism and goes
through a number of steps to distill out the unique protein-coding
sequences and then design oligos for building the genes.  One of the
steps is comparing the overlap region of each oligo to all the other
designed oligos, to prevent mismatches that might truncate a gene or
splice another gene into it during the build step.  I tried to do
this within Perl with just a looped regex string comparison, but by my
calculations comparing each half of an oligo with all the other oligos
for this organism requires well over 8 BILLION comparisons.
The system was still crunching away 3 days later with no sign of
nearing completion.
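
For exact matches only, one way to avoid the all-against-all loop is to
index every fixed-length window of every oligo in a hash once and then
look each overlap region up directly. This is a swapped-in technique, not
BLAST, and it will not find near matches; the sub names and the window
length below are made up for illustration.

```perl
use strict;
use warnings;

# Index every length-$k window of every oligo: window => { oligo id => 1 }.
sub build_kmer_index {
    my ($oligos, $k) = @_;            # $oligos: hashref of id => sequence
    my %index;
    for my $id (keys %$oligos) {
        my $seq = uc $oligos->{$id};
        for my $i (0 .. length($seq) - $k) {
            $index{ substr($seq, $i, $k) }{$id} = 1;
        }
    }
    return \%index;
}

# Sorted ids of the other oligos that contain $probe exactly.
# $probe must be the same length as the windows in the index.
sub other_oligos_with {
    my ($index, $probe, $self_id) = @_;
    my $hits = $index->{ uc $probe } || {};
    my @ids  = sort grep { $_ ne $self_id } keys %$hits;
    return @ids;
}

my %oligos = (
    o1 => 'ACGTACGTTTGACCA',
    o2 => 'GGGTTTGACCATTTT',
    o3 => 'CCCCCCCCCCCCCCC',
);
my $k     = 8;
my $index = build_kmer_index(\%oligos, $k);

# The last $k bases of o1 stand in for its overlap region.
my $probe = substr($oligos{o1}, -$k);
print "o1 overlap $probe also occurs in: ",
      join(", ", other_oligos_with($index, $probe, 'o1')), "\n";
```

Building the index is one pass over the sequences and each lookup is a
single hash access, at the cost of holding every window in memory.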

So my thought was to use something like blastall from within the
script to find similar oligos, but that means I need to dump out the
designed oligos, create the db with formatdb, run the blast, and finally
analyze the result file to see which oligos need to be redesigned to
prevent a mismatch.  I'm just trying to figure out how to do it all
without leaving the script, but so far I haven't found a way to create a
db from within Perl using BioPerl.

Any thoughts on directions I should look?
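
The steps described above could be driven from one script with plain
system() calls, sketched below under the assumption that the legacy NCBI
formatdb and blastall programs are on the PATH. The sub names and file
names are made up; the external tools are only run if they are found.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write oligos (hashref of id => sequence) to a FASTA file.
sub write_fasta {
    my ($oligos, $path) = @_;
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} ">$_\n$oligos->{$_}\n" for sort keys %$oligos;
    close $out or die "cannot close $path: $!";
    return $path;
}

# Argument lists for the legacy NCBI tools (-p F = nucleotide database,
# -m 8 = tabular report).
sub formatdb_cmd { my ($fasta) = @_; return ('formatdb', '-i', $fasta, '-p', 'F') }
sub blastall_cmd {
    my ($fasta, $report) = @_;
    return ('blastall', '-p', 'blastn', '-d', $fasta, '-i', $fasta,
            '-m', '8', '-o', $report);
}

# True if an executable called $name is found in a PATH directory.
sub have_program {
    my ($name) = @_;
    my @found = grep { -x File::Spec->catfile($_, $name) } File::Spec->path;
    return scalar @found;
}

sub run_or_die { my @cmd = @_; system(@cmd) == 0 or die "@cmd failed: $?" }

my %oligos = (oligo1 => 'ACGTACGTACGTACGTACGT', oligo2 => 'TTGACCATTGACCATTGACC');
my $dir    = tempdir(CLEANUP => 1);
my $fasta  = write_fasta(\%oligos, File::Spec->catfile($dir, 'oligos.fa'));

if (have_program('formatdb') && have_program('blastall')) {
    my $report = File::Spec->catfile($dir, 'oligos.blast');
    run_or_die(formatdb_cmd($fasta));
    run_or_die(blastall_cmd($fasta, $report));
    print "report written to $report\n";
} else {
    print "formatdb/blastall not found; only wrote $fasta\n";
}
```

The tabular report could then be read back in the same script, for
example with Bio::SearchIO and -format => 'blasttable'.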


From aaron.j.mackey at gsk.com  Tue Jun 13 08:19:11 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 13 Jun 2006 08:19:11 -0400
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <1150141965.2992.17.camel@localhost.localdomain>
Message-ID: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>

See Bio::LocatableSeq

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:

> Hello all,
> 
> I am doing a project relating to some forensic analysis of mitochondrial
> DNA. 
> 
> I would like to write a script that will take a reference sequence, in
> this case the Anderson sequence which is the standard mitochondrial
> sequence which sample sequences are compared to, and compare it to an
> unknown sequence.
> 
> I have been using this script:
> 
> use Bio::SearchIO;
> use strict;
> my $fh;
> my @nomatches;
> open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |") || die $!;
> 
> my $parser = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
> 
> if( my $result = $parser->next_result ) {
>     if( my $hit = $result->next_hit ) {
>         if( my $hsp = $hit->next_hsp ) {
>             my ( @qmismatches ) = $hsp->seq_inds('query', 'nomatch');
>             my ( @hitbases )    = $hsp->hit_string;
>             my ( @querybases )  = $hsp->query_string;
>             my $seq_string  = join("", @querybases);
>             my $seq_string1 = join("", @hitbases);
>             for my $base ( @qmismatches ) {
>                 print "base $base of the hit sequence is a mismatch: ";
>                 print substr $seq_string, $base-1, 1;
>                 print "->";
>                 print substr $seq_string1, $base-1, 1;
>                 print "\n";
>             }
>         }
>     }
> }
> 
> 
> The problem is that some individuals' mitochondrial sequences have
> insertions, deletions, etc. that offset them from the reference
> sequence, which in turn offsets the numbering.
> 
> To provide an example:
> 
> >Anderson Reference Sequence|HV2
> ATTTGGT...
> 1234567
> 
> >Sample|HV2....
> ATTTG|C|GT
> 12345,5.1,67
> 
> The |C| denotes an insertion. Traditionally the forensics community
> would call this position 5.1C, but the program reads it as position 6.
> 
> So basically I need to figure out how to modify a Perl script so that it
> recognizes 5.1C as an insertion rather than position 6; position 6 is
> actually the G to the right of it, followed by position 7 (T).
> 
> Any ideas or suggestions would be very helpful. I know this could be
> very tricky or very easy; I have just reached the point where the ideas
> have stopped flowing and would love some outside input.
> 
> Thanks
> Colin Erdman
> colin.erdman at du.edu
> Undergraduate Research Associate
> Institute For Forensic Genetic
> University of Denver 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From colin.erdman at du.edu  Tue Jun 13 11:12:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 09:12:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150211566.7034.1.camel@localhost.localdomain>

I can see how this would help, but I am not sure how to implement it
in my situation; I am not very familiar with the Bio::Range or
Bio::Location modules.

Thanks very much,
Colin E.



From colin.erdman at du.edu  Tue Jun 13 12:05:30 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 10:05:30 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150214730.12044.2.camel@localhost.localdomain>

I actually have found EMBOSS DiffSeq to work quite well for detecting
the insertions and SNPs in the "sample sequence" as compared to the
"reference sequence". 

If I get this all figured out and integrated I will post a method; I
imagine it would prove useful to others as well.

Thanks all,
Colin



From golharam at umdnj.edu  Tue Jun 13 14:59:59 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:59:59 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <002301c68f1b$917b8c80$e6028a0a@GOLHARMOBILE1>

Never mind - don't check it in yet.  There are still some other problems
that the test suite isn't picking up.  I'll work on those and add to the
test suite.  Jason, I'll send you everything once I have it complete.




From Jonathan_Epstein at nih.gov  Tue Jun 13 14:21:00 2006
From: Jonathan_Epstein at nih.gov (Jonathan_Epstein at nih.gov)
Date: Tue, 13 Jun 2006 14:21:00 -0400
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <0J0T001LE9O5M6@lswsmta04.nmcc.sprintspectrum.com>

Sounds like a job for MUMmer (from Steven Salzberg's group).

Jonathan Epstein 

----------- 
Sent from my Treo



From golharam at umdnj.edu  Tue Jun 13 14:34:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:34:00 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>

It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAML.pm
Type: application/octet-stream
Size: 43262 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060613/566881b4/attachment-0003.obj>

From cjfields at uiuc.edu  Tue Jun 13 21:41:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:41:45 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000601c68f53$b1e4b090$15327e82@pyrimidine>

I committed it.  Passes PAML.t for bioperl-live.

Chris



From cjfields at uiuc.edu  Tue Jun 13 21:42:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:42:25 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000701c68f53$c9addcb0$15327e82@pyrimidine>

Sorry, Brian beat me to it.

Chris



From osborne1 at optonline.net  Tue Jun 13 21:38:09 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 13 Jun 2006 21:38:09 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <C0B4E0C1.8D74%osborne1@optonline.net>

Checked in.


On 6/13/06 2:34 PM, "Ryan Golhar" <golharam at umdnj.edu> wrote:

> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read both the new and the old format
> from YN00.  I've attached it to this message.  It passes all the PAML
> tests from bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.
> Can you (or someone) check it into CVS?
> 
> Ryan
> 


From golharam at umdnj.edu  Tue Jun 13 21:55:49 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 21:55:49 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <000601c68f53$b1e4b090$15327e82@pyrimidine>
Message-ID: <000101c68f55$a9fa8ec0$2f01a8c0@GOLHARMOBILE1>

Okay, that's fine.  It does pass the bioperl-live tests.  However, when I
ran the bp_pairwise_kaks script, it did not work with 3.15, so it looks
like the current test suite is not exhaustive.

When I looked into the code further, I saw that codeml 3.15 generates
some files slightly differently than 3.14, which needs to be accounted for.
I'll work on that and post it here...shouldn't be too long.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Tuesday, June 13, 2006 9:42 PM
To: golharam at umdnj.edu; 'Jason Stajich'
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


I committed it.  Passes PAML.t for bioperl-live.

Chris



From ULNJUJERYDIX at spammotel.com  Tue Jun 13 21:10:04 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 14 Jun 2006 09:10:04 +0800
Subject: [Bioperl-l] SimpleAlign /Bio::AlignIO; POD code doesn't work for me
Message-ID: <5b6410e0606131810k495d8f55mc6dc73f0cd5a6df5@mail.gmail.com>

>
> Hi,
>
> Two queries with respect to SimpleAlign. I am using the following code
> based on the POD.
>
> my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
> my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
> print $out $_ while <$in>;
>
> 1) is it possible to set set_displayname_flat() globally without doing
> $_->set_displayname_flat() per alignment.
>
> 2) My input files have an ID and description line for each seq in the
> alignment. When the file is converted I loose the description line. I
> know I can get the description of the sequences (e.g.
> $aln->get_seq_by_pos(2)->description()).
> How could I export the complete fasta defline including the
> description (I realize that general clustal format has a limit on the
> number of characters, but still).
>
> Regards,
> Bernd
> _______________________________________________
>
I might be totally wrong here, but my understanding of the FASTA format
is that the first word (i.e. everything up to the first space) is the only
true name of the seq, so anything after the first word is discarded.
Replacing the spaces with underscores works for me.

On a side note, does your 3rd line work?
It doesn't on my 1.5rc1.
I had to add the bold line, which was missing from the POD doc.
I don't think it was the use strict pragma.
    open MYIN,"<$file" or die "Can't open input alignment";
    open MYOUT, ">$file2" or die "can't write to output";
    my $in  = Bio::AlignIO->newFh(-fh     => \*MYIN,
                               -format => 'fasta');
    my $out = Bio::AlignIO->newFh(-fh     =>  \*MYOUT,
                               -format => 'clustalw');
    print $out $_ while <$in>;
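
The underscore workaround mentioned above can be sketched in plain Perl.
This is only an illustration under stated assumptions: the helper name
flatten_defline is made up for this example and is not part of
Bio::AlignIO, and it assumes that every defline starts with ">" and that
whitespace inside it may safely be replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): join the ID and the
# description of a FASTA defline with underscores, so that a parser
# which keeps only the first word still sees the whole line.
sub flatten_defline {
    my ($line) = @_;
    return $line unless $line =~ /^>/;   # sequence lines pass through
    $line =~ s/\s+$//;                   # drop trailing whitespace/newline
    $line =~ s/\s+/_/g;                  # remaining whitespace -> "_"
    return "$line\n";
}

# Small in-memory demonstration with made-up data.
my @fasta = (
    ">seq1 hypothetical protein A\n",
    "ACGT-ACGT\n",
);
print flatten_defline($_) for @fasta;
```

The flattened file could then be fed to the conversion code above. As the
original question notes, the clustal format limits name length, so long
flattened names may still be cut.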

Cheers
kevin


From sb at mrc-dunn.cam.ac.uk  Wed Jun 14 03:49:10 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 14 Jun 2006 08:49:10 +0100
Subject: [Bioperl-l] Blast or blat against custom db?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
Message-ID: <448FBF76.1090505@mrc-dunn.cam.ac.uk>

Kevin Brown wrote:
[snip]
> So, my thought was to utilize something like blastall from within the
> script to find other oligos of similar match, but it means that I need
> to dump out the oligos designed, create the db with formatdb. [snip]
> I'm just trying to figure out how to do it all without leaving the
> script, but as yet haven't noticed a way to create a db from within perl
> using bioperl?
> 
> Any thoughts on directions I should look?

AFAIK there's no bioperl interface onto formatdb, but the way to do it 
is make a fasta file (perhaps using bioperl) with all the oligos (what 
you want to become the db), then use a perl system call (or similar) to 
run formatdb. Still in the same script you'd then run and analyse the 
blast with bioperl calls (presumably starting with StandAloneBlast - 
http://bioperl.org/wiki/HOWTO:Beginners#BLAST if you need it).

Just be sure to carefully craft your blast parameters so they're 
suitable for oligo-sized matches, and check that the 3' bases of the hits
are identical.
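
The first two steps (write a fasta file, then call formatdb) might look
roughly like the sketch below. It is not a tested pipeline: it assumes the
legacy NCBI formatdb binary is on the PATH, the sub names and oligo data
are made up for illustration, and the later StandAloneBlast step is not
shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write oligos (name => sequence) to a FASTA file in plain Perl.
sub write_fasta {
    my ($path, $oligos) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    for my $name (sort keys %$oligos) {
        print {$fh} ">$name\n$oligos->{$name}\n";
    }
    close($fh) or die "Can't close $path: $!";
    return $path;
}

# Build the formatdb argument list for a nucleotide database:
# -i input file, -p F nucleotide (not protein), -o T parse/index seq IDs.
sub formatdb_command {
    my ($fasta) = @_;
    return ('formatdb', '-i', $fasta, '-p', 'F', '-o', 'T');
}

# Run formatdb via system(); the list form avoids shell quoting issues.
sub build_db {
    my ($fasta) = @_;
    my @cmd = formatdb_command($fasta);
    system(@cmd) == 0 or die "formatdb failed: @cmd (exit status $?)";
    return $fasta;
}

# Example data; the oligo names are made up.
my %oligos = (
    oligo_1 => 'TGGCTCCTGCTGAGGTCCCCTTTCC',
    oligo_2 => 'GCGCAGCAGCGAGAATTTCGACGAG',
);
my @cmd = formatdb_command('oligos.fa');
print "Would run: @cmd\n";
```

In a real script you would call write_fasta('oligos.fa', \%oligos) and
then build_db('oligos.fa') before running the blast against that file.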


From MEC at stowers-institute.org  Wed Jun 14 09:47:59 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 14 Jun 2006 08:47:59 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>

 
Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

# stop here if the ID list cannot be read, instead of carrying on
open(IDFILE, '<', $IDs) or die "Could not open file $IDs: $!";

my $probes = 'HG_U95Av2_probe_fasta.txt';

open(PROBES, '<', $probes) or die "Could not open file $probes: $!";

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT; # make OUT the default filehandle for print

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;
my $idmatch; # declared before the loop so it keeps its value between lines
	while (<PROBES>) {
		# only header lines update the flag; sequence lines inherit it
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}
exit;
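
The same logic can be tried without any files, which makes it easier to
see what the flag does. The following is a minimal sketch, not the script
from this thread: the sub names filter_probes and read_ids are made up,
the handles are in-memory, and stripping all trailing whitespace from the
IDs is an extra precaution of mine (in case the ID file has CRLF line
endings) that nobody in the thread has confirmed is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every FASTA record whose probe set ID is in %$ids from $in to $out.
# The flag lives outside the loop, so sequence lines following a matching
# header are printed too, and it is re-evaluated at each new header.
sub filter_probes {
    my ($ids, $in, $out) = @_;
    my $idmatch = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $idmatch = exists $ids->{$1};
        }
        print {$out} $line if $idmatch;
    }
    return;
}

# Read IDs one per line. Stripping all trailing whitespace (not just "\n")
# is a defensive assumption in case the ID file has CRLF line endings.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        $id =~ s/\s+$//;
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# In-memory demonstration using headers in the format from this thread.
my $id_text = "542_at\r\n31799_at\n";
my $fasta   = join '',
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;\n",
    "GTTCATCACAAATCTATTGTGCTTG\n";

open(my $id_fh, '<', \$id_text) or die "Can't open in-memory ID list: $!";
open(my $in_fh, '<', \$fasta)   or die "Can't open in-memory FASTA: $!";
my $ids = read_ids($id_fh);
filter_probes($ids, $in_fh, \*STDOUT);
```

Passing the handles in as arguments, rather than using select, keeps the
default output handle untouched for any later print statements.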




From oldham at ucla.edu  Tue Jun 13 22:03:04 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Tue, 13 Jun 2006 19:03:04 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDOEOLCJAA.oldham@ucla.edu>

Dear Malcolm, Chris, et al,

Thanks to everyone for your helpful suggestions.  When I run the code
below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
output file is still blank.  If I replace this list with a single ID
("542_at"), it works:

>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
GCGCAGCAGCGAGAATTTCGACGAG
>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
GAATTTCGACGAGCTGCTGAAGGCA
>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
CGACGAGCTGCTGAAGGCACTGGGT
........etc.

If I try a list of two IDs ("542_at" and "31799_at"), only the last one
is present in the output:

>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;
GTTCATCACAAATCTATTGTGCTTG
>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126; Antisense;
GTCCACTAAATGTAGTAACGAAATG
>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127; Antisense;
TCCACTAAATGTAGTAACGAAATGT
........etc.

The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
ID is present in the output file).  At this point I have no idea why
this is happening, and I am not sure how to interpret Malcolm's comment:

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

Any ideas?  Thanks again................!

Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
			print OUT scalar(<PROBES>);
		}
	}
exit;


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Monday, June 12, 2006 8:48 AM
To: Cook, Malcolm; Chris Fields; Michael Oldham
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


oops,

s/matches on of/matches one of/
s/nothing that/noting that/

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>the regex matches anything)?  If there is nothing printed
>then either
>>the regex isn't working as expected or there is something logically
>>wrong.  The problem may be that the captured string must
>match the id
>>exactly, the id being the key to the %ID hash; any extra characters
>>picked up by the regex outside of your id key and you will not get
>>anything.  Looking at Malcolm's regex it should work just fine, but
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this won't it only print only the
>>matched description lines to the outfile (no sequence) even if there
>>is a match?  Or is this what you wanted?   If you want the sequence
>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting
>>> closer,
>>> but no cigar quite yet.  The script below runs quickly with no
>>> errors--but
>>> the output file is empty.  It seems that the problem must lie
>>> somewhere in
>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>> experienced
>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>> keys=PSIDs and
>>> all values=1.
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>> -----Original Message-----
>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>> Sent: Friday, June 09, 2006 7:58 AM
>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>> single large
>>> file
>>>
>>>
>>>
>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>> fine and
>>> probably be faster.
>>>
>>> Assuming your ids are one per line in a file named id.dat
>>looking like
>>> this
>>>
>>> 1138_at
>>> 1134_at
>>> etc..
>>>
>>> this should work:
>>>
>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>> mybigfile.fa
>>>
>>> good luck
>>>
>>> --Malcolm Cook
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>> Michael Oldham
>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>> Dear all,
>>>>
>>>> I am a total Bioperl newbie struggling to accomplish a
>>>> conceptually simple
>>>> task.  I have a single large fasta file containing about 200,000
>>>> probe
>>>> sequences (from an Affymetrix microarray), each of which looks
>>>> like this:
>>>>
>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>> Antisense;
>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>
>>>> What I would like to do is extract from this file a subset of
>>>> ~130,800
>>>> probes (both the header and the sequence) and output this
>>>> subset into a new
>>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>> have these
>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>> to create an
>>>> index of all 200,000 probes in the original fasta file using
>>>> the following
>>>> script:
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> # script 1: create the index
>>>>
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> my $Index_File_Name = shift;
>>>> my $inx = Bio::Index::Fasta->new(
>>>>     -filename => $Index_File_Name,
>>>>     -write_flag => 1);
>>>> $inx->make_index(@ARGV);
>>>>
>>>> I'm not sure if this is the most sensible approach, and even
>>>> if it is, I'm
>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>
>>>> Many thanks,
>>>> Mike O.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From s_maheshwari84 at rediffmail.com  Thu Jun 15 07:42:24 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 Jun 2006 11:42:24 -0000
Subject: [Bioperl-l] simple problem plz look
Message-ID: <20060615114224.21669.qmail@webmail31.rediffmail.com>

I am using this statement:

my $data[0][0] = 'P_p';

What is wrong with this? I am getting a syntax error.


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From rkulasekaran at accelrys.com  Thu Jun 15 08:06:30 2006
From: rkulasekaran at accelrys.com (rkulasekaran at accelrys.com)
Date: Thu, 15 Jun 2006 17:36:30 +0530
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <OF88050CF5.C0508A24-ON6525718E.00425D40-6525718E.00428384@accelrys.com>

Hi,

Can you declare the array (my @data) before assigning to the index?

I guess that will work fine.

- Raja



From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 08:52:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 13:52:53 +0100
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
References: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <44915825.8040902@mrc-dunn.cam.ac.uk>

saurabh maheshwari wrote:
> I am using this statement:
> 
> my $data[0][0] = 'P_p';
> 
> What is wrong with this? I am getting a syntax error.

I don't think general Perl problems are appropriate for this list.
Try subscribing to the beginners mailing list via http://learn.perl.org/

But in any case, say:
my @data;
$data[0][0] = 'P_p';
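
For anyone hitting the same error: 'my' declares a whole variable ($x, @data, %h), not a single element, which is why 'my $data[0][0]' does not compile. A minimal sketch of the working form follows; the second assignment and the print statements are only there for illustration and are not part of the original question.

```perl
use strict;
use warnings;

# Declare the array itself; individual elements cannot be declared with 'my'.
my @data;

# Assigning to a nested element autovivifies the inner array reference.
$data[0][0] = 'P_p';
$data[1][2] = 'Q_q';    # hypothetical second value, for illustration only

print "first cell: $data[0][0]\n";
print "rows: ", scalar(@data), "\n";
print "row 1 length: ", scalar(@{ $data[1] }), "\n";
```

Because the inner arrays are created on first assignment, there is no need to pre-size either dimension.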


From cjfields at uiuc.edu  Thu Jun 15 11:18:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 10:18:32 -0500
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <002001c6908e$f8b11b30$15327e82@pyrimidine>

And exactly how is this applicable to BioPerl?

Start here:

http://learn.perl.org/

My guess: you need to declare 'my @data;' first.

Chris



From sjmiller at email.arizona.edu  Thu Jun 15 13:42:52 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 10:42:52 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
Message-ID: <44919C1C.1060901@email.arizona.edu>

We are unable to parse BLAST 2.2.14 results from the NCBI website using 
SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
bioperl-live, but when users download either plain text or HTML blast 
outputs from the NCBI page, SearchIO cannot parse them.  This used to 
work prior to BLAST 2.2.14.  Should I try installing the entire 
bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
that makes any difference.)

Thanks,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597


From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 15:00:38 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 20:00:38 +0100
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
	<44919C1C.1060901@email.arizona.edu>
Message-ID: <4491AE56.6090505@mrc-dunn.cam.ac.uk>

Susan J. Miller wrote:
> We are unable to parse BLAST 2.2.14 results from the NCBI website using 
> SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
> bioperl-live, but when users download either plain text or HTML blast 
> outputs from the NCBI page, SearchIO cannot parse them.  This used to 
> work prior to BLAST 2.2.14.  Should I try installing the entire 
> bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
> that makes any difference.)

Parsing saved results from the website works fine here. Please be more 
specific in what you mean by 'unable to parse'. What error messages do 
you get? What exact code did you use to get those errors? Exactly what 
input data did you use? Exactly how did you generate that data?

Cheers,
Sendu.


From cjfields at uiuc.edu  Thu Jun 15 17:06:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 16:06:13 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
Message-ID: <002701c690bf$8b732410$15327e82@pyrimidine>

Bio::SearchIO can't handle HTML output directly; you have to strip the tags
first, and we can no longer guarantee that even that will work (I
haven't tried it).  The FAQ tells you how:

http://www.bioperl.org/wiki/FAQ

I would avoid HTML parsing altogether.  The only sure-fire method that will
always work, according to NCBI, is XML output, and that's parsable using
Bio::SearchIO::blastxml.  You can also try tabular format, which
Bio::SearchIO::blasttable can parse as well.

However, like Sendu, I get BLASTP 2.2.14 output (saved from NCBI directly)
to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work
as well using BLASTP (and that's still set up to parse text output using
SearchIO I believe).  Could you give us an example of the type of BLAST you
were running, the sequence you used, and the error you had?  It could be
program-specific output that may be causing the problems.  The last time
text parsing broke, the cause was a change specific to BLASTN/TBLASTX
output, or something along those lines.

Chris



From sjmiller at email.arizona.edu  Thu Jun 15 17:43:59 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 14:43:59 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <002701c690bf$8b732410$15327e82@pyrimidine>
References: <002701c690bf$8b732410$15327e82@pyrimidine>
Message-ID: <4491D49F.4030208@email.arizona.edu>

Chris Fields wrote:
> 
> However, like Sendu, I get BLASTP 2.2.14 output (saved from NCBI directly)
> to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work
> as well using BLASTP (and that's still set up to parse text output using
> SearchIO I believe).  Could you give us an example of the type of BLAST you
> were running, the sequence you used, and the error you had?  It could be
> program-specific output that may be causing the problems.  The last time
> text parsing broke it was changes specifically to only BLASTN/TBLASTX output
> or something along those lines.

Hi Chris and Sendu,

Thanks for your replies.  I am using blastp from the NCBI BLAST page, 
with this input sequence:

MSRLLWRKVAGATVGPGPVPAPGRWVSSSVPASDPSDGQRRRQQQQQQQQQQQQQPQQPQVLSSEGGQLR
HNPLDIQMLSRGLHEQIFGQGGEMPGEAAVRRSVEHLQKHGLWGQPAVPLPDVELRLPPLYGDNLDQHFR
LLAQKQSLPYLEAANLLLQAQLPPKPPAWAWAEGWTRYGPEGEAVPVAIPEERALVFDVEVCLAEGTCPT
LAVAISPSAWYSWCSQRLVEERYSWTSQLSPADLIPLEVPTGASSPTQRDWQEQLVVGHNVSFDRAHIRE
QYLIQGSRMRFLDTMSMHMAISGLSSFQRSLWIAAKQGKHKVQPPTKQGQKSQRKARRGPAISSWDWLDI

I have tried saving HTML (with and without the graphical overview), 
plain text, and XML.  I am parsing with this script:

#!/usr/local/bin/perl -w

use Bio::SearchIO;

while ($fil = shift(@ARGV)) {

   $srchio = new Bio::SearchIO ('-format' => 'blast', '-file' => $fil);
   while ($result = $srchio->next_result) {

         $db = $result->database_name;
         $alg = $result->algorithm;
         print "DB $db\n ALG $alg\n";

         $qid = $result->query_name;
         print "QRY $qid\n";

         while ($hit = $result->next_hit) {

           $hitnam = $hit->name;
           print "\t$hitnam\n";

           $nhsp = 0;
           while ($hit->next_hsp) {
                 $nhsp++;
           }
           print "\tHSPS: $nhsp\n";
         } # end next_hit
   }
}

Interestingly, the results are different (but never correct) for the 
different types of output I've tried.  For xml, the script runs but 
produces no output, for plain text the script hangs with no output, and 
for html, I get these errors:


-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|27502689|gb|AAH42571.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 308.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 308.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|21779923|gb|AAM77583.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 333.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 333.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|1644239|dbj|BAA12223.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 349, <GEN1> line 358.
Use of uninitialized value in numeric le (<=) at 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm 
line 304, <GEN1> line 358.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Positives = 270/273 (98%), Gaps = 0/273 (0%) 
Query 78
STACK: Error::throw
STACK: Bio::Root::Root::throw 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result 
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/blast.pm:1172
STACK: ./srchio.pl:8


At this point I should probably try installing all of bioperl-live, or 
at least get IteratedSearchResultEventBuilder.pm - or would you 
recommend something else?  Let me know if you need more info.

Thanks again,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597


From cjfields at uiuc.edu  Thu Jun 15 19:03:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 18:03:37 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <4491D49F.4030208@email.arizona.edu>
Message-ID: <002b01c690cf$efa05510$15327e82@pyrimidine>

...

> Hi Chris and Sendu,
> 
> Thanks for your replies.  I am using blastp from the NCBI BLAST page,
> with this input sequence:

...

> I have tried saving HTML (with and without the graphical overview),
> plain text, and XML.  I am parsing with this script:


> #!/usr/local/bin/perl -w
> 
> use Bio::SearchIO;
> ...
> }

I got this script to work.  I used your sequence and retrieved BLASTP text
output from NCBI BLASTP 2.2.14, then saved it from the web browser, and just
copied it to three separate files.  Using those files as input, they all
parse fine, with output like this:

DB All non-redundant GenBank CDStranslations+PDB+SwissProt+PIR+PRF excluding
environmental samples
 ALG BLASTP
QRY
        gi|27502689|gb|AAH42571.1|
        HSPS: 1
        gi|21779923|gb|AAM77583.1|
        HSPS: 1
...

> Interestingly, the results are different (but never correct) for the
> different types of output I've tried.  For xml, the script runs but
> produces no output, for plain text the script hangs with no output, and
> for html, I get these errors:

What's interesting is that HTML did anything at all.  You MUST strip out the
HTML tags as per the FAQ, which I pointed out before:

http://www.bioperl.org/wiki/FAQ

See the question : Does Bio::SearchIO parse the HTML output that BLAST
creates using the -T option?

Again, I would NOT attempt parsing HTML.  The only reason we have a FAQ
question about it is b/c it popped up on the list many many times in the
past (i.e. it is a FAQ) and someone found out that HTML::Strip works.  We
will never adequately support it beyond suggesting stripping the tags out.
NCBI changes their HTML output more often than their text output.

If you tried parsing XML with the format set to 'blast' you'll get nothing
(the blast text parser looks for text output using regexes, so it just
bypasses all the XML tags).  You must set:

-format => 'blastxml' 

You'll also need to install XML::SAX, and I would suggest installing
XML::SAX::ExpatXS and the Expat XML parser for your system to speed things
up.
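
The format mix-up above can be caught before parsing. What follows is only a sketch, not anything shipped with BioPerl: guess_blast_format() and summarize_report() are hypothetical helper names, and the sniffing patterns are assumptions about what saved NCBI reports look like. It peeks at the start of each file, picks 'blastxml' or 'blast' accordingly, and refuses HTML rather than handing it to the text parser.

```perl
use strict;
use warnings;

# Guess which Bio::SearchIO format a saved BLAST report needs by looking at
# the first few kilobytes. Returns 'blastxml', 'blast', or 'html'.
# The patterns are assumptions about typical saved reports.
sub guess_blast_format {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = read($fh, my $head, 4096);
    close $fh;
    die "Cannot read $path: $!" unless defined $n;

    return 'blastxml' if $head =~ /<\?xml|<BlastOutput\b/;
    return 'html'     if $head =~ /<html\b|<pre\b|<a\s+href/i;
    return 'blast';
}

# Parse one report and print a short summary of its hits.
sub summarize_report {
    my ($path) = @_;
    my $format = guess_blast_format($path);
    die "$path looks like HTML; strip the tags first (see the FAQ)\n"
        if $format eq 'html';

    require Bio::SearchIO;    # loaded lazily so the sniffing helper stands alone
    my $in = Bio::SearchIO->new(-format => $format, -file => $path);
    while (my $result = $in->next_result) {
        my $query = $result->query_name;
        $query = '' unless defined $query;
        print "QRY $query\n";
        while (my $hit = $result->next_hit) {
            my $nhsp = 0;
            $nhsp++ while $hit->next_hsp;
            print "\t", $hit->name, "\tHSPS: $nhsp\n";
        }
    }
}

summarize_report($_) for @ARGV;
```

Run it with the saved report files as arguments. The 'blastxml' branch still needs XML::SAX installed, as noted above.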

The 'hanging' you mention using text parsing sounds like the old bug where
it got caught in an infinite loop.  I don't have this problem.  It could be
a couple of things:

1) You have an old version of bioperl and updated Bio::SearchIO, but you
haven't updated Bio::SearchIO::blast. That's the plugin module where the
error was (not Bio::SearchIO).  Try either updating that module or
installing the entire distribution from scratch.

2) You have two versions of Bioperl installed (an old one and bioperl-live)
and perl is using the old version of bioperl (and the old version of
SearchIO::blast).  Make sure you only have one version installed and that it
is bioperl-live.
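
One way to check for the second case is to list every copy of the module that perl can see. This is a rough sketch using only core modules; find_module_copies() is a hypothetical helper name, and it assumes the copies live in ordinary directories on @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Return every copy of a module found on the search path (default: @INC),
# in the order perl searches; the first entry is the copy that gets loaded.
sub find_module_copies {
    my ($module, @search_path) = @_;
    @search_path = @INC unless @search_path;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@search_path) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $candidate = File::Spec->catfile($dir, @parts);
        push @found, $candidate if -f $candidate;
    }
    return @found;
}

my @copies = find_module_copies('Bio::SearchIO::blast');
print "no copy of Bio::SearchIO::blast found on \@INC\n" unless @copies;
print "perl will load: $copies[0]\n" if @copies;
print "shadowed copy:  $_\n" for @copies[1 .. $#copies];
```

If more than one path is printed, the first one is the file actually in use, so that is the one that needs to be the bioperl-live version.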

> At this point I should probably try installing all of bioperl-live, or
> at least get IteratedSearchResultEventBuilder.pm - or would you
> recommend something else?  Let me know if you need more info.

If you have the entire distribution installed, you should have ISREB anyway.
ISREB (IteratedSearchResultEventBuilder) has nothing to do with the problems
here, though.

Chris

> Thanks again,
> -susan


From cain at cshl.edu  Thu Jun 15 11:25:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 15 Jun 2006 11:25:54 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
Message-ID: <1150385154.2622.152.camel@localhost.localdomain>

Hi all,

When running make test on a fairly new system, I got the following
failure:

t/Compatible.................No Set::Scalar. Unable to test Bio::Tree::Compatible
Can't locate Set/Scalar.pm in @INC
....
BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Tree/Compatible.pm line 138.
Compilation failed in require at t/Compatible.t line 42.
BEGIN failed--compilation aborted at t/Compatible.t line 42.
t/Compatible.................dubious                                         
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Set::Scalar is mentioned in Makefile.PL as an optional package (but not
required) and isn't mentioned in the INSTALL doc anywhere.  It looks
like the author of the test (t/Compatible.t) is trying to skip this test
if Set::Scalar isn't found, but the 'dubious' result gets marked
ultimately as a failure.
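
The output suggests the test prints its skip message but still pulls in Bio::Tree::Compatible at compile time, which then dies on the missing Set::Scalar. One common pattern for optional dependencies is sketched below, using Test::More; it is not necessarily the change that belongs in t/Compatible.t, and module_available() and plan_for_optional() are hypothetical helper names.

```perl
use strict;
use warnings;
use Test::More;

# True if the named module can be loaded; never dies.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Decide the test plan at run time, before touching any code that needs
# the optional module. With 'skip_all' the harness reports a skip rather
# than a failure.
sub plan_for_optional {
    my ($optional, $n_tests) = @_;
    if (module_available($optional)) {
        plan tests => $n_tests;
        return 1;
    }
    plan skip_all => "$optional not installed";    # exits with status 0
}
```

In a test file this would be called as plan_for_optional('Set::Scalar', ...) with the real test count, followed by a run-time 'require Bio::Tree::Compatible' instead of a compile-time 'use', so that nothing tries to load Set::Scalar before the skip decision is made.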

What is the right thing to do here?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Fri Jun 16 00:42:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 16 Jun 2006 00:42:25 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
In-Reply-To: <1150385154.2622.152.camel@localhost.localdomain>
References: <1150385154.2622.152.camel@localhost.localdomain>
Message-ID: <D4E96C47-977E-474C-B093-82CDE775F6C1@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Should be fixed on the main trunk. -hilmar

On Jun 15, 2006, at 11:25 AM, Scott Cain wrote:

> Hi all,
>
> When running make test on a fairly new system, I got the following
> failure:
>
> t/Compatible.................No Set::Scalar. Unable to test  
> Bio::Tree::Compatible
> Can't locate Set/Scalar.pm in @INC
> ....
> BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl- 
> live/blib/lib/Bio/Tree/Compatible.pm line 138.
> Compilation failed in require at t/Compatible.t line 42.
> BEGIN failed--compilation aborted at t/Compatible.t line 42.
> t/Compatible.................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Set::Scalar is mentioned in Makefile.PL as an optional package (but  
> not
> required) and isn't mentioned in the INSTALL doc anywhere.  It looks
> like the author of the test (t/Compatible.t) is trying to skip this  
> test
> if Set::Scalar isn't found, but the 'dubious' result gets marked
> ultimately as a failure.
>
> What is the right thing to do here?
>
> Thanks,
> Scott
>
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFEkja5uV6N2JxL7qsRAjqCAJ9RTgPntJ+dmGHeiovS5FeG3QvZagCeMzmw
sKkizbLUYAsyJqVw/2SplcQ=
=ehd6
-----END PGP SIGNATURE-----


From rmb32 at cornell.edu  Thu Jun 15 21:37:03 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 15 Jun 2006 18:37:03 -0700
Subject: [Bioperl-l] reading and writing GFF3
Message-ID: <44920B3F.90405@cornell.edu>

There is stuff in bioperl for reading and writing GFF3.  There's 
Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which 
is the 'best' one to use?

Neither of these is working very well for me.

My proximate use case is reading in a RepeatMasker report with 
Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then 
writing those out to a GFF3 file.

Bio::Tools::GFF will take these things and write out something that 
closely resembles GFF3, but with Target attributes that don't seem to 
comply with Lincoln's GFF3 spec, since its coordinates are join()ed with 
commas instead of spaces.  I'm attaching a little script that 
illustrates this.

Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the 
features contained in them, throwing 'only Bio::SeqFeature::Annotated 
objects are writeable'.  This seems a bit silly, since one of the whole 
points of Bioperl is using polymorphism to make it easy to connect 
things together.  I've attached a little script to illustrate this one too.

So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF 
deprecated?  Why does Bio::FeatureIO::gff only accept 
Bio::SeqFeature::Annotated objects?

Thanks in advance.

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_featureio_gff_test.pl
Type: application/x-perl
Size: 1455 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_tools_gff_test.pl
Type: application/x-perl
Size: 1436 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0007.bin>

From cain at cshl.edu  Fri Jun 16 10:18:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 10:18:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <44920B3F.90405@cornell.edu>
References: <44920B3F.90405@cornell.edu>
Message-ID: <1150467493.2622.209.camel@localhost.localdomain>

Hi Rob,

I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
but that is actually a good thing.  The tighter constraints result in a
better, more consistent file format.

The reason only BSF::Annotated features are writable is that there needs
to be tight control on the 'type' of the feature, to ensure that the
type is part of the Sequence Ontology.  It also makes it much easier to
properly write out the attributes in the ninth column, particularly the
ones that are 'reserved', like Parent, Dbxref, and Ontology_term.

BTG is still usable, but the GFF3 it puts out is actually more
'GFF3-like'; that is, it looks like GFF3, but because there are no
constraints on the type and the terms that are used in the ninth column,
you have to be very careful using it to produce GFF3, by making sure
that your feature objects conform to the standard before BTG tries to
write them out.  (Of course, one way to do that would be to convert your
feature objects to BSF::Annotated objects, but then you could use
BFIO::gff :-)

[Long pause while scott goes and monkeys with Bio::Tools::GFF]

OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
this is completely valid.  (I even fixed the escaping of the stray
'=' in 'hind_R=2046'.)  The output I get is this:

##gff-version 3
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
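
For reference, the Target formatting shown above can be sketched in a few lines of plain Perl. This is an illustration of the rule (space-separated fields, reserved characters percent-encoded), not the code committed to Bio::Tools::GFF; gff3_escape() and gff3_target() are hypothetical names, and the set of escaped characters is my reading of the GFF3 spec.

```perl
use strict;
use warnings;

# Percent-encode characters that are reserved in GFF3 column 9
# (percent, semicolon, equals, ampersand, comma) plus tab and line breaks.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build a Target attribute: "Target=<id> <start> <end> [strand]".
sub gff3_target {
    my ($target_id, $start, $end, $strand) = @_;
    my $id = gff3_escape($target_id);
    $id =~ s/ /%20/g;    # spaces separate the Target fields, so escape them in the ID
    my @fields = ($id, $start, $end);
    push @fields, $strand if defined $strand;
    return 'Target=' . join(' ', @fields);
}

print gff3_target('Contig151', 325, 832), "\n";
print gff3_target('hind_R=2046', 59, 120), "\n";
```

The two print statements reproduce the Target column of the first and last lines of the output above.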

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From rmb32 at cornell.edu  Fri Jun 16 14:36:22 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 11:36:22 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
References: <44920B3F.90405@cornell.edu>
	<1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <4492FA26.6030909@cornell.edu>

Thanks for the reply, Scott.  It's good that BSF::Annotated features
constrain the type to be in the SO.  I sort of came to the "BTG is only
gff3-/like/" conclusion myself as I poked around in the two modules in
question, so I'd much rather use BFIO::gff.  So I guess the question now
is (and this will probably be a pretty common use case): how does one
take an "old" Bio::SeqFeature::Generic or similar object and make it
into a Bio::SeqFeature::Annotated?


Rob


-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 16 15:12:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Jun 2006 14:12:28 -0500
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>

Scott, 

Looks like Robert also submitted a bug report related to this as well.
Could you check into it (pretty-please)?  I'm still GFF3-illiterate.

http://bugzilla.open-bio.org/show_bug.cgi?id=2025

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Scott Cain
> Sent: Friday, June 16, 2006 9:18 AM
> To: Robert Buels
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] reading and writing GFF3
> 
> Hi Rob,
> 
> I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
> but that is actually a good thing.  The tighter constraints results in a
> better, more consistent file format.
> 
> The reason only BSF::Annotated features are writable is that there needs
> to be tight control on the 'type' of the feature, to insure that the
> type is part of the Sequence Ontology.  It also makes it much easier to
> properly write out the attributes in the ninth column, particularly the
> ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
> 
> BTG is still usable, but the GFF3 it puts out is actually more
> 'GFF3-like'; that is, it looks like GFF3, but because there are no
> constraints on the type and the terms that are used in the ninth column,
> you have to be very careful using it to produce GFF3, by making sure
> that your feature objects conform to the standard before BTG tries to
> write them out.  (Of course, one way to do that would be to convert your
> feature objects to BSF::Annotated objects, but then you could use
> BFIO::gff :-)
> 
> [Long pause while scott goes and monkeys with Bio::Tools::GFF]
> 
> OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
> your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
> this is completely valid.  (I even fixed the escaping of the stray
> '=' in 'hind_R=2046'.)  The output I get is this:
> 
> ##gff-version 3
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
> C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
> 
> Scott
> 
> 
> 
> On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
> > There is stuff in bioperl for reading and writing GFF3.  There's
> > Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which
> > is the 'best' one to use?
> >
> > Neither of these is working very well for me.
> >
> > My proximate use case is reading in a RepeatMasker report with
> > Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
> > writing those out to a GFF3 file.
> >
> > Bio::Tools::GFF will take these things and write out something that
> > closely resembles GFF3, but with Target attributes that don't seem to
> > comply with Lincoln's GFF3 spec, since its coordinates are join()ed with
> > commas instead of spaces.  I'm attaching a little script that
> > illustrates this.
> >
> > Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
> > features contained in them, throwing 'only Bio::SeqFeature::Annotated
> > objects are writeable'.  This seems a bit silly, since one of the whole
> > points of Bioperl is using polymorphism to make it easy to connect
> > things together.  I've attached a little script to illustrate this one
> > too.
> >
> > So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF
> > deprecated?  Why does Bio::FeatureIO::gff only accept
> > Bio::SeqFeature::Annotated objects?
> >
> > Thanks in advance.
> >
> > Rob
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
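
For readers following along, the two GFF3 details at issue in the quoted
message (Target coordinates separated by spaces rather than commas, and
percent-encoding of a stray '=' in the ninth column) can be sketched in
plain Perl.  This is only an illustration under stated assumptions, not the
code that was committed to Bio::Tools::GFF: the helper names gff3_escape and
gff3_target are hypothetical, and the set of escaped characters follows my
reading of the GFF3 spec.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode characters that are reserved in a
# GFF3 column-9 value (percent, semicolon, equals, ampersand, comma) and
# ASCII control characters such as tab and newline.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\x00-\x1F\x7F])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Hypothetical helper: build a Target attribute.  The ID, start, end and
# optional strand are joined with single spaces, so a space inside the
# target ID itself has to be written as %20.
sub gff3_target {
    my ($id, $start, $end, $strand) = @_;
    my $escaped_id = gff3_escape($id);
    $escaped_id =~ s/ /%20/g;
    my @fields = ($escaped_id, $start, $end);
    push @fields, $strand if defined $strand;
    return 'Target=' . join(' ', @fields);
}

# Values taken from the last line of the sample output quoted above.
print gff3_target('hind_R=2046', 59, 120), "\n";
```

Joining with a single space is the whole point of the Target fix; the
escaping pass is what keeps an ID like 'hind_R=2046' from being misread as
a second tag=value pair.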


From rmb32 at cornell.edu  Fri Jun 16 15:30:23 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:30:23 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <449306CF.1030301@cornell.edu>

Woops, I should have said something about that.  I submitted it before I 
saw that Scott had already done the escaping in CVS.

Chris Fields wrote:
> Scott, 
>
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
>
> Chris
>

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Fri Jun 16 15:34:16 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:34:16 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150486453.4412.30.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
Message-ID: <449307B8.5040802@cornell.edu>

So about that converting ye olde feature objects into 
Bio::SeqFeature::Annotated objects.  How do I do it?


Scott Cain wrote:
> That's OK--You added a few items that should be escaped that weren't, so
> I added those too.
>
> Thanks,
> Scott

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 15:28:52 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:28:52 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <1150486133.4412.25.camel@localhost.localdomain>

I tweaked the patch and applied it, and closed the bug.

Thanks for pointing it out--I doubt I would have noticed it in the
bioperl-guts mailing, which I generally don't look too closely at :-o

Scott


On Fri, 2006-06-16 at 14:12 -0500, Chris Fields wrote:
> Scott, 
> 
> Looks like Robert also submitted a bug report related to this as well.
> Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> 
> Chris
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 15:34:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:34:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449306CF.1030301@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
Message-ID: <1150486453.4412.30.camel@localhost.localdomain>

That's OK--You added a few items that should be escaped that weren't, so
I added those too.

Thanks,
Scott


On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> Woops, I should have said something about that.  I submitted it before
> I saw that Scott had already done the escaping in CVS.
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 15:55:31 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:55:31 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449307B8.5040802@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
Message-ID: <1150487731.4412.35.camel@localhost.localdomain>

Um, yeah, good question.  The reason I didn't answer you when you wrote
before is that I was hoping for divine inspiration for an answer (or for
somebody else to answer, which would have been really great :-)

The short answer (and easy one for me to type) is that you will probably
need an ad hoc method to do it, which is the same thing I do when I need
to convert gff2 to gff3, to make sure the things I need mapped get
mapped the 'right' way (that is, the way I want them to go).  I don't
have any sample code that does this, but if you want to start working up
an ad hoc method, I will certainly try to help you as much as I can.

Scott


On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> So about that converting ye olde feature objects into 
> Bio::SeqFeature::Annotated objects.  How do I do it?

From rmb32 at cornell.edu  Fri Jun 16 16:31:08 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 13:31:08 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150487731.4412.35.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
Message-ID: <4493150C.1080909@cornell.edu>

Rather than cobble together some ad-hoc solution, I would be interested 
in working on a good solution to this problem, because it seems like 
it's just going to get more common as more people start wanting to write 
GFF3.  What about some code in whatever customarily makes these objects 
(probably BSF::Annotated's new() method?) that could take another type 
of Feature object and attempt to shoehorn its data into a new 
BSF::Annotated?  If it failed (because the type isn't in SO or 
whatever), it could throw() some informative error message.

Then, people could write straightforward code something like:

while(my $oldstylefeature = $features_in->next_feature) {
    $oldstylefeature->primary_tag('something_that_is_in_so');
    $oldstylefeature->something_else(
        'some other something that needs to be changed for compliance');
    my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
    $gff3_out->write_feature($newfeature);
}

Does that sound like a good idea?  I'd be more than willing to implement 
this, since I'm going to need to do this sort of thing with many more 
things than just RepeatMasker.

Rob
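
The fail-loudly part of this proposal can be sketched without any BioPerl
at all.  Everything below is an assumption made for illustration: features
are plain hashes rather than Bio::SeqFeatureI objects, %SO_TERMS is a tiny
hand-picked stand-in for the real Sequence Ontology, and to_so_feature and
its type_map argument are hypothetical names, not an existing API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Sequence Ontology: a handful of terms only.
my %SO_TERMS = map { $_ => 1 } qw(nucleotide_motif repeat_region gene exon);

# Hypothetical converter: copy a feature (a plain hash here), remapping its
# type through an optional caller-supplied type_map, and die with an
# informative message if the resulting type is not a known SO term.
sub to_so_feature {
    my ($feature, %args) = @_;
    my $type_map = $args{type_map} || {};
    my $type = $feature->{primary_tag} // '';
    $type = $type_map->{$type} if exists $type_map->{$type};
    die "cannot convert feature: type '$type' is not a Sequence Ontology term\n"
        unless $SO_TERMS{$type};
    # Return a shallow copy so the original feature is left untouched.
    return { %$feature, primary_tag => $type };
}

# 'similarity' is just an example of a non-SO tag that needs remapping.
my $old = { primary_tag => 'similarity', start => 2095, end => 2556 };
my $new = to_so_feature($old, type_map => { similarity => 'nucleotide_motif' });
print "$new->{primary_tag}\n";
```

Keeping the mapping in a caller-supplied hash means each parser (RepeatMasker
or anything else) can bring its own translation table, while the check
against the ontology stays in one place.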

Scott Cain wrote:
> Um, yeah, good question.  The reason I didn't answer you when you wrote
> before is that I was hoping for divine inspiration for an answer (or for
> somebody else to answer, which would have been really great :-)
>
> The short answer (and easy one for me to type) is that you will probably
> need an ad hoc method to do it, which is the same thing I do when I need
> to convert gff2 to gff3, to make sure the things I need mapped get
> mapped the 'right' way (that is, the way I want them to go).  I don't
> have any sample code that does this, but if you want to start working up
> an ad hoc method, I will certainly try to help you as much as I can.
>
> Scott

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Sat Jun 17 06:36:59 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 03:36:59 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>	
	<1150486453.4412.30.camel@localhost.localdomain>	
	<449307B8.5040802@cornell.edu>	
	<1150487731.4412.35.camel@localhost.localdomain>	
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <4493DB4B.4020509@cornell.edu>

Yep.  I'm almost finished with the first draft of a function that does 
this.  I'll polish it up over the weekend then on Monday I'll submit a 
bugzilla bug and patch with it so you can take a look.

Rob

Scott Cain wrote:
> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like you suggested below.  You could allow passing in args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you know if
> brilliance strikes me.
>
> Scott
>
>
> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>   
>> Rather than cobble together some ad-hoc solution, I would be interested 
>> in working on a good solution to this problem, because it seems like 
>> it's just going to get more common as more people start wanting to write 
>> GFF3.  What about some code in whatever customarily makes these objects 
>> (probably BSF::Annotated's new() method?) that could take another type 
>> of Feature object and attempt to shoehorn its data into a new 
>> BSF::Annotated?  If it failed (because the type isn't in SO or 
>> whatever), it could throw() some informative error message.
>>
>> Then, people could write straightforward code something like:
>>
>> while(my $oldstylefeature = $features_in->next_feature) {
>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>     $oldstylefeature->something_else('some other something that needs to 
>> be changed for compliance');
>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>     $gff3_out->write_feature($newfeature);
>> }
>>
>> Does that sound like a good idea?  I'd be more than willing to implement 
>> this, since I'm going to need to do this sort of thing with many more 
>> things than just RepeatMasker.
>>
>> Rob
>>
>> Scott Cain wrote:
>>     
>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>> before is that I was hoping for divine inspiration for an answer (or for
>>> somebody else to answer, which would have been really great :-)
>>>
>>> The short answer (and easy one for me to type) is that you will probably
>>> need an ad hoc method to do it, which is the same thing I do when I need
>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>> have any sample code that does this, but if you want to start working up
>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>
>>> Scott

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 23:56:44 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 23:56:44 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <4493150C.1080909@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
Message-ID: <1150516605.2600.9.camel@localhost.localdomain>

Rob,

I came to the same conclusion as well; I wrote my response as I was
heading out the door and while I was running errands, I realized the
right thing to do is to write a Bio::SeqFeature::Annotated method called
new_from_object, whose usage would be:

  my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);

where you would give it a Bio::SeqFeatureI compliant object and try to
create a BSFA like you suggested below.  You could allow passing in args
to control how different things are handled, like mapping non-SO types
to SO types.  I'll think about this over the weekend and let you know if
brilliance strikes me.

Scott


On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> Rather than cobble together some ad-hoc solution, I would be interested 
> in working on a good solution to this problem, because it seems like 
> it's just going to get more common as more people start wanting to write 
> GFF3.  What about some code in whatever customarily makes these objects 
> (probably BSF::Annotated's new() method?) that could take another type 
> of Feature object and attempt to shoehorn its data into a new 
> BSF::Annotated?  If it failed (because the type isn't in SO or 
> whatever), it could throw() some informative error message.
> 
> Then, people could write straightforward code something like:
> 
> while(my $oldstylefeature = $features_in->next_feature) {
>     $oldstylefeature->primary_tag('something_that_is_in_so');
>     $oldstylefeature->something_else('some other something that needs to 
> be changed for compliance');
>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>     $gff3_out->write_feature($newfeature);
> }
> 
> Does that sound like a good idea?  I'd be more than willing to implement 
> this, since I'm going to need to do this sort of thing with many more 
> things than just RepeatMasker.
> 
> Rob
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Sat Jun 17 12:20:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 17 Jun 2006 12:20:08 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You don't need a new method for this. Instead, support a -feature  
argument.

	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);

This should work for any instance of Bio::SeqFeatureI. If it is a  
B::SF::Annotated already it is obviously just a deep copy (if copy is  
desired - could be another parameter). Otherwise more will be involved.

Alternatively, and possibly better, is to write a specialized  
SeqFeatureI factory (that would implement  
Bio::Factory::ObjectFactoryI) and then delegate this job to it:

	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
		-type_ontology => $sequence_ontology,
		-source_ontology => $feature_source_ontology,
		-unflatten => 1);
	my $bsfa = $feat_factory->create_object({-feature => $feature});

This is preferable because it separates business logic that isn't  
necessarily related into defined units. I.e., the logic necessary to  
convert an ordinary feature into a strongly typed one is different  
from how to represent a strongly typed feature. IMHO anyway ...
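
A rough sketch of that separation in plain Perl, for illustration only.
The Sketch::* package names and the -type_map argument are made up here
and are not BioPerl classes or parameters; the hash lookup merely stands
in for a real ontology query:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ordinary, untyped feature (not a BioPerl class).
package Sketch::PlainFeature;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub primary_tag { $_[0]{primary_tag} }
sub start       { $_[0]{start} }
sub end         { $_[0]{end} }

# Hypothetical strongly typed feature: it only represents data.
package Sketch::TypedFeature;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub type  { $_[0]{type} }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Hypothetical factory: it owns the conversion logic and the type lookup.
package Sketch::TypedFeatureFactory;
sub new {
    my ($class, %args) = @_;
    # -type_map is an assumed argument standing in for an ontology object.
    return bless { type_map => $args{-type_map} || {} }, $class;
}
sub create_object {
    my ($self, $args) = @_;
    my $feature = $args->{-feature}
        or die "create_object needs a -feature argument\n";
    my $tag  = $feature->primary_tag;
    my $type = $self->{type_map}{$tag};
    die "cannot map primary_tag '$tag' to a known type\n"
        unless defined $type;
    return Sketch::TypedFeature->new(
        type  => $type,
        start => $feature->start,
        end   => $feature->end,
    );
}

package main;

my $factory = Sketch::TypedFeatureFactory->new(
    -type_map => { repeat => 'repeat_region' },
);
my $old = Sketch::PlainFeature->new(
    primary_tag => 'repeat', start => 10, end => 42,
);
my $new = $factory->create_object({ -feature => $old });
print join("\t", $new->type, $new->start, $new->end), "\n";

# An unmappable tag fails loudly instead of yielding a bad feature.
my $odd = Sketch::PlainFeature->new(
    primary_tag => 'mystery', start => 1, end => 2,
);
my $ok = eval { $factory->create_object({ -feature => $odd }); 1 };
print "conversion failed: $@" unless $ok;
```

The typed class never needs to know how the mapping is done, so the
mapping rules could change without touching the representation.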

Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
started as the result of a discussion thread earlier this (or last?)  
year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
though not in concept.

Maybe we need to get together again and thrash out a strategy; or a
BOF at the GMOD meeting? I feel this needs a core group of people who
care to hash out a strategy that will also solve the backwards
compatibility problem with the current Bio::SeqFeatureI state-of-limbo,
and that will allow us to implement the decisions with a few people in
a concentrated effort. This will then also remove the only real large
stumbling block towards a 1.6 release.

Maybe we should think about a little pre-GMOD hackathon to clear up
this mess? Scott, you'll be there a day early? I'll be already back
and Jason I believe will still be in town, although he may have other
commitments already. Nonetheless, it shouldn't really take that much:
just dedicated time, a whiteboard, and a few people who care enough to
thrash this out and then do it.

Thoughts?

	-hilmar

On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:

> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like you suggested below.  You could allow passing in args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you know if
> brilliance strikes me.
>
> Scott

- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
ImoAXD/jrbF0gXzSr2CY4tQ=
=XfDq
-----END PGP SIGNATURE-----


From rmb32 at cornell.edu  Sat Jun 17 14:36:28 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 11:36:28 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <44944BAC.7000302@cornell.edu>

I'd love to help more with this, since with the new tomato genome coming 
in my job is going to be working more and more with annotations, but I'm 
not a core person and I can't go to the meeting in NC.  In the interests 
of getting my job done right now, I've implemented a -feature argument 
to Bio::SeqFeature::Annotated's constructor, which uses a method 
from_feature() I added.  If you guys want it, it's attached to bug 2026.

 From the perspective of a casual bioperl user, anything you guys can do 
to make the handling of features and annotations less fragmented and 
more robust would be wonderful.  I'd be happy to help with 
implementation if one of you grizzled veterans would give me marching 
orders. :-)

Rob

Hilmar Lapp wrote:
> You don't need a new method for this. Instead, support a -feature 
> argument.
>
>     my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>
> This should work for any instance of Bio::SeqFeatureI. If it is a 
> B::SF::Annotated already it is obviously just a deep copy (if copy is 
> desired - could be another parameter). Otherwise more will be involved.
>
> Alternatively, and possibly better, is to write a specialized 
> SeqFeatureI factory (that would implement 
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>
>     my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>         -type_ontology => $sequence_ontology,
>         -source_ontology => $feature_source_ontology,
>         -unflatten => 1);
>     my $bsfa = $feat_factory->create_object({-feature => $feature});
>
> This is preferable because it separates business logic that isn't 
> necessarily related into defined units. I.e., the logic necessary to 
> convert an ordinary feature into a strongly typed one is different 
> from how to represent a strongly typed feature. IMHO anyway ...
>
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan 
> started as the result of a discussion thread earlier this (or last?) 
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted, 
> though not in concept.
>
> Maybe we need to get together again and thrash out a strategy; or a 
> BOF at the GMOD meeting? I feel this does need a core group of people 
> who care, hash out a strategy that will also solve the backwards 
> compatibility problem with the current Bio::SeqFeatureI 
> state-of-limbo, and allow us to implement the decisions with a few 
> people in a concentrated effort. This will then also remove the only 
> real large stumbling block towards a 1.6 release.
>
> Maybe we should think about a little pre-GMOD hackathon to clear up 
> this mess? Scott, you'll be there a day early? I'll be already back 
> and Jason I believe will still be in town, although he may have other 
> commitments already. Nonetheless, it shouldn't really take that much 
> but rather dedicated time, a whiteboard, and a few people who care 
> thrashing this out and then do it.
>
> Thoughts?
>
>     -hilmar

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Sat Jun 17 16:21:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 17 Jun 2006 15:21:37 -0500
Subject: [Bioperl-l] OT : Re:  reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1D0C8412-3705-47EF-9AAA-1DD0B09AD6B5@uiuc.edu>


On Jun 17, 2006, at 11:20 AM, Hilmar Lapp wrote:
>
> Maybe we need to get together again and thrash out a strategy; or a
> BOF at the GMOD meeting? I feel this does need a core group of people
> who care, hash out a strategy that will also solve the backwards
> compatibility problem with the current Bio::SeqFeatureI state-of-limbo,
> and allow us to implement the decisions with a few people in a
> concentrated effort. This will then also remove the only real large
> stumbling block towards a 1.6 release.

That would be fantastic!

A bit OT, but if plans are afoot for a 1.6 release maybe the 'core  
group' that meets at NC could start drawing up a list of ideas/plans  
towards that release, even if it is still a ways off.  A roadmap of  
sorts so the community knows where to put forth the majority of their  
effort and focus.

Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Mon Jun 19 06:16:57 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 19 Jun 2006 12:16:57 +0200
Subject: [Bioperl-l] doc.bioperl
Message-ID: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>

Hi,

I just noticed that the pages at doc.bioperl.org sometimes state
"No synopsis" even though the PM file has one (check with perldoc
or the CVS).
An example:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Fasta.html
No synopsis, No description, but

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup

shows both.

So, if you're looking for documentation don't forget to do e.g.
"perldoc Bio::DB::Fasta"

regards,
bernd


From cjfields at uiuc.edu  Mon Jun 19 10:38:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 09:38:01 -0500
Subject: [Bioperl-l] doc.bioperl
In-Reply-To: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>
Message-ID: <001501c693ad$f7689790$15327e82@pyrimidine>

This has been reported as a bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1926

Jason mentions in the bug report that the POD may contain something that
interferes with the way PDOC handles code, so the POD should be rewritten.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Monday, June 19, 2006 5:17 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] doc.bioperl
> 
> Hi,
> 
> I just noted that it can happen that the pages at doc.bioperl.org
> state "No synopsis" whereas there is one in the PM file (use perldoc
> or the CVS).
> An example:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Fasta.html
> No synopsis, No description, but
> 
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup
> 
> shows both.
> 
> So, if you're looking for documentation don't forget to do e.g.
> "perldoc Bio::DB::Fasta"
> 
> regards,
> bernd


From dmessina at wustl.edu  Mon Jun 19 10:59:23 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 19 Jun 2006 09:59:23 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <83485BEB-2457-4FD6-90B8-353228868C3A@wustl.edu>

Hi,

Just curious if any other BioPerlers will be at the YAPC conference  
in Chicago next week (http://yapcchicago.org/). Some of us from the  
WashU GSC will be there, and it might be fun to meet some other  
BioPerl people over lunch or something. If there's enough interest, I  
will organize.

By the way, if you're unfamiliar with the conference and are  
interested in attending, I think registration is still open. The fee  
is low ($100).

Dave


-- 
Dave Messina
Informatics Analyst
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1415


From ClarkeW at AGR.GC.CA  Mon Jun 19 18:34:37 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 19 Jun 2006 18:34:37 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>

Hi,

I am getting the following warning, followed by an exception:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ACTG*] which does not look
healthy

NOTE: ACTG* represents a sequence of those 4 characters (a valid DNA
sequence)

when extracting display name and sequence from a MySQL database. My code
is as follows:

    my $sql = "select Clone_Name,Sequence from tbl_bgene";
    my $sth = $dbh->prepare($sql);
    $sth->execute();
    while (my $hash = $sth->fetchrow_hashref()) {
        # print("Name: ".$hash->{'Clone_Name'}."\n");
        my $seq = Bio::Seq->new(-display_id => $hash->{'Clone_Name'},
                                -seq        => $hash->{'Sequence'});
        $handle->write_seq($seq);
        # print("Sequence: ".$hash->{'Sequence'}."\n");
    }

For some reason it is failing on a particular sequence, which is a valid
DNA sequence. If anyone has any ideas on why this is, I would appreciate
it.

Thanks, Wayne


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 19:30:19 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 09:30:19 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <4497338B.3030609@infotech.monash.edu.au>

> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 1
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.

Usually a '*' indicates a STOP codon in a protein sequence.
I don't think it is valid in a DNA sequence?

So my guess is that BioPerl is auto-detecting it as Protein sequence,
as A,C,T,G are all valid amino acids, and * is a stop codon.

So I think BioPerl is doing the right thing.

If you want to force it, try adding a "-alphabet => 'protein'" parameter to
the Bio::Seq constructor.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From taerwin at gmail.com  Mon Jun 19 21:38:14 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 11:38:14 +1000
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>

Hi all,

Does anyone have a runnable for cap3? There seems to be some discussion
about one in the mailing archives (
http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
find any code.


Regards,

Tim


From osborne1 at optonline.net  Mon Jun 19 22:23:43 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 19 Jun 2006 22:23:43 -0400
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
Message-ID: <C0BCD46F.8EA5%osborne1@optonline.net>

Tim,

The code seems to be here, not clear if there's an executable:

http://seq.cs.iastate.edu/download.html


Brian O.


On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:

> Hi all,
> 
> Does anyone have a runnable for cap3? There seems to be some discussion
> about one in the mailing archives (
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
> find any code.
> 
> 
> 
> Regards,
> 
> Tim
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon Jun 19 23:23:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 22:23:26 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <000701c69418$e53b9110$15327e82@pyrimidine>

You really haven't given us much to work with more than "this doesn't work."
We need the following information, otherwise we can't do anything.  

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be: 

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but it looks like a list is being evaluated).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get when printing $hash using Data::Dumper?
I tried using this anonymous hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};
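
A minimal sketch of how such a check on the fetched row might look. It is
only an illustration: describe_row_problem is a hypothetical helper, not a
BioPerl function, and the character class it uses is an assumption rather
than the pattern validate_seq actually applies. Data::Dumper is used with
Useqq so that invisible characters (for example a trailing carriage return)
show up as escapes, and the helper reports undef or reference values that a
database row might contain.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not part of BioPerl): describe what looks wrong with
# a row returned by fetchrow_hashref() before it is handed to Bio::Seq->new.
sub describe_row_problem {
    my ($row) = @_;
    my $seq = $row->{'Sequence'};
    return 'Sequence is undef (NULL in the database?)' unless defined $seq;
    return 'Sequence is a ' . ref($seq) . ' reference, not a string' if ref $seq;
    return 'Sequence is empty' if $seq eq '';
    # Report the first character that is not a letter, '*', '-' or '.'.
    # This character class is an assumption for illustration only.
    return sprintf('unexpected character 0x%02X at offset %d', ord($1), $-[1])
        if $seq =~ /([^A-Za-z*.\-])/;
    return;    # nothing suspicious found
}

$Data::Dumper::Useqq = 1;    # print control characters as "\r", "\n", ...
my $row = { 'Clone_Name' => 'test', 'Sequence' => "ACTG\r\n" };
print Dumper($row);
my $problem = describe_row_problem($row);
print "Problem: $problem\n" if defined $problem;
```

Calling the helper on each row inside the while loop, before the Bio::Seq
constructor, would show which clone has a value that is not a plain string.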

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MYSQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> 
>      my $sth = $dbh->prepare($sql);
> 
>      $sth->execute();
> 
>      while (my $hash = $sth->fetchrow_hashref()) {
> 
>           # print("Name: ".$hash->{'Clone_Name'}."\n");
> 
>           my $seq = new Bio::Seq(  -display_id     =>
> $hash->{'Clone_Name'},
> 
>                                    -seq      =>   $hash->{'Sequence'});
> 
>           $handle->write_seq($seq);
> 
>           # print("Sequence: ".$hash->{'Sequence'}."\n");
> 
>      }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 


From taerwin at gmail.com  Mon Jun 19 23:05:13 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 13:05:13 +1000
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <C0BCD46F.8EA5%osborne1@optonline.net>
References: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
	<C0BCD46F.8EA5%osborne1@optonline.net>
Message-ID: <c7d2b5330606192005o63ed5d6i608d6b2076399932@mail.gmail.com>

Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3

Regards,

Tim

On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
>
> Tim,
>
> The code seems to be here, not clear if there's an executable:
>
> http://seq.cs.iastate.edu/download.html
>
>
> Brian O.
>
>
> On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
>
> > Hi all,
> >
> > Does anyone have a runnable for cap3? There seems to be some discussion
> > about one in the mailing archives (
> > http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> cannot
> > find any code.
> >
> >
> >
> > Regards,
> >
> > Tim
>
>
>


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 23:07:12 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 13:07:12 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <4497338B.3030609@infotech.monash.edu.au>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
	<4497338B.3030609@infotech.monash.edu.au>
Message-ID: <44976660.7030107@infotech.monash.edu.au>

> If you want to force it, try adding a " -alphabet=>'protein' " parameter to the 
> Bio::Seq constructor.

That should be -alphabet => 'dna'.
D'oh!
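
A minimal sketch of the constructor call with the alphabet set explicitly.
The display id and sequence are placeholder values; in the original script
they would come from $hash->{'Clone_Name'} and $hash->{'Sequence'}. Whether
this avoids the exception is not confirmed in this thread, and elsewhere in
the thread it is suggested that validate_seq does not distinguish between
alphabets, so treat it as something to try rather than a fix.

```perl
use strict;
use warnings;
use Bio::Seq;

# Placeholder values for illustration only.
my $seq = Bio::Seq->new(
    -display_id => 'example_clone',
    -seq        => 'ACTGACTG',
    -alphabet   => 'dna',    # set explicitly instead of letting BioPerl guess
);

print $seq->display_id, ' has alphabet ', $seq->alphabet, "\n";
```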

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From Marc.Logghe at DEVGEN.com  Tue Jun 20 03:13:22 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 20 Jun 2006 09:13:22 +0200
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6D3D60B@ANTARESIA.be.devgen.com>

It is about 3 years old and I have not tested it with the current bioperl
release.
Feel free to play with it.
Cheers,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tim Erwin
> Sent: Tuesday, June 20, 2006 5:05 AM
> To: Brian Osborne
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] cap3 runnable?
> 
> Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3
> 
> Regards,
> 
> Tim
> 
> On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
> >
> > Tim,
> >
> > The code seems to be here, not clear if there's an executable:
> >
> > http://seq.cs.iastate.edu/download.html
> >
> >
> > Brian O.
> >
> >
> > On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Does anyone have a runnable for cap3? There seems to be some 
> > > discussion about one in the mailing archives (
> > > 
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> > cannot
> > > find any code.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Tim
> >
> >
> >
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cap3.pm
Type: application/octet-stream
Size: 3374 bytes
Desc: Cap3.pm
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/0976a7d9/attachment-0003.obj>

From G.Tzotzos at unido.org  Tue Jun 20 05:18:48 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 11:18:48 +0200
Subject: [Bioperl-l] Error message
Message-ID: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and ran the
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D

Wagramerstrasse 5
A-1400 Vienna
Austria

Email: g.tzotzos at unido.org


From G.Tzotzos at unido.org  Tue Jun 20 07:36:18 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 13:36:18 +0200
Subject: [Bioperl-l] Error message
Message-ID: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and ran the
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D
Vienna, Austria


From s-merchant at northwestern.edu  Tue Jun 20 10:41:33 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 20 Jun 2006 09:41:33 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <002701c69477$9ffa7c10$c2987ca5@pc13>

Hey Dave,
  I am doing a talk on dictyBase at YAPC. I think it would be great to
meet for lunch.

Cheers,
Sohel Merchant.

dictyBase
Northwestern University,
Chicago

> Just curious if any other BioPerlers will be at the YAPC conference in
> Chicago next week (http://yapcchicago.org/). Some of us from the WashU
> GSC will be there, and it might be fun to meet some other BioPerl
> people over lunch or something. If there's enough interest, I will
> organize.
>
> By the way, if you're unfamiliar with the conference and are interested
> in attending, I think registration is still open. The fee is low
> ($100).
>
> Dave


From cain at cshl.edu  Tue Jun 20 12:03:26 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 20 Jun 2006 12:03:26 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1150819406.2585.27.camel@localhost.localdomain>

Hi Hilmar,

Of course you are right--I was under the influence of a perl module that
I work with that does something similar, but both of your solutions are
better.

I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
look this week.

As for next week, I plan on spending the day at NESCent on Wednesday
(though I haven't told Todd or Jeff that I am arriving early yet) just
to make sure all the details are in place.  I imagine I'll have a fair
amount of free time to hash this stuff out.  Anyone else who is in town
(that is, in Durham, NC, USA) is welcome to come draw on a white board
too. :-)

Scott


On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
> You don't need a new method for this. Instead, support a -feature  
> argument.
> 
> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
> 
> This should work for any instance of Bio::SeqFeatureI. If it is a  
> B::SF::Annotated already it is obviously just a deep copy (if copy is  
> desired - could be another parameter). Otherwise more will be involved.
> 
> Alternatively, and possibly better, is to write a specialized  
> SeqFeatureI factory (that would implement  
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
> 
> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
> 		-type_ontology => $sequence_ontology,
> 		-source_ontology => $feature_source_ontology,
> 		-unflatten => 1);
> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
> 
> This is preferable because it separates business logic that isn't  
> necessarily related into defined units. I.e., the logic necessary to  
> convert an ordinary feature into a strongly typed one is different  
> from how to represent a strongly typed feature. IMHO anyway ...
> 
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
> started as the result of a discussion thread earlier this (or last?)  
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
> though not in concept.
> 
> Maybe we need to get together again and thrash out a strategy; or a
> BOF at the GMOD meeting? I feel this does need a core group of people
> who care, hash out a strategy that will also solve the backwards
> compatibility problem with the current Bio::SeqFeatureI state-of-limbo,
> and allow us to implement the decisions with a few people in a
> concentrated effort. This will then also remove the only real large
> stumbling block towards a 1.6 release.
> 
> Maybe we should think about a little pre-GMOD hackathon to clear up  
> this mess? Scott, you'll be there a day early? I'll be already back  
> and Jason I believe will still be in town, although he may have other  
> commitments already. Nonetheless, it shouldn't really take that much  
> but rather dedicated time, a whiteboard, and a few people who care  
> thrashing this out and then do it.
> 
> Thoughts?
> 
> 	-hilmar
> 
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
> 
> > Rob,
> >
> > I came to the same conclusion as well; I wrote my response as I was
> > heading out the door and while I was running errands, I realized the
> > right thing to do is to write a Bio::SeqFeature::Annotated method
> > called new_from_object, whose usage would be:
> >
> >   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
> >
> > where you would give it a Bio::SeqFeatureI compliant object and try to
> > create a BSFA like you suggested below.  You could allow passing in args
> > to control how different things are handled, like mapping non-SO types
> > to SO types.  I'll think about this over the weekend and let you know if
> > brilliance strikes me.
> >
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> >> Rather than cobble together some ad-hoc solution, I would be
> >> interested in working on a good solution to this problem, because it
> >> seems like it's just going to get more common as more people start
> >> wanting to write GFF3.  What about some code in whatever customarily
> >> makes these objects (probably BSF::Annotated's new() method?) that
> >> could take another type of Feature object and attempt to shoehorn its
> >> data into a new BSF::Annotated?  If it failed (because the type isn't
> >> in SO or whatever), it could throw() some informative error message.
> >>
> >> Then, people could write straightforward code something like:
> >>
> >> while(my $oldstylefeature = $features_in->next_feature) {
> >>     $oldstylefeature->primary_tag('something_that_is_in_so');
> >>     $oldstylefeature->something_else('some other something that needs to be changed for compliance');
> >>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
> >>     $gff3_out->write_feature($newfeature);
> >> }
> >>
> >> Does that sound like a good idea?  I'd be more than willing to  
> >> implement
> >> this, since I'm going to need to do this sort of thing with many more
> >> things than just RepeatMasker.
> >>
> >> Rob
> >>
> >> Scott Cain wrote:
> >>> Um, yeah, good question.  The reason I didn't answer you when you
> >>> wrote before is that I was hoping for divine inspiration for an
> >>> answer (or for somebody else to answer, which would have been really
> >>> great :-)
> >>>
> >>> The short answer (and easy one for me to type) is that you will
> >>> probably need an ad hoc method to do it, which is the same thing I do
> >>> when I need to convert gff2 to gff3, to make sure the things I need
> >>> mapped get mapped the 'right' way (that is, the way I want them to
> >>> go).  I don't have any sample code that does this, but if you want to
> >>> start working up an ad hoc method, I will certainly try to help you
> >>> as much as I can.
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> >>>
> >>>> So about that converting ye olde feature objects into
> >>>> Bio::SeqFeature::Annotated objects.  How do I do it?
> >>>>
> >>>>
> >>>> Scott Cain wrote:
> >>>>
> >>>>> That's OK--You added a few items that should be escaped that  
> >>>>> weren't, so
> >>>>> I added those too.
> >>>>>
> >>>>> Thanks,
> >>>>> Scott
> >>>>>
> >>>>>
> >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >>>>>
> >>>>>
> >>>>>> Woops, I should have said something about that.  I submitted  
> >>>>>> it before
> >>>>>> I saw that Scott had already done the escaping in CVS.
> >>>>>>
> >>>>>> Chris Fields wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Scott,
> >>>>>>>
> >>>>>>> Looks like Robert also submitted a bug report related to this  
> >>>>>>> as well=
> >>
> > -- 
> > ---------------------------------------------------------------------- 
> > --
> > Scott Cain, Ph. D.                                          
> > cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> 
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/4b71554e/attachment-0003.bin>

From osborne1 at optonline.net  Tue Jun 20 12:13:51 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 12:13:51 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>
Message-ID: <C0BD96FF.8EC3%osborne1@optonline.net>

George,

The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's
some other problem that may be specific to SwissProt. Can you retrieve from
GenBank? E.g.:

my $seq_object = get_sequence('genbank', 2);

Brian O.


On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:

> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
> 
> use Bio::Perl;
> use strict;
> use warnings;
> 
> my $seq_object = get_sequence('swissprot', "P09651");
> 
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
> 
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
> 
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
> 
> Thanks
> 
> George
> 
> 
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
> 
> 
> 
> George T. Tzotzos Ph.D
> Vienna, Austria
> 
> 
> 
> 
> 
> 


From G.Tzotzos at unido.org  Tue Jun 20 12:21:32 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 18:21:32 +0200
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
References: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <76750E11-3BD6-42EB-832D-3A12BC6B4BEE@unido.org>

Brian

Neither <swiss> nor <swissprot> work. However, your suggestion does  
work fine. So does Chandan's.  Many thanks to both.

Cheers

George


On 20 Jun 2006, at 18:13, Brian Osborne wrote:

> George,
>
> The docs I'm reading say to use 'swiss', not 'swissprot' but I  
> think there's
> some other problem that may be specific to SwissProt. Can you  
> retrieve from
> GenBank? E.g.:
>
> my $seq_object = get_sequence('genbank', 2);
>
> Brian O.
>
>
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>
>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>> following script to test the installation:
>>
>> use Bio::Perl;
>> use strict;
>> use warnings;
>>
>> my $seq_object = get_sequence('swissprot', "P09651");
>>
>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>
>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>> get the message below.
>>
>> Any help on the nature of the problem and how to overcome it would be
>> greatly appreciated.
>>
>> Thanks
>>
>> George
>>
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
>> swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
>> WebDBSeqI.pm:153
>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>> STACK toplevel tut2.pl:5
>>
>>
>>
>> George T. Tzotzos Ph.D
>> Vienna, Austria
>>
>>
>>
>>
>>
>>
>
>


From ClarkeW at AGR.GC.CA  Tue Jun 20 12:57:34 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 12:57:34 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>


The Bioperl version is 1.4, the OS is Red Hat Linux (Fedora Core 4), and the
trace is:

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
STACK: /home/wayne/bin/mast_fasta.pl:59

And the full script is attached. 

However, I would like to clarify that the actual sequence is not ACTG*;
that was a notation to indicate that I had checked it to be sure that
it is a valid DNA sequence. Due to confidentiality I cannot disclose
the actual sequence. I know this makes it more difficult, and I
perhaps should have been clearer about this originally. The $handle is a
Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
name:

'Clone_Name' => 'sJ1485'
        };
then the error message. I hope this is more helpful than my last
message.
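
Since the sequence itself cannot be posted, one option is to post a summary
of it instead. The following is a minimal sketch under that assumption;
summarize_sequence is a hypothetical helper, not part of BioPerl or of
mast_fasta.pl. It reports only the length and a count of every character
other than A, C, G and T, so stray whitespace, line endings or digits
become visible without revealing the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a sequence string without revealing it.
# Returns the length plus a count of each character that is not A, C, G or T.
sub summarize_sequence {
    my ($seq) = @_;
    return 'undef' unless defined $seq;
    my %odd;
    $odd{$_}++ for grep { !/[ACGTacgt]/ } split //, $seq;
    my @parts = map { sprintf('0x%02X x%d', ord($_), $odd{$_}) } sort keys %odd;
    return 'length=' . length($seq)
         . '; other characters: ' . (@parts ? join(', ', @parts) : 'none');
}

# Example with a made-up value: two N's and a trailing newline.
print summarize_sequence("ACTGNN\n"), "\n";
# prints: length=7; other characters: 0x0A x1, 0x4E x2
```

In the loop this could be called as summarize_sequence($hash->{'Sequence'})
for the failing clone only.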

Thanks, Wayne


-------------- next part --------------
A non-text attachment was scrubbed...
Name: mast_fasta.pl
Type: application/octet-stream
Size: 1998 bytes
Desc: mast_fasta.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/53770697/attachment-0003.obj>

From cjfields at uiuc.edu  Tue Jun 20 13:16:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:16:32 -0500
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <000c01c6948d$46e992d0$15327e82@pyrimidine>

Brian,

Looks like EBI switched the URL parameter for swissprot from 'swall' to
'UniProtKB'.  I committed a change to Bio::DB::SwissProt in CVS which fixes
this and solves the issue.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Tuesday, June 20, 2006 11:14 AM
> To: George Tzotzos; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Error message
> 
> George,
> 
> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
> there's
> some other problem that may be specific to SwissProt. Can you retrieve
> from
> GenBank? E.g.:
> 
> my $seq_object = get_sequence('genbank', 2);
> 
> Brian O.
> 
> 
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
> 
> > I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> > following script to test the installation:
> >
> > use Bio::Perl;
> > use strict;
> > use warnings;
> >
> > my $seq_object = get_sequence('swissprot', "P09651");
> >
> > write_sequence(">roa1.fasta", 'fasta', $seq_object);
> >
> > I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> > get the message below.
> >
> > Any help on the nature of the problem and how to overcome it would be
> > greatly appreciated.
> >
> > Thanks
> >
> > George
> >
> >
> > ------------- EXCEPTION  -------------
> > MSG: swissprot stream with no ID. Not swissprot in my book
> > STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> > swiss.pm:179
> > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> > WebDBSeqI.pm:153
> > STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> > STACK toplevel tut2.pl:5
> >
> >
> >
> > George T. Tzotzos Ph.D
> > Vienna, Austria
> >
> >
> >
> >
> >
> >
> 
> 


From chandan.kr.singh at gmail.com  Tue Jun 20 10:46:01 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Tue, 20 Jun 2006 20:16:01 +0530
Subject: [Bioperl-l] Error message
In-Reply-To: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
References: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
Message-ID: <2d4f320606200746ja53cebs73923c510b535c44@mail.gmail.com>

Hi
It seems the 'swall' servertype on EBI no longer exists. Maybe this has
already been reported and debugged; I hope somebody can shed some light on it.

As for George, if you are in a hurry you can use the Bio::DB::SwissProt module
directly. Here is typical code to do this:

use strict;
use warnings;
use Bio::DB::SwissProt;
use Bio::Perl;

# Query ExPASy directly instead of the EBI 'swall' server
my $seq_obj = Bio::DB::SwissProt->new(-servertype   => 'expasy',
                                      -hostlocation => 'us');
my $seq = $seq_obj->get_Seq_by_id('ROA1_HUMAN');
write_sequence(">roa.sp", 'fasta', $seq);


See the module documentation for further help.

cheers
Chandan


On 6/20/06, George Tzotzos <G.Tzotzos at unido.org> wrote:
>
> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
>
> use Bio::Perl;
> use strict;
> use warnings;
>
> my $seq_object = get_sequence('swissprot', "P09651");
>
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
>
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
>
> Thanks
>
> George
>
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
>
>
>
>
>
> George T. Tzotzos Ph.D
>
> Wagramerstrasse 5
> A-1400 Vienna
> Austria
>
> Email: g.tzotzos at unido.org
>
>
>
>


From osborne1 at optonline.net  Tue Jun 20 13:33:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 13:33:07 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <000c01c6948d$46e992d0$15327e82@pyrimidine>
Message-ID: <C0BDA993.8ED3%osborne1@optonline.net>

Chris,

You beat me to it!

Brian O.


On 6/20/06 1:16 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Brian,
> 
> Looks like EBI switched the url parameter for swissprot 'swall' to
> 'UniProtKB'.  I committed a change to Bio::DB::SwissProt in CVS which fixes
> this and solves the issue.
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Tuesday, June 20, 2006 11:14 AM
>> To: George Tzotzos; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Error message
>> 
>> George,
>> 
>> The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's
>> some other problem that may be specific to SwissProt. Can you retrieve from
>> GenBank? E.g.:
>> 
>> my $seq_object = get_sequence('genbank', 2);
>> 
>> Brian O.
>> 
>> 
>> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>> 
>>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>>> following script to test the installation:
>>> 
>>> use Bio::Perl;
>>> use strict;
>>> use warnings;
>>> 
>>> my $seq_object = get_sequence('swissprot', "P09651");
>>> 
>>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>> 
>>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>>> get the message below.
>>> 
>>> Any help on the nature of the problem and how to overcome it would be
>>> greatly appreciated.
>>> 
>>> Thanks
>>> 
>>> George
>>> 
>>> 
>>> ------------- EXCEPTION  -------------
>>> MSG: swissprot stream with no ID. Not swissprot in my book
>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>>> STACK toplevel tut2.pl:5
>>> 
>>> 
>>> 
>>> George T. Tzotzos Ph.D
>>> Vienna, Austria
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 


From ClarkeW at AGR.GC.CA  Tue Jun 20 13:44:42 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 13:44:42 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A66@onncrxms4.agr.gc.ca>

Hi all, 

It seems that there is a newline character which is causing the problem,
this wasn't obvious at first due to the size of my shell window but that
is what is giving the mismatch error. Thanks to Chris and Torsten for
the help and for pointing me in the direction of validate_seq which was
helpful in finding the problem.
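
For anyone hitting the same warning, here is a minimal sketch of the kind of
check that exposes a stray newline. It is plain Perl with no Bioperl calls;
clean_sequence and first_bad_char are hypothetical helper names, and the
allowed-character class is an assumption, not the one validate_seq uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): strip all whitespace,
# including stray newlines, from a sequence string fetched from a database.
sub clean_sequence {
    my ($raw) = @_;
    return unless defined $raw;
    (my $clean = $raw) =~ s/\s+//g;
    return $clean;
}

# Hypothetical helper: return the first character outside a simple
# alphabet (letters, '*', '-', '.'), or undef if every character is allowed.
# The character class is an assumption made for this sketch.
sub first_bad_char {
    my ($seq) = @_;
    return $seq =~ /([^A-Za-z*.\-])/ ? $1 : undef;
}

my $raw = "ACTGACTG\nACTG\n";    # stand-in for $hash->{'Sequence'}

# Print the offending character as a hex code so that invisible
# characters such as newlines show up in a narrow shell window.
my $bad = first_bad_char($raw);
printf "first bad character: %s\n",
    defined $bad ? sprintf('0x%02X', ord $bad) : 'none';

my $seq = clean_sequence($raw);
print "cleaned sequence: $seq\n";
```

The cleaned string could then be passed as -seq when building the Bio::Seq
object; whether stripping whitespace is appropriate depends on how the
sequences were loaded into the database in the first place.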

Cheers, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with beyond "this doesn't work."
We need the following information; otherwise we can't do anything.

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but it looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MYSQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> 
>      my $sth = $dbh->prepare($sql);
> 
>      $sth->execute();
> 
>      while (my $hash = $sth->fetchrow_hashref()) {
> 
>           # print("Name: ".$hash->{'Clone_Name'}."\n");
> 
>           my $seq = new Bio::Seq(  -display_id     => $hash->{'Clone_Name'},
> 
>                                    -seq      => $hash->{'Sequence'});
> 
>           $handle->write_seq($seq);
> 
>           # print("Sequence: ".$hash->{'Sequence'}."\n");
> 
>      }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 


From cjfields at uiuc.edu  Tue Jun 20 13:55:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:55:28 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>
Message-ID: <000e01c69492$b74e0ec0$15327e82@pyrimidine>

> -----Original Message-----
> From: Clarke, Wayne [mailto:ClarkeW at AGR.GC.CA]
> Sent: Tuesday, June 20, 2006 11:58 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> 
> The Bioperl version is 1.4, the OS is Redhat linux fedora core 4, the
> trace is
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
> STACK: /home/wayne/bin/mast_fasta.pl:59
> 
> And the full script is attached.

Have you tried a newer version of Bioperl to see if it fixed the issue?  v.
1.5.1 has been out for a bit now and it's pretty stable.

> However I would like to clarify that the actual sequence is not ACTG*,
> this was a notation to represent that I had checked it to be sure that
> it was a valid DNA sequence but due to confidentiality I cannot disclose
> the actual sequence. I know this makes it more difficult and that I
> perhaps should have been clearer about this originally. 

That's not a problem.  We run into that here a bit.  Example data is fine.

> The $handle is a
> Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
> name
> 
> 'Clone_Name' => 'sJ1485'
>         };
> then the error message. I hope this is more helpful than my last
> message.
> 
> Thanks, Wayne

Make sure you aren't using bioperl-specific methods when you run
Data::Dumper on your hash or the script crashes.

Okay, I was able to reproduce your error using PrimarySeq from v. 1.4 (BTW,
the error message changes if you use a newer version of Bioperl but it is
still there).  See if you can follow me here...

I used this script:
-------------------------
use Bio::Seq;
use Bio::SeqIO;
use Data::Dumper;

my $hash = {'Clone'     => 'test',
            'Sequence'  => 'ACTG*'};

my $seqout = Bio::SeqIO->new (-format   => 'fasta',
                              -fh       => \*STDOUT);

print Dumper($hash);

my $seq = Bio::Seq->new(-seq            => $hash->{'Sequence'},
                        -display_id     => $hash->{'Clone'});

$seqout->write_seq($seq);
-------------------------

And everything works fine, with this output:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => 'ACTG*'
        };
>test
ACTG*

Changing the anonymous hash to this causes the crash and error.

my $hash = {'Clone'     => 'test',
            'Sequence'  => ['ACTG*']};

Gets this:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => [
                          'ACTG*'
                        ]
        };

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ARRAY(0x2354b0)] which does not look
healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\core/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:268
STACK: Bio::PrimarySeq::new C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new C:\Perl\src\bioperl\core/Bio/Seq.pm:497
STACK: C:\Perl\Scripts\seq-test\test.pl:17
-----------------------------------------------------------

It could be that the sequence data is stored in another complex data type
(object, hash) that's causing the problem.  Looks like you retrieve your
hash from another method ('my $hash = $sth->fetchrow_hashref()'); you might
want to check that method to make sure you're getting the right kind of data
into your hash.
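
As a rough sketch of that check, the snippet below reports what kind of
value sits in the hash slot before it is handed to Bio::Seq->new.  It uses
only Perl's built-in ref(); describe_value is a hypothetical helper name and
the two example hashes simply mirror the test cases above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): describe what kind of
# value a hash slot holds.
sub describe_value {
    my ($value) = @_;
    return 'undef'        unless defined $value;
    return 'plain string' unless ref $value;
    return ref($value) . ' reference';
}

# Example rows mirroring the two anonymous hashes tried above.
my %rows = (
    good => { Clone => 'test', Sequence => 'ACTG*'   },
    bad  => { Clone => 'test', Sequence => ['ACTG*'] },
);

for my $name (sort keys %rows) {
    my $kind = describe_value($rows{$name}{Sequence});
    print "$name: Sequence holds $kind\n";
}
```

Anything other than "plain string" would need to be unpacked (or fixed at
the source) before constructing the sequence object.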
 
Chris


From rmb32 at cornell.edu  Tue Jun 20 14:09:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 20 Jun 2006 12:09:38 -0600
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150819406.2585.27.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
Message-ID: <449839E2.5080402@cornell.edu>

Getting to know this code a little better, I notice a couple of little 
things: 

1.) My patch attached to bug 2026 draws unnecessary distinctions between
feature types that use tags and those that use annotations, since all
features are now Bio::AnnotatableI's and the *_tag_* methods are now
implemented in AnnotatableI in terms of annotation objects.  You
guys should probably just ignore it, since from the sound of it you're
going to be changing all of this around anyway.  Wish I could be there
to help and learn more.

2.) The %tag2text hash in Bio::AnnotatableI stores a list of scalar
accessors to use when translating Bio::Annotation::* objects to and from
scalar tags.  It seems to me that this would be much better accomplished
with polymorphism of some sort, probably by adding a multipurpose as_tag()
accessor to Bio::AnnotationI and the objects that implement it, then
using that in Bio::AnnotatableI instead of %tag2text.  Does this make
sense, or am I misinterpreting something here?  The reason I noticed this
is that I've been wrestling with how to translate
Bio::Annotation::Target objects to and from scalar tag values, since a
Target is represented as an ordered list of 3 or 4 scalar tags in
older code that was designed to interoperate with gff2, and I can't
figure out a nice way to do it using the rather inflexible %tag2text
mechanism.
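
To make the idea concrete, here is a rough sketch of what I have in mind.
The My::Annotation::* packages are made-up stand-ins, not the real
Bio::Annotation::* classes, and the field names (target_id, start, end,
strand) are assumptions for illustration only.

```perl
use strict;
use warnings;

# All package names below are hypothetical stand-ins for illustration.
package My::Annotation::SimpleValue;
sub new    { my ($class, %args) = @_; return bless { value => $args{value} }, $class }
# A simple value flattens to a one-element list.
sub as_tag { my ($self) = @_; return ($self->{value}) }

package My::Annotation::Target;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
# A target flattens to an ordered list: id, start, end, optional strand.
sub as_tag {
    my ($self) = @_;
    my @tags = @{$self}{qw(target_id start end)};
    push @tags, $self->{strand} if defined $self->{strand};
    return @tags;
}

package main;

# The caller needs no per-class lookup table; each object knows how to
# flatten itself into scalar tag values.
sub tag_values { my ($annotation) = @_; return $annotation->as_tag }

my @annotations = (
    My::Annotation::SimpleValue->new(value => 'repeat_region'),
    My::Annotation::Target->new(
        target_id => 'EST23', start => 1, end => 21, strand => '+'),
);
print join(' ', tag_values($_)), "\n" for @annotations;
```

With something like this, supporting a new annotation type would mean adding
one as_tag() method to that class rather than another entry in a shared hash.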

Sorry to be a pain, just wanted to get that in there before you guys 
start your jam session in Durham.

Rob

Scott Cain wrote:
> Hi Hilmar,
>
> Of course you are right--I was under the influence of a perl module that
> I work with that does something similar, but both of your solutions are
> better.
>
> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
> look this week.
>
> As for next week, I plan on spending the day at NESCent on Wednesday
> (though I haven't told Todd or Jeff that I am arriving early yet) just
> to make sure all the details are in place.  I imagine I'll have a fair
> amount of free time to hash this stuff out.  Anyone else who is in town
> (that is, in Durham, NC, USA) is welcome to come draw on a white board
> too. :-)
>
> Scott
>
>
> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>   
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> You don't need a new method for this. Instead, support a -feature  
>> argument.
>>
>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>
>> This should work for any instance of Bio::SeqFeatureI. If it is a  
>> B::SF::Annotated already it is obviously just a deep copy (if copy is  
>> desired - could be another parameter). Otherwise more will be involved.
>>
>> Alternatively, and possibly better, is to write a specialized  
>> SeqFeatureI factory (that would implement  
>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>
>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>> 		-type_ontology => $sequence_ontology,
>> 		-source_ontology => $feature_source_ontology,
>> 		-unflatten => 1);
>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>
>> This is preferable because it separates business logic that isn't  
>> necessarily related into defined units. I.e., the logic necessary to  
>> convert an ordinary feature into a strongly typed one is different  
>> from how to represent a strongly typed feature. IMHO anyway ...
>>
>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
>> started as the result of a discussion thread earlier this (or last?)  
>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
>> though not in concept.
>>
>> Maybe we need to get together again and thrash out a strategy; or a  
>> BOF at the GMOD meeting? I feel this does need a core group of people  
>> who care, hash out a strategy that will also solve the backwards  
>> compatibility problem with the current Bio::SeqFeatureI state-of- 
>> limbo, and allow us to implement the decisions with a few people in a  
>> concentrated effort. This will then also remove the only real large  
>> stumbling block towards a 1.6 release.
>>
>> Maybe we should think about a little pre-GMOD hackathon to clear up  
>> this mess? Scott, you'll be there a day early? I'll be already back  
>> and Jason I believe will still be in town, although he may have other  
>> commitments already. Nonetheless, it shouldn't really take that much  
>> but rather dedicated time, a whiteboard, and a few people who care  
>> thrashing this out and then do it.
>>
>> Thoughts?
>>
>> 	-hilmar
>>
>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>
>>     
>>> Rob,
>>>
>>> I came to the same conclusion as well; I wrote my response as I was
>>> heading out the door and while I was running errands, I realized the
>>> right thing to do is to write a Bio::SeqFeature::Annotated method  
>>> called
>>> new_from_object, whose usage would be:
>>>
>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
>>> ($my_BSFI, %args);
>>>
>>> where you would give it a Bio::SeqFeatureI compliant object and try to
>>> create a BSFA like use suggested below.  You could allow passing in  
>>> args
>>> to control how different things are handled, like mapping non-SO types
>>> to SO types.  I'll think about this over the weekend and let you  
>>> know if
>>> brilliance strikes me.
>>>
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>       
>>>> Rather than cobble together some ad-hoc solution, I would be  
>>>> interested
>>>> in working on a good solution to this problem, because it seems like
>>>> it's just going to get more common as more people start wanting to  
>>>> write
>>>> GFF3.  What about some code in whatever customarily makes these  
>>>> objects
>>>> (probably BSF::Annotated's new() method?) that could take another  
>>>> type
>>>> of Feature object and attempt to shoehorn its data into a new
>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>> whatever), it could throw() some informative error message.
>>>>
>>>> Then, people could write straightforward code something like:
>>>>
>>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>     $oldstylefeature->something_else('some other something that  
>>>> needs to
>>>> be changed for compliance');
>>>>     my $newfeature = Bio::SeqFeature::Annotated->new 
>>>> ($oldstylefeature);
>>>>     $gff3_out->write_feature($newfeature);
>>>> }
>>>>
>>>> Does that sound like a good idea?  I'd be more than willing to  
>>>> implement
>>>> this, since I'm going to need to do this sort of thing with many more
>>>> things than just RepeatMasker.
>>>>
>>>> Rob
>>>>
>>>> Scott Cain wrote:
>>>>         
>>>>> Um, yeah, good question.  The reason I didn't answer you when you  
>>>>> wrote
>>>>> before is that I was hoping for divine inspiration for an answer  
>>>>> (or for
>>>>> somebody else to answer, which would have been really great :-)
>>>>>
>>>>> The short answer (and easy one for me to type) is that you will  
>>>>> probably
>>>>> need an ad hoc method to do it, which is the same thing I do when  
>>>>> I need
>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>> mapped the 'right' way (that is, the way I want them to go).  I  
>>>>> don't
>>>>> have any sample code that does this, but if you want to start  
>>>>> working up
>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>
>>>>>           
>>>>>> So about that converting ye olde feature objects into
>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>
>>>>>>
>>>>>> Scott Cain wrote:
>>>>>>
>>>>>>             
>>>>>>> That's OK--You added a few items that should be escaped that  
>>>>>>> weren't, so
>>>>>>> I added those too.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Scott
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Woops, I should have said something about that.  I submitted  
>>>>>>>> it before
>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Scott,
>>>>>>>>>
>>>>>>>>> Looks like Robert also submitted a bug report related to this
>>>>>>>>> as well
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>                   
>>> -- 
>>> ---------------------------------------------------------------------- 
>>> --
>>> Scott Cain, Ph. D.                                          
>>> cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>>>       
>> - --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (Darwin)
>>
>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>> ImoAXD/jrbF0gXzSr2CY4tQ=
>> =XfDq
>> -----END PGP SIGNATURE-----
>>     
>> ------------------------------------------------------------------------
>>

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From hlapp at gmx.net  Tue Jun 20 14:24:45 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 20 Jun 2006 14:24:45 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449839E2.5080402@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
	<449839E2.5080402@cornell.edu>
Message-ID: <A3627468-CCA4-41FD-8C09-F5E1BFCE67D0@gmx.net>

Yes, this is the sore problem area. AnnotatableI used to have only a
single method (annotation()); the *_tag_* methods are new since 1.5
(and truly a developer release feature - don't rely on them staying).

Likewise, %tag2text is an utterly ugly artifact (after all, this
is an interface) rooted in the above addition. If we can't manage to
remove it I'll remove my name from that module ;)

	-hilmar

On Jun 20, 2006, at 2:09 PM, Robert Buels wrote:

> Getting to know this code a little better, I notice a couple of little
> things:
>
> 1.) my patch attached to bug 2026 draws unnecessary distinctions  
> between
> feature types that use tags, and those that use annotations, since all
> features are now Bio::AnnotatableI's and the *_tags_* methods are
> implemented in AnnotatableI in terms of annotation objects now.  You
> guys should probably just ignore it, since from the sound of it you're
> going to be changing all of this around anyway.  Wish I could be there
> to help and learn more.
>
> 2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar
> accessors to use when translating Bio::Annotation::* objects to and  
> from
> scalar tags.  Seems to me, this would be much better accomplished by
> using polymorphism of some sort, probably adding a multipurpose  
> as_tag()
> accessor in Bio::AnnotationI and the objects that implement it, then
> using that in Bio::AnnotatableI instead of %tag2text.  Does this make
> sense, or am I misinterpreting something here?  Reason I've noticed  
> this
> is because I've been wrestling with how to translate
> Bio::Annotation::Target objects to and from scalar tag values, since a
> Target is being represented as an ordered list of 3 or 4 scalar  
> tags in
> old things that were designed to interoperate with gff2, and I can't
> figure out a nice way to do it using the rather inflexible %tag2text
> mechanism.
>
> Sorry to be a pain, just wanted to get that in there before you guys
> start your jam session in Durham.
>
> Rob
>
> Scott Cain wrote:
>> Hi Hilmar,
>>
>> Of course you are right--I was under the influence of a perl  
>> module that
>> I work with that does something similar, but both of your  
>> solutions are
>> better.
>>
>> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
>> look this week.
>>
>> As for next week, I plan on spending the day at NESCent on Wednesday
>> (though I haven't told Todd or Jeff that I am arriving early yet)  
>> just
>> to make sure all the details are in place.  I imagine I'll have a  
>> fair
>> amount of free time to hash this stuff out.  Anyone else who is in  
>> town
>> (that is, in Durham, NC, USA) is welcome to come draw on a white  
>> board
>> too. :-)
>>
>> Scott
>>
>>
>> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> You don't need a new method for this. Instead, support a -feature
>>> argument.
>>>
>>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>>
>>> This should work for any instance of Bio::SeqFeatureI. If it is a
>>> B::SF::Annotated already it is obviously just a deep copy (if  
>>> copy is
>>> desired - could be another parameter). Otherwise more will be  
>>> involved.
>>>
>>> Alternatively, and possibly better, is to write a specialized
>>> SeqFeatureI factory (that would implement
>>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>>
>>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>>> 		-type_ontology => $sequence_ontology,
>>> 		-source_ontology => $feature_source_ontology,
>>> 		-unflatten => 1);
>>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>>
>>> This is preferable because it separates business logic that isn't
>>> necessarily related into defined units. I.e., the logic necessary to
>>> convert an ordinary feature into a strongly typed one is different
>>> from how to represent a strongly typed feature. IMHO anyway ...
>>>
>>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan
>>> started as the result of a discussion thread earlier this (or last?)
>>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,
>>> though not in concept.
>>>
>>> Maybe we need to get together again and thrash out a strategy; or a
>>> BOF at the GMOD meeting? I feel this does need a core group of  
>>> people
>>> who care, hash out a strategy that will also solve the backwards
>>> compatibility problem with the current Bio::SeqFeatureI state-of-
>>> limbo, and allow us to implement the decisions with a few people  
>>> in a
>>> concentrated effort. This will then also remove the only real large
>>> stumbling block towards a 1.6 release.
>>>
>>> Maybe we should think about a little pre-GMOD hackathon to clear up
>>> this mess? Scott, you'll be there a day early? I'll be already back
>>> and Jason I believe will still be in town, although he may have  
>>> other
>>> commitments already. Nonetheless, it shouldn't really take that much
>>> but rather dedicated time, a whiteboard, and a few people who care
>>> thrashing this out and then do it.
>>>
>>> Thoughts?
>>>
>>> 	-hilmar
>>>
>>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>>
>>>
>>>> Rob,
>>>>
>>>> I came to the same conclusion as well; I wrote my response as I was
>>>> heading out the door and while I was running errands, I realized  
>>>> the
>>>> right thing to do is to write a Bio::SeqFeature::Annotated method
>>>> called
>>>> new_from_object, whose usage would be:
>>>>
>>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object
>>>> ($my_BSFI, %args);
>>>>
>>>> where you would give it a Bio::SeqFeatureI compliant object and  
>>>> try to
>>>> create a BSFA like use suggested below.  You could allow passing in
>>>> args
>>>> to control how different things are handled, like mapping non-SO  
>>>> types
>>>> to SO types.  I'll think about this over the weekend and let you
>>>> know if
>>>> brilliance strikes me.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>>
>>>>> Rather than cobble together some ad-hoc solution, I would be
>>>>> interested
>>>>> in working on a good solution to this problem, because it seems  
>>>>> like
>>>>> it's just going to get more common as more people start wanting to
>>>>> write
>>>>> GFF3.  What about some code in whatever customarily makes these
>>>>> objects
>>>>> (probably BSF::Annotated's new() method?) that could take another
>>>>> type
>>>>> of Feature object and attempt to shoehorn its data into a new
>>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>>> whatever), it could throw() some informative error message.
>>>>>
>>>>> Then, people could write straightforward code something like:
>>>>>
>>>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>>     $oldstylefeature->something_else('some other something that
>>>>> needs to
>>>>> be changed for compliance');
>>>>>     my $newfeature = Bio::SeqFeature::Annotated->new
>>>>> ($oldstylefeature);
>>>>>     $gff3_out->write_feature($newfeature);
>>>>> }
>>>>>
>>>>> Does that sound like a good idea?  I'd be more than willing to
>>>>> implement
>>>>> this, since I'm going to need to do this sort of thing with  
>>>>> many more
>>>>> things than just RepeatMasker.
>>>>>
>>>>> Rob
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> Um, yeah, good question.  The reason I didn't answer you when you
>>>>>> wrote
>>>>>> before is that I was hoping for divine inspiration for an answer
>>>>>> (or for
>>>>>> somebody else to answer, which would have been really great :-)
>>>>>>
>>>>>> The short answer (and easy one for me to type) is that you will
>>>>>> probably
>>>>>> need an ad hoc method to do it, which is the same thing I do when
>>>>>> I need
>>>>>> to convert gff2 to gff3, to make sure the things I need mapped  
>>>>>> get
>>>>>> mapped the 'right' way (that is, the way I want them to go).  I
>>>>>> don't
>>>>>> have any sample code that does this, but if you want to start
>>>>>> working up
>>>>>> an ad hoc method, I will certainly try to help you as much as  
>>>>>> I can.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> So about that converting ye olde feature objects into
>>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>>
>>>>>>>
>>>>>>> Scott Cain wrote:
>>>>>>>
>>>>>>>
>>>>>>>> That's OK--You added a few items that should be escaped that
>>>>>>>> weren't, so
>>>>>>>> I added those too.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Woops, I should have said something about that.  I submitted
>>>>>>>>> it before
>>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>>
>>>>>>>>> Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Looks like Robert also submitted a bug report related to this
>>>>>>>>>> as well=
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>> -- 
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.
>>>> cain at cshl.edu
>>>> GMOD Coordinator (http://www.gmod.org/)
>>>> 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>>
>>> - --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 20 16:22:45 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Jun 2006 21:22:45 +0100
Subject: [Bioperl-l] Bio::Map changes
Message-ID: <44985915.8010607@sendu.me.uk>

Some initial changes have been made to some modules in Bio::Map to allow 
Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
(see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)

Further changes are needed in some remaining Bio::Map modules for this 
addition to be complete (a number of Bio::Map related tests in the test 
suite currently fail), notably Bio::Map::Cyto* since they had 
implemented their own Range-related features.

I propose bringing all Bio::Map into line so it behaves with and makes 
good use of the RangeI nature of Position. Beyond this initial change I 
want to add relative positioning and more, but I'll describe that in a 
future post to this thread.

Can anyone see any issues with ranged positions (it's done in a backward 
compatible way)? Do any developers want to maintain control of a 
Bio::Map module or shall I just dive in?


From cjfields at uiuc.edu  Tue Jun 20 23:50:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 22:50:55 -0500
Subject: [Bioperl-l] EUtilities interface
Message-ID: <002301c694e5$e5f3a750$15327e82@pyrimidine>

I'm working on a new eutilities interface which I hope to commit by late
summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
generic web database interface, which I call Bio::DB::WebDBI, and the
EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
NCBI for any information available via Entrez Utilities (i.e. taxonomy,
pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
info like Bio::DB::WebDBSeqI.  

My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
Does anyone think this will be an issue?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From bix at sendu.me.uk  Wed Jun 21 04:20:37 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 09:20:37 +0100
Subject: [Bioperl-l] Bio::RangeI intersection proposal
Message-ID: <44990155.6050501@sendu.me.uk>

Bio::Map::PositionI (in bioperl-live) needs intersections of a list of 
ranges. It inherits from Bio::RangeI but unlike RangeI's union, 
intersection does not take a list. PositionI currently calls 
intersection repeatedly to handle a list.

If there is no particular reason for this limitation, I propose making 
RangeI intersection handle lists natively. This won't do any harm to 
existing code at the time of the change, but it's possible that someone 
has written a module that implements RangeI but overrides intersection 
(without making it accept a list), so that future code written that 
expects a RangeI to handle lists will break when getting a RangeI from 
that module.

So the question is, has anyone overridden intersection in RangeI? Is the 
small risk of possible breakage compensated by the benefit of 
intersections of a list of ranges (which is surely useful in lots of 
situations, not just for PositionI)?

I'm tempted to go ahead with this unless there are objections.
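
To make the proposal concrete, here is a minimal sketch of a list-wise
intersection. It is only an illustration under stated assumptions: ranges
are plain [start, end] pairs with inclusive coordinates instead of
Bio::RangeI objects, strand is ignored, and intersect_ranges is a
hypothetical name, not the RangeI method itself.

```perl
use strict;
use warnings;

# Hypothetical helper: each range is a [start, end] pair with start <= end.
# Returns the common [start, end], or nothing if the ranges do not all overlap.
sub intersect_ranges {
    my @ranges = @_;
    return unless @ranges;

    my ($start, $end) = @{ shift @ranges };
    for my $range (@ranges) {
        # The intersection starts at the largest start ...
        $start = $range->[0] if $range->[0] > $start;
        # ... and ends at the smallest end.
        $end = $range->[1] if $range->[1] < $end;
        # An empty intersection can never become non-empty again.
        return if $start > $end;
    }
    return [$start, $end];
}

my $common = intersect_ranges([10, 50], [20, 60], [15, 40]);
print "$common->[0]..$common->[1]\n" if $common;
```

A single pass like this gives the same answer as calling a pairwise
intersection repeatedly, and it can stop as soon as the running result is
empty.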


From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 03:19:12 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 09:19:12 +0200
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <002701c69477$9ffa7c10$c2987ca5@pc13>
References: <002701c69477$9ffa7c10$c2987ca5@pc13>
Message-ID: <4498F2F0.7010203@biomax.com>

Sohel Merchant wrote:

> 
>>Just curious if any other BioPerlers will be at the YAPC conference in 
> 
> 
>>Chicago next week ( <http://yapcchicago.org/> http://yapcchicago.org/).

Not in Chicago, but yesterday I got the OK from Biomax management to go 
to YAPC::Europe, http://www.birmingham2006.com/. So at the end of 
August I'll be in Birmingham. Yeah!

Is anybody interested in writing parsers for Perl 6 there?

CU, Bernhard


-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************


From cjfields at uiuc.edu  Wed Jun 21 11:08:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:08:28 -0500
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <4498F2F0.7010203@biomax.com>
Message-ID: <000301c69544$8d537710$15327e82@pyrimidine>

Speaking of Perl6, there was interest here at one point in getting a
bioperl-experimental going, which at this point in the game should involve
Perl6.  If there were enough interest in it we could probably get it set up
via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
experimental modules in some way to prevent confusion (bioperl6-live???),
though I'm not up to speed Perl6-wise so I'm not sure about namespace
collisions and so on.

bioperl-experimental would be, like the name implies, a sort of testing
ground for ideas (good and bad).  It seemed like it was going to take off a
few years ago but it lost steam, I guess.

As for your parsers, would you build them from the ground up (i.e. from
Bio::Root::Root on up)?

Chris



From cjfields at uiuc.edu  Wed Jun 21 11:16:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:16:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <44990155.6050501@sendu.me.uk>
Message-ID: <000401c69545$a4a3ad30$15327e82@pyrimidine>

I personally have no objections as long as it doesn't break API.  Don't know
how the senior guys feel (Jason, Brian, Heikki, Hilmar...); I'm not a user
of Bio::Map modules myself.

Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Chris



From hlapp at gmx.net  Wed Jun 21 11:24:47 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:24:47 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000401c69545$a4a3ad30$15327e82@pyrimidine>
References: <000401c69545$a4a3ad30$15327e82@pyrimidine>
Message-ID: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>


On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:

> Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Actually, it doesn't go by age but by the amount of hair you still  
have. ;)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 11:28:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:28:58 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>
Message-ID: <000501c69547$6a9f28b0$15327e82@pyrimidine>

Then I'm really a senior guy...

; {

Chris



From hlapp at gmx.net  Wed Jun 21 11:53:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:53:08 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000501c69547$6a9f28b0$15327e82@pyrimidine>
References: <000501c69547$6a9f28b0$15327e82@pyrimidine>
Message-ID: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>

We could run a Mr Seniority competition at BOSC with the attendees  
judging who got the weirdest looking hair loss. You'd take the  
challenge? The judging panel would need to be gender-mixed though.


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 12:08:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 11:08:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>
Message-ID: <000301c6954c$e89c7a60$15327e82@pyrimidine>

I'd love to be at BOSC but I can't go (finishing up my postdoc this year,
which is probably the primary cause of my hair loss).  Would the judges
accept a recent picture?

Chris



From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 12:25:50 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 18:25:50 +0200
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <000301c69544$8d537710$15327e82@pyrimidine>
References: <000301c69544$8d537710$15327e82@pyrimidine>
Message-ID: <4499730E.8090800@biomax.com>

Chris Fields wrote:
> Speaking of Perl6, there was interest here at one point in getting a
> bioperl-experimental going, which at this point in the game should involve
> Perl6.  If there were enough interest in it we could probably get it set up
> via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
> experimental modules in some way to prevent confusion (bioperl6-live???),
> though I'm not up to speed Perl6-wise so I'm not sure about namespace
> collisions and so on.

As far as I understood it, the plan is to have a very smooth migration 
path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When 
new stuff is coming along, or when refactoring is done, you drop in

   use v6;

or

   use v6-pugs;

and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm 
or Audrey Tang's presentation at the Nordic Perl Workshop: 
http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
So I would argue against having a completely separate Perl6 experimental
repository.

> bioperl-experimental would be, like the name implies, a sort of testing
> ground for ideas (good and bad).  It seemed like it was going to take off a
> few years ago but it lost steam, I guess.
> 
> As for your parsers, would you build them from the ground up (i.e. from
> Bio::Root::Root on up)?

I'm just a casual Bio::Perl user and never hacked on any internals. So I 
don't know whether the current Bio::Perl framework is a good fit.

The idea that is floating in my mind is to make a showcase of Perl 6 
parsing, by tackling the various sequences and alignment formats.
So this would involve shopping around for the cleanest parser 
implementations and porting that to Perl6.

Which repository to use is more a question of social engineering.
Are there more Pugs/Perl6 hackers interested in cool biological hacking,
or biologists aching to try out Perl6?

Regards,
   Bernhard Schmalhofer

-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************


From cjfields at uiuc.edu  Wed Jun 21 14:01:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 13:01:02 -0500
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <4499730E.8090800@biomax.com>
Message-ID: <000b01c6955c$ad0e6750$15327e82@pyrimidine>

> Chris Fields wrote:
> > Speaking of Perl6, there was interest here at one point in getting a
> > bioperl-experimental going, which at this point in the game should
> involve
> > Perl6.  If there were enough interest in it we could probably get it set
> up
> > via CVS and moving along.  We might need to split the Perl6 stuff from
> Perl5
> > experimental modules in some way to prevent confusion (bioperl6-
> live???),
> > though I'm not up to speed Perl6-wise so I'm not sure about namespace
> > collisions and so on.
> 
> As far as I understood it, the plan is to have a very smooth migration
> path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When
> new stuff is coming along, or when refactoring is done, you drop in
> 
>    use v6;
> 
> or
> 
>    use v6-pugs;
> 
> and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm
> or Audrey Tang's presentation at the Nordic Perl Workshop:
> http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
> So I would argue against having a completely separate Perl6 experimental
> repository.

Makes sense.  I know Pugs is the Perl6 implementation in Haskell but I also
know eventually Parrot will be taking over as the compiler (hopefully).
Perl6 is pretty exciting since it's built to support OOP from the ground up,
unlike the bolted-on OOP for Perl5, and has several other features that make
it very useful (such as the new way regexes are handled).  I just haven't had time
to play around with it seriously enough.  I may try using Pugs a bit more,
though.

So, as long as Perl5-Perl6 work together a separate repository wouldn't be
necessary.  

> > bioperl-experimental would be, like the name implies, a sort of testing
> > ground for ideas (good and bad).  It seemed like it was going to take
> off a
> > few years ago but it lost steam, I guess.
> >
> > As for your parsers, would you build them from the ground up (i.e. from
> > Bio::Root::Root on up)?
>
> I'm just a casual Bio::Perl user and never hacked on any internals. So I
> don't know whether the current Bio::Perl framework is a good fit.
> 
> The idea that is floating in my mind is to make a showcase of Perl 6
> parsing, by tackling the various sequences and alignment formats.
> So this would involve shopping around for the cleanest parser
> implementations and porting that to Perl6.
> 
> Which repository to use is more a question of social engineering.
> Are there more Pugs/Perl6 hackers interested in cool biological hacking,
> or biologists aching to try out Perl6?

I suppose the best way is initially to use a non-bioperl approach using
Perl6, then try working the parsers in using 'use v6-pugs;'.  Bioperl is
heavily object-oriented so the code would probably need to be refactored
from the bottom up (or top down, depending on your view) to fit Perl6.
Having a perl5->perl6 translator helps, though.  And, again, having Perl5
and Perl6 work together helps as well.

Chris



From dwaner at scitegic.com  Wed Jun 21 14:14:00 2006
From: dwaner at scitegic.com (dwaner at scitegic.com)
Date: Wed, 21 Jun 2006 11:14:00 -0700
Subject: [Bioperl-l] EMBL release 87 format changes.
Message-ID: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>

With release 87 of EMBL (June 19th, 2006), there have been some minor 
changes to the flat file record format. In particular, the SV (sequence 
version) tag has been moved from its own line to a field in the ID line. 
See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.

Is someone already working on updating the SeqIO::embl parser, or should I 
volunteer?

- David


From bix at sendu.me.uk  Wed Jun 21 14:23:28 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 19:23:28 +0100
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <002301c694e5$e5f3a750$15327e82@pyrimidine>
References: <002301c694e5$e5f3a750$15327e82@pyrimidine>
Message-ID: <44998EA0.1010406@sendu.me.uk>

Chris Fields wrote:
> I'm working on a new eutilities interface which I hope to commit by late
> summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> generic web database interface, which I call Bio::DB::WebDBI, and the
> EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
> NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
> info like Bio::DB::WebDBSeqI.  
> 
> My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
> Does anyone think this will be an issue?

Well, I don't. Sounds good to me. What's the intended relationship 
between WebDBI and EUtilitiesI? Would your work end up in the removal of 
direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just 
convert the code that gets the XML to a one line statement or so?


From cjfields at uiuc.edu  Wed Jun 21 15:00:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 14:00:02 -0500
Subject: [Bioperl-l] EMBL release 87 format changes.
In-Reply-To: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>
Message-ID: <000c01c69564$e68b39b0$15327e82@pyrimidine>

That would be great!  Post a patch/fix via bugzilla:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

and we can add it and test it out.  Or if you have CVS access you can do it
yourself.  Not sure who's taking care of SeqIO::embl at the moment....

Added bit: you'll need to update both next_seq and write_seq.  next_seq
should probably handle both old and new EMBL format and write_seq should
only write new format (unless someone else disagrees???)
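
As a rough illustration of reading both layouts, here is a minimal sketch
that is independent of SeqIO::embl. The sub names are hypothetical, and the
sample lines in the comments reflect my understanding of the old and new ID
line layouts; they should be checked against the EBI change notes before
relying on them.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an EMBL ID line in either layout.
# Returns (identifier, version); version is undef for the old layout,
# where it has to be read from the separate SV line instead.
sub parse_embl_id_line {
    my ($line) = @_;

    # Assumed new layout (release 87 onwards):
    #   ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.
    if ($line =~ /^ID\s+(\S+);\s+SV\s+(\d+);/) {
        return ($1, $2);
    }

    # Assumed old layout (entry name followed by whitespace):
    #   ID   X56734     standard; mRNA; PLN; 1859 BP.
    if ($line =~ /^ID\s+(\S+)\s+[^;]+;/) {
        return ($1, undef);
    }

    return;    # not an ID line we recognise
}

# Hypothetical helper for the old layout only:
#   SV   X56734.1
sub parse_embl_sv_line {
    my ($line) = @_;
    return $line =~ /^SV\s+\S+\.(\d+)/ ? $1 : undef;
}

my ($id, $version) =
    parse_embl_id_line('ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.');
print "$id version $version\n";
```

Trying the new pattern first keeps the old-format branch as a fallback, so a
reader built this way accepts either layout while a writer emits only the
new one.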

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of dwaner at scitegic.com
> Sent: Wednesday, June 21, 2006 1:14 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] EMBL release 87 format changes.
> 
> With release 87 of EMBL (June 19th, 2006), there have been some minor
> changes to the flat file record format. In particular, the SV (sequence
> version) tag has been moved from its own line to a field in the ID line.
> See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.
> 
> Is somone already working on updating the SeqIO::embl parser, or should I
> volunteer?
> 
> - David
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed Jun 21 17:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 16:16:38 -0500
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <44998EA0.1010406@sendu.me.uk>
Message-ID: <001b01c69577$fc7068f0$15327e82@pyrimidine>

> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Wednesday, June 21, 2006 1:23 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] EUtilities interface
> 
> Chris Fields wrote:
> > I'm working on a new eutilities interface which I hope to commit by late
> > summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> > generic web database interface, which I call Bio::DB::WebDBI, and the
> > EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can
> query
> > NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> > pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-
> only
> > info like Bio::DB::WebDBSeqI.
> >
> > My only concern is confusion over names, particularly WebDBI vs.
> WebDBSeqI.
> > Does anyone think this will be an issue?
> 
> Well, I don't. Sounds good to me. What's the intended relationship
> between WebDBI and EUtilitiesI? Would your work end up in the removal of
> direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just
> convert the code that gets the XML to a one line statement or so?

Well, right now all it does is use URI to build queries, submit them to
Entrez Utilities, then grab the response; I've been hacking at it on and off
for a few months now.  It needs some error handling and added methods
(mainly for proxies and handling WebEnv/query_key), though once I have it in
decent enough shape I'll go ahead and add it to CVS.  
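
For readers who want a picture of the query-building step, here is a minimal
sketch. It does not use the URI module or any of the classes described
above; a small hand-written escaping routine stands in for URI so the
example runs with core Perl only. The base URL and the sub names are
assumptions for illustration, and no request is actually sent.

```perl
use strict;
use warnings;

# Assumed base URL for NCBI's Entrez Utilities; check NCBI's documentation.
my $EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical stand-in for URI's escaping: percent-encode everything
# except unreserved characters. Handles single-byte characters only.
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: build the URL for one eutility (esearch, efetch, ...).
# Keys are sorted so the result is reproducible.
sub build_eutil_url {
    my ($eutil, %params) = @_;
    my $query = join '&',
        map { escape_component($_) . '=' . escape_component($params{$_}) }
        sort keys %params;
    return "$EUTILS_BASE/$eutil.fcgi?$query";
}

print build_eutil_url('esearch', db => 'taxonomy', term => 'Homo sapiens'), "\n";
```

The returned string could then be handed to LWP::UserAgent's get method, with
the response body passed on to whichever parser suits the database.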

Theoretically once the response is returned it can be parsed like any stream
(see WebDBSeqI/NCBIHelper for an idea of how sequences are parsed and
returned using SeqIO).  This should work as long as there is an appropriate
class to handle the data stream and the appropriate 'plugin' to parse the
data into objects; i.e. dbSNP can be handled by ClusterIO::dbSNP, sequences
by SeqIO::genbank/fasta, pubmed by Bio::Biblio::IO::pubmedxml, and so on.
If you don't have an object or want the raw data stream, you could submit a
request using one of the eutilities (efetch, epost, esearch) and save the raw
output to a file or print it to STDOUT.

Here's a rough diagram:

                      |------------------->Bio::DB::DBFetch (EBI interface)----->plugins for Bio* classes
Bio::Root::Root       |
LWP::UserAgent ------Bio::DB::WebDBI------>Bio::DB::EUtilitiesI (NCBI interface)----->plugins for Bio* classes
                      |
                      |------------------->others?

You probably don't need a Bio::*IO::plugin for each type; tax data in
Bioperl seems to primarily utilize the NCBI Tax database, so
Bio::DB::Taxonomy::entrez shouldn't be too hard to adapt to act as a plugin.
Bio::DB::Taxonomy::entrez uses XML::Twig to parse everything into
Bio::Taxonomy::Node objects and is able to retrieve single and multiple ID's
using the same method, though I would probably use XML::SAX instead.  If I
remember correctly there were issues with Bio::DB::Taxonomy that you brought
up...

Chris


From bix at sendu.me.uk  Thu Jun 22 09:28:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Jun 2006 14:28:25 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <44985915.8010607@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk>
Message-ID: <449A9AF9.2000305@sendu.me.uk>

Sendu Bala wrote:
> Some initial changes have been made to some modules in Bio::Map to allow 
> Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
> (see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)
> 
> Further changes are needed in some remaining Bio::Map modules for this 
> addition to be complete

Range is now done.

The next step is to tidy up all of Bio::Map*, which involves a major 
reimplementation of the whole system (but with no significant API 
change). Basically, the current system is an awkward mix of older 'marker 
has a single position on a map' and new 'markers have multiple positions 
on multiple maps'. This gives us strange things like SimpleMap's 
add_element method which adds a reference to the element to the map 
without the element itself knowing it is now on the map (because it is 
Position that defines what maps an element is on).

The reimplementation will make Position central to the model, allowing 
for lots of other things to work properly without anything becoming 
inconsistent (as is currently the case).

The general tidy up will involve redoing and perhaps even removing 
things. For instance, OrderedPositionWithDistance has never worked so 
will be deleted (with OrderedPosition gaining the distance functionality 
its docs say it already has).

But now is the time to speak up and change my mind if necessary!


From golharam at umdnj.edu  Thu Jun 22 17:05:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 22 Jun 2006 17:05:00 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package)
Message-ID: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>

Hi all,

I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
baseml in the PAML package to measure the distances of some non-coding
regions.  

I started with the coding regions, and used the script
bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
something similar for non-coding regions.  However, when I call
Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
meaning matrix was never defined.  

I wanted to find out if anyone on here has done this before or knows a
way to measure substitution frequencies of non-coding regions with the
PAML package.  The documentation with PAML is sparse so I'm not sure how
to interpret its output directly - that's why I'm using Bioperl.  

Hopefully someone can help me before I start digging into the
code...Thanks.

Ryan


From n.haigh at sheffield.ac.uk  Fri Jun 23 02:43:48 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 23 Jun 2006 07:43:48 +0100
Subject: [Bioperl-l] CVS Export
Message-ID: <000001c69690$61afb540$b07f6f58@nathan243dd61f>

I may have asked this previously, but I can't find the answer to my question
anywhere so I'll have to ask it again - sorry.

Is it possible to export files/directories from cvs that have changed
between two tags/branches/head? Specifically, I'd like to export (as I don't
want the cvs administrative directories) files that have been added to
Bioperl since the 1.4 release.

Cheers
Nath

----------------------------------------------------------------------------
Dr. Nathan S. Haigh MPharmacol. Ph.D.
Bioinformatics PostDoctoral Research Associate

Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------


From cjfields at uiuc.edu  Fri Jun 23 10:58:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Jun 2006 09:58:24 -0500
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c696d5$7da6c640$15327e82@pyrimidine>

***sounds of crickets***

Ryan,

It's a pretty good possibility that Jason and the rest are on the road to
conferences and such.  There's been mention of a Durham, NC meeting and, of
course, YAPC is happening soon as well.  I wish I could help but I know
diddly about PAML besides the HOWTO on the wiki (though I may be using it
myself soon).  Sorry, you may have to be a bit patient for a more productive
response.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Thursday, June 22, 2006 4:05 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
> package)
> 
> Hi all,
> 
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
> 
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
> 
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
> 
> Hopefully someone can help me before I start digging into the
> code...Thanks.
> 
> Ryan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From MEC at stowers-institute.org  Fri Jun 23 14:27:19 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Jun 2006 13:27:19 -0500
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Message-ID: <CED81D34E37D5043A1211565277A51E50563FC85@exchkc02.stowers-institute.org>

Guy,

I've just downloaded and installed your latest 1.1.0 version of
exonerate but unfortunately did not find any mention in the ChangeLog of
addressing this bug, though I still see in the TODO:

    o Should GFF show all coordinates on the +ve strand? (jason_p2g eg)

I was half expecting to see this fixed in this version based on this old
thread.  

Can you please confirm that it has not yet been addressed, and accept my
request that you continue to keep this change on your list for future
versions...
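
For anyone post-processing the GFF in the meantime, the coordinate flip
itself is small. This is only a sketch: it assumes 1-based, inclusive
coordinates as in GFF and a known sequence length, and rc_to_forward is a
hypothetical name, not part of exonerate or Bioperl.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive range given on the reverse complement of a
# sequence of length $len back onto the forward strand.
sub rc_to_forward {
    my ($start, $end, $len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # normalise order
    die "range $start..$end outside 1..$len\n"
        if $start < 1 || $end > $len;
    return ($len - $end + 1, $len - $start + 1);
}

# The first 10 bases of the reverse complement of a 100 bp sequence
# are the last 10 bases of the forward strand.
my ($fwd_start, $fwd_end) = rc_to_forward(1, 10, 100);
print "$fwd_start\t$fwd_end\t-\n";    # prints "91	100	-"
```

The strand column stays '-'; only start and end change, so start <= end
holds on the forward strand as the GFF spec expects.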

Also, might you elaborate on this entry from the ChangeLog.  I don't see
it mentioned in the manpage.

    o Added %tcs etc to --ryo for dumping coding sequences 

Thanks,

Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org 
>[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Guy Slater
>Sent: Friday, September 02, 2005 11:52 AM
>To: Cook, Malcolm
>Cc: bioperl-l
>Subject: RE: [Bioperl-l] methods, etc. for Bio::SearchIO on 
>exonerate output
>
>On Fri, 2 Sep 2005, Cook, Malcolm wrote:
>
>> Hmmmm - I'd better get some clarification from Guy too.  
>>  
>> Guy, if you don't mind reading the thread below and chiming in on our
>> discussion of interpreting the output of your excellent exonerate
>> program:
>>  
>> The sections of the manpage (
>> <http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html>
>> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
>> relevant are these 2 excerpts:
>>  
>>  1) When an alignment is reported on the reverse complement of a
>> sequence, the coordinates are simply given on the reverse complement
>> copy of the sequence. Hence positions on the sequences are never
>> negative. Generally, the forward strand is indicated by '+', the reverse
>> strand by '-', and an unknown or not-applicable strand (as in the case
>> of a protein sequence) is indicated by '.' "
>>  
>> 2)  --forwardcoordinates <boolean> By default, all coordinates are
>> reported on the forward strand. Setting this option to false reverts to
>> the old behaviour (pre-0.8.3) whereby alignments on the reverse
>> complement of a sequence are reported using coordinates on the reverse
>> complement. 
>>  
>> We see GFF DUMP coordinates still reported on the reverse strand
>> regardless of the setting of --forwardcoordinates.  So these two
>> excerpts from your manpage seem contradictory to me.     Unless I
>> understand `--forwardcoordinates FALSE` to only affect the coordinates
>> reported in the alignment section, not in the GFF DUMP section, which is
>> what it appears to do in practice.
>>  
>> Guy, can you confirm that the --forwardcoordinates option has no effect
>> on GFF output?
>>  
>
>Hi,
>
>Yes, it has no effect, and this is a bug
>(sorry - it was due to my misinterpretation of the GFF2 spec)
>- it's on the list of things to be fixed for exonerate 1.1 (soon)
>
>> Further, can you tell us if you plan to comport more closely to the GFF
>> spec, in particular in this case by reporting forwardcoordinates in the
>> GFF DUMP section too?   I see in your TODO list "    o Should GFF show
>> all coordinates on the +ve strand? (jason_p2g eg)".  Hear hear!  I
>> second the motion.
>>  
>> And TODO item " GFF3 support ? http://song.sf.net/" gets my vote too....
>> though this is more of a sticky wicket....
>>  
>
>Yup, GFF3 support is on the list,
>but probably it will not be done in time for exonerate 1.1
>Of course, I'd welcome a patch ...    ;)
>
>(I'm mainly working on getting the cdna2genome
> and genome2genome models working properly for 1.1)
>
>Cheers,
>
>Guy.
>
>> Cheers and Thanks!
>>  
>> Malcolm Cook
>>  
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Friday, September 02, 2005 9:46 AM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> I've already talked to Guy about some of this and I assume fixes will be
>> part of the next release, but it can't hurt to have more people
>> requesting.  The main problem right now is reverse strand hits in GFF
>> output are still screwed up even if you provide the --forwardcoordinates
>> option. 
>> 
>> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
>> VULGAR to a list of Bio::Search::HSP::GenericHSP).  We can also
>> reconstruct everything needed from that, I gave a stab at it once, but
>> there was something missing (or maybe it was pre --forwardcoordinates
>> option).   
>> 
>> 
>> -jason 
>> 
>> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
>> 
>> 
>> Jason,
>>  
>> Thanks for the scripts and clues (esp re: using the --ryo option to
>> inject the needed length into the exonerate output to compensate).
>>  
>> I'm considering asking exonerate author to comport with GFF spec.  Do
>> you think this is a road to take?
>>  
>> Cheers,
>>  
>> Malcolm
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Wednesday, August 31, 2005 12:35 PM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> 
>> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl
>> 
>> You may still want to massage it some, but I use the script in this
>> basic form, maybe with a few tweaks:
>> 
>> Note that it requires you to run exonerate with specific --ryo options
>> so that it includes the length of the query and hit sequences in the
>> report output. This should be covered in the perldoc in the script.
>> 
>> Without the ryo options enabled,  you'll need to modify the script more
>> to have access to the original sequence db, use Bio::DB::Fasta,  and put
>> in some $dbh->length($seqid) calls instead.
>> 
>> I don't think the part which writes HSP/match lines is actually correct
>> - it is trying to roll gapped HSPs from the similarity features. 
>> 
>> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
>> instance and/or grepping out the lines I really think I need.  
>> You may want to s/exon/CDS/ for the protein2genome output as well.
>> 
>> -jason
>> 
>> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>> 
>> 
>> Jason, 
>> 
>> This message is in regards to an old thread  in which you offered to
>> share a 'script for munging over' exonerate output for loading into
>> DB::GFF (c.f.
>> <http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html>
>> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>> 
>> Would you be willing to still share that script, if you've got it
>> around? 
>> 
>> Thanks, and regards, 
>> 
>> Malcolm Cook -  <mailto:mec at stowers-institute.org>
>> mec at stowers-institute.org - 816-926-4449
>> Database Applications Manager - Bioinformatics
>> Stowers Institute for Medical Research - Kansas City, MO  USA
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>
>-- 
>%!PS % <------ Guy St.C. Slater ------> 
>http://www.ebi.ac.uk/~guy/  <------
>210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 
>12/d{exch moveto}
>a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 
>f}a/h{setlinewidth newpath dup
>g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 
>0 108 arc d e
>18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>


From oldham at ucla.edu  Fri Jun 23 12:18:39 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 23 Jun 2006 09:18:39 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>

Hello again,

I finally got it to work, using the following script.  However, it takes
about 5 hours to run on a fast computer.  Using grep (in bash), on the
other hand, takes about 5 minutes (see below if you are interested).
Thanks to everyone for your help!

SLOW perl script:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_all_X';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
print @ID;
chomp @ID;

while (my $line = <PROBES>) {
	foreach my $identifier (@ID) {
		if($line=~/^>probe:\w+:$identifier:/) {
				print OUT $line;
				print OUT scalar(<PROBES>);
		}
	}
}
exit;


FAST bash script:

#!/usr/bin/bash
exec<"ID_all_X"
while read line; do
	echo $line
	grep -A 1 :$line: HG_U95Av2_probe_fasta >>myresults.txt
done
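
One possible reason the earlier hash-based attempts quoted below matched
only the last ID is line endings. This is a guess that the thread never
confirms: if the ID file was saved with Windows line endings, chomp leaves
a trailing "\r" on every key except an unterminated last line, so
exists($ID{$1}) fails for all the others. A sketch of a more forgiving
loader (id_lookup is a hypothetical name):

```perl
use strict;
use warnings;

# Build a lookup hash from raw ID lines, removing any trailing CR/LF and
# surrounding whitespace that chomp alone would leave behind.
sub id_lookup {
    my @lines = @_;                    # copy, so the caller's data is untouched
    my %id;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;       # strips "\r\n", "\n", spaces, tabs
        next unless length $line;      # skip blank lines
        $id{$line} = 1;
    }
    return %id;
}

# Simulated ID file with Windows line endings; last line unterminated.
my @raw = ("542_at\r\n", "31799_at");
my %id  = id_lookup(@raw);
print join(',', sort keys %id), "\n";
```

With keys cleaned this way, a single pass over the probe file with a hash
lookup per header should be comparable to the grep loop in speed.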


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Wednesday, June 14, 2006 6:48 AM
To: Michael Oldham; Chris Fields
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs
and all values=1.
my $idmatch;
	while (<PROBES>) {
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}
exit;


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu]
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086;
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs
>(only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret
>Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
>my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs
>and all values=1.
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opening it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the
>code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed
>>then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must
>>match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this won't it only print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting
>>>> closer,
>>>> but no cigar quite yet.  The script below runs quickly with no
>>>> errors--but
>>>> the output file is empty.  It seems that the problem must lie
>>>> somewhere in
>>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>>> experienced
>>>> eye--but not to mine!  Any suggestions?  Thanks again for
>your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>>> keys=PSIDs and
>>>> all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>>> fine and
>>>> probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat
>>>looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a
>>>>> conceptually simple
>>>>> task.  I have a single large fasta file containing about 200,000
>>>>> probe
>>>>> sequences (from an Affymetrix microarray), each of which looks
>>>>> like this:
>>>>>
>>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>>> Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of
>>>>> ~130,800
>>>>> probes (both the header and the sequence) and output this
>>>>> subset into a new
>>>>> fasta file.  These 130,800 probes correspond to 8,175
>probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>>> have these
>>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>>> to create an
>>>>> index of all 200,000 probes in the original fasta file using
>>>>> the following
>>>>> script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even
>>>>> if it is, I'm
>>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> No virus found in this outgoing message.
>>>>> Checked by AVG Free Edition.
>>>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date:
>>>>> 6/8/2006
>>>>>
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>


From pmiguel at purdue.edu  Sat Jun 24 10:17:46 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 10:17:46 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <449D498A.9020107@purdue.edu>

Brian Osborne wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>   
I would be very disappointed to lose one part of bptutorial.pl--this was 
described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
only purpose I've ever used bptutorial.pl for--to find all the methods 
available to any given object. Eg:

bptutorial.pl 100 Bio::PrimarySeq

 ***Methods for Object Bio::PrimarySeq ********


 Methods taken from package Bio::IdentifiableI
 lsid_string   namespace_string

 Methods taken from package Bio::PrimarySeq
 accession   accession_number   alphabet   authority   can_call_new   desc
 description   direct_seq_set   display_id   display_name   id   is_circular
 length   namespace   new   object_id   primary_id   seq
 subseq   validate_seq   version

 Methods taken from package Bio::PrimarySeqI
 moltype   revcom   translate   trunc

 Methods taken from package Bio::Root::Root
 DESTROY   confess   debug   throw   verbose

 Methods taken from package Bio::Root::RootI
 carp   deprecated   stack_trace   stack_trace_dump   
throw_not_implemented   warn
 warn_not_implemented
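
A listing like the one above can be approximated without bptutorial.pl by
walking a class's @ISA chain and reading each package's symbol table. This
is a rough sketch, not the code bptutorial.pl uses; methods_by_package and
the Toy:: classes are made up for illustration.

```perl
use strict;
use warnings;

# Return a hash ref: package name => sorted list of subs defined in it,
# for $class and every ancestor reachable through @ISA.
# Caveats: subs imported into a package are listed too, and methods
# provided via AUTOLOAD are not seen.
sub methods_by_package {
    my ($class) = @_;
    my (%seen, %methods);
    my @queue = ($class);
    while (defined(my $pkg = shift @queue)) {
        next if $seen{$pkg}++;
        no strict 'refs';              # symbolic access to the stash
        my $stash = \%{"${pkg}::"};
        $methods{$pkg} =
            [ sort grep { defined &{"${pkg}::$_"} } keys %$stash ];
        push @queue, @{"${pkg}::ISA"};
    }
    return \%methods;
}

# Two toy classes standing in for a Bioperl inheritance chain.
{ package Toy::Root; sub new { return bless {}, shift } sub throw { die $_[1] } }
{ package Toy::Seq;  our @ISA = ('Toy::Root'); sub seq { 'ACGT' } sub revcom { 'ACGT' } }

my $methods = methods_by_package('Toy::Seq');
for my $pkg (sort keys %$methods) {
    print " Methods taken from package $pkg\n @{ $methods->{$pkg} }\n\n";
}
```

For a real class, load it first (for example, require Bio::PrimarySeq;) and
pass its name instead of Toy::Seq.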


Phillip SanMiguel


From sdavis2 at mail.nih.gov  Sat Jun 24 10:45:52 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 24 Jun 2006 10:45:52 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a singlelargefile
References: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>
Message-ID: <001a01c6979c$ff576dd0$6501a8c0@WATSON>


----- Original Message ----- 
From: "Michael Oldham" <oldham at ucla.edu>
To: "Cook, Malcolm" <MEC at stowers-institute.org>; "Chris Fields" 
<cjfields at uiuc.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Friday, June 23, 2006 12:18 PM
Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
singlelargefile


> Hello again,
>
> I finally got it to work, using the following script.  However, it takes
> about 5 hours to run on a fast computer.  Using grep (in bash), on the
> other hand, takes about 5 minutes (see below if you are interested).
> Thanks to everyone for your help!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
> print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
> print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
> foreach my $identifier (@ID) {
> if($line=~/^>probe:\w+:$identifier:/) {
> print OUT $line;
> print OUT scalar(<PROBES>);
> }
> }
> }

This could probably be done MUCH faster using a hash on the sequence 
identifier.  (I have to admit that I didn't follow the first part of this 
conversation, so I could be misunderstanding some part of what you are 
trying to do.)  If you have a couple hundred-thousand sequences, my guess is 
that it could be done in under 30 seconds, but I could be wrong about the 
exact time.  The important part is to make a hash of your sequences with the 
key being the $identifier.  Then, loop through your @ID array doing 
something like (untested):

#open files as before and read in @ID as before

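my %seq_hash;

while (my $line = <PROBES>) {
    # capture the probe set ID from the header line
    if ($line =~ /^>probe:\w+:(\w+):/) {
        # append header + sequence; a probe set has several probes
        $seq_hash{$1} .= $line . scalar(<PROBES>);
    }
}

foreach my $id (@ID) {
    # skip IDs that never appeared in the probe file
    print OUT $seq_hash{$id} if exists $seq_hash{$id};
}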
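
A self-contained variant of the same idea, for comparison: it keeps only
the ID list in memory and streams the probe file once, so memory use does
not grow with the FASTA file. This is a sketch under the assumption that
headers look like the examples in this thread; filter_probes is a
hypothetical name, and in-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Copy records whose probe set ID is a key of %$wanted from $in to $out.
# $keep stays set until the next header, so multi-line sequences work.
# Returns the number of records copied.
sub filter_probes {
    my ($in, $out, $wanted) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted->{$1};
            $count++ if $keep;
        }
        elsif ($line =~ /^>/) {
            $keep = 0;                 # header in an unexpected format
        }
        print {$out} $line if $keep;
    }
    return $count;
}

# Small in-memory demonstration using headers from the thread.
my $fasta = <<'END';
>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
GCGCAGCAGCGAGAATTTCGACGAG
END
open my $in,  '<', \$fasta     or die "Cannot read in-memory FASTA: $!";
open my $out, '>', \my $subset or die "Cannot open in-memory output: $!";
my $n = filter_probes($in, $out, { '542_at' => 1 });
close $out;
print "kept $n record(s)\n$subset";
```

Each header costs one hash lookup instead of a loop over all 8,175 IDs,
which is where the nested-loop script spends its time.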


From arareko at campus.iztacala.unam.mx  Sat Jun 24 11:27:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 10:27:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D498A.9020107@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>
	<449D498A.9020107@purdue.edu>
Message-ID: <449D59C7.4030008@campus.iztacala.unam.mx>

Hi Phillip,

Have you tried the Deobfuscator interface? It's a newer and better way 
to browse all the methods available in BioPerl:

http://bioperl.org/wiki/Deobfuscator
http://bioperl.org/cgi-bin/deob_interface.cgi

Regards,
Mauricio.

Phillip SanMiguel wrote:
> Brian Osborne wrote:
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
>>   
> I would be very disappointed to lose one part of bptutorial.pl--this was 
> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
> only purpose I've ever used bptutorial.pl for--to find all the methods 
> available to any given object. Eg:
> 
> bptutorial.pl 100 Bio::PrimarySeq
> 
>  ***Methods for Object Bio::PrimarySeq ********
> 
> 
>  Methods taken from package Bio::IdentifiableI
>  lsid_string   namespace_string
> 
>  Methods taken from package Bio::PrimarySeq
>  accession   accession_number   alphabet   authority   can_call_new   desc
>  description   direct_seq_set   display_id   display_name   id   is_circular
>  length   namespace   new   object_id   primary_id   seq
>  subseq   validate_seq   version
> 
>  Methods taken from package Bio::PrimarySeqI
>  moltype   revcom   translate   trunc
> 
>  Methods taken from package Bio::Root::Root
>  DESTROY   confess   debug   throw   verbose
> 
>  Methods taken from package Bio::Root::RootI
>  carp   deprecated   stack_trace   stack_trace_dump   
> throw_not_implemented   warn
>  warn_not_implemented
> 
> 
> Phillip SanMiguel
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From golharam at umdnj.edu  Sat Jun 24 10:43:29 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 10:43:29 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>
Message-ID: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>

I've managed to code three methods for calculating K in a Perl script,
using the algorithms described in "Molecular Evolution" by Wen-Hsiung
Li.   I'd be happy to contribute it as a script...


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 9:40 AM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


baseml is not well supported, to my knowledge. I think I started with an
attempt to capture a small amount of the data in the file.  There are
some people who have made in-house modifications to parse it,
but I know of no submitted patches.   Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.   It depends on how much
of that information you really want to capture, or whether you want just
the substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the 
> code...Thanks.
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 12:59:21 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 12:59:21 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D59C7.4030008@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>
	<449D59C7.4030008@campus.iztacala.unam.mx>
Message-ID: <449D6F69.1090104@purdue.edu>

Yes I have. It is very useful.
But what about situations where I don't have web access, or where I am
working with BioPerl 1.5?

Mauricio Herrera Cuadra wrote:
> Hi Phillip,
>
> Have you tried the Deobfuscator interface? It's a newer and better way 
> to browse all the methods available in BioPerl:
>
> http://bioperl.org/wiki/Deobfuscator
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Regards,
> Mauricio.
>


From arareko at campus.iztacala.unam.mx  Sat Jun 24 13:35:54 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 12:35:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D6F69.1090104@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
Message-ID: <449D77FA.70103@campus.iztacala.unam.mx>

I'm currently modifying the Deobfuscator so that it can browse
the different BioPerl packages as well as their respective releases, but I
haven't had much spare time to finish it :(

Dave and I committed the Deobfuscator to the bioperl-live source tree
(in the /doc directory), so it will be included in future releases of BioPerl.
I'm also working on a command-line version that won't need a CGI
environment to provide the same functionality; this would address the web
access situation that you mention.

Phillip SanMiguel wrote:
> Yes I have. It is very useful.
> But in situations where I don't have web access? Or I am working with 
> Bioperl 1.5?
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Sat Jun 24 09:39:56 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 09:39:56 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
References: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>

baseml is not well supported, to my knowledge. I think I started with an
attempt to capture a small amount of the data in the file.  There are
some people who have made in-house modifications to parse it,
but I know of no submitted patches.   Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.   It depends on how much
of that information you really want to capture, or whether you want just
the substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the
> code...Thanks.
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 13:48:15 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 13:48:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D77FA.70103@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
Message-ID: <449D7ADF.3030604@purdue.edu>


Yes, that would be better than bptutorial.pl 100 then. For some modules,
bptutorial.pl 100 doesn't seem to list any of the methods they have
access to, whereas the Deobfuscator does.

Mauricio Herrera Cuadra wrote:
> Currently I'm modifying the Deobfuscator so it'd be capable of 
> browsing the different BioPerl packages as well as their respective 
> releases, but haven't got many spare time to finish it :(
>
> Dave and I committed the Deobfuscator into the bioperl-live source 
> tree (in /doc directory), so it'd be included in future releases of 
> BioPerl. I'm also working on a command line version which won't need a 
> CGI environment to have the same functionality, this would address the 
> web access situation that you mention.
>


From jason at bioperl.org  Sat Jun 24 14:42:57 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 14:42:57 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
References: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>

You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a perl script
> using the algorithms as described in "Molecular Evolution" by
> Wen-Hsiung Li.   I'd be happy to contribute it as a script...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 15:07:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 14:07:06 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D7ADF.3030604@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
	<449D7ADF.3030604@purdue.edu>
Message-ID: <EF5998FD-BA4F-439C-873E-71E55DBA0F4D@uiuc.edu>

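As a quick method I use the script from the FAQ; you have to
install Class::Inspector first:

#!/usr/bin/perl
use strict;
use warnings;
use Class::Inspector;

# Name of the class to inspect, e.g. Bio::SeqIO
my $class = shift or die "Usage: methods perl_class_name\n";
# Load the class so that its methods can be found
eval "require $class; 1" or die "Could not load $class: $@";
# 'full' gives fully qualified names; 'public' omits names starting with _
print join("\n", sort @{ Class::Inspector->methods($class, 'full', 'public') }), "\n";

It works well, though it doesn't have the links and so on that the
Deobfuscator has; for browsing I use the HTML docs generated by
ActiveState. Example output: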

glaciers-115 chris$ methods.pl Bio::SeqIO
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::debug
Bio::Root::Root::except
Bio::Root::Root::finally
Bio::Root::Root::otherwise
Bio::Root::Root::throw
Bio::Root::Root::try
Bio::Root::Root::verbose
Bio::Root::Root::with
Bio::Root::RootI::carp
Bio::Root::RootI::confess
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::SeqIO::DESTROY
Bio::SeqIO::PRINT
Bio::SeqIO::READLINE
Bio::SeqIO::TIEHANDLE
Bio::SeqIO::alphabet
Bio::SeqIO::fh
Bio::SeqIO::location_factory
Bio::SeqIO::new
Bio::SeqIO::newFh
Bio::SeqIO::next_seq
Bio::SeqIO::object_factory
Bio::SeqIO::sequence_builder
Bio::SeqIO::sequence_factory
Bio::SeqIO::write_seq
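
If installing Class::Inspector is not an option, a similar listing can be
sketched with core Perl only, by walking the symbol table of the class and
of everything in its @ISA chain. This is only a rough sketch, and not
necessarily what bptutorial.pl 100 or the Deobfuscator do internally. The
Demo:: packages and the sub name methods_by_package are made up for
illustration, and imported subs will show up as if they were methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for real classes such as Bio::PrimarySeq.
{
    package Demo::Base;
    sub new   { return bless {}, shift }
    sub throw { die "$_[1]\n" }
}
{
    package Demo::Seq;
    our @ISA = ('Demo::Base');
    sub seq    { return $_[0]{seq} }
    sub revcom {
        my $s = reverse($_[0]{seq} || '');
        $s =~ tr/ACGTacgt/TGCAtgca/;
        return $s;
    }
}

# Return a hash ref mapping each package in the inheritance tree of $class
# to a sorted list of the subs defined in that package's symbol table.
sub methods_by_package {
    my ($class, $seen) = @_;
    $seen ||= {};
    return $seen if exists $seen->{$class};
    no strict 'refs';    # symbol-table lookups need symbolic references
    $seen->{$class} =
        [ sort grep { defined &{"${class}::$_"} } keys %{"${class}::"} ];
    methods_by_package($_, $seen) for @{"${class}::ISA"};
    return $seen;
}

# Usage: script.pl Some::Class   (defaults to the demo class)
my $class = @ARGV ? shift @ARGV : 'Demo::Seq';
unless ($class =~ /^Demo::/) {
    eval "require $class; 1" or die "Could not load $class: $@";
}
my $methods = methods_by_package($class);
for my $package (sort keys %$methods) {
    print "Methods taken from package $package\n";
    print "  ", join('   ', @{ $methods->{$package} }), "\n\n";
}
```

Run with a real class name (for example Bio::PrimarySeq, if BioPerl is
installed) to get a listing grouped by package, similar in spirit to the
bptutorial.pl 100 output quoted earlier in this thread.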


Chris

On Jun 24, 2006, at 12:48 PM, Phillip SanMiguel wrote:

>
> Yes, that would be better than bptutorial.pl 100 then. For some modules
> bptutorial.pl 100 doesn't seem to give any of the methods they have
> access to. Whereas the deobfuscator does.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pmiguel at purdue.edu  Sat Jun 24 15:37:08 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 15:37:08 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <449D9464.6030508@purdue.edu>

Here is an example bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1682

It was a bug fixed in a module in BioPerl 1.4  back in October of 2004. 
The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the 
module. However the version of the module currently available from CPAN 
is 1.6. (That is the current "stable" release, BioPerl 1.4.0)

I've written a script that relies on that bug being fixed. How should I
deal with this when I want to give the script to others to use? Should I just
tell them "You must have BioPerl 1.5 installed", or give them instructions
for patching the module code?

How long before the next "stable" release? Maybe a year? Should not a 
BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or 
would that be very difficult?

By the way, I think the revision graph viewer is great for someone, at 
best, peripherally involved in BioPerl to figure out which module 
version is associated with which BioPerl version, for example:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1

Phillip SanMiguel


From golharam at umdnj.edu  Sat Jun 24 14:57:52 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 14:57:52 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>
Message-ID: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>

Hi Jason,

It looks like DNAStatistics is only for coding sequences.  I'm trying to
calculate the Ks of exons and the K (or Ki) of introns.  All the methods
in bioperl are based on coding sequences.  Only the  PAUP package (that
I've found) does non-coding sequences.   I would have used it but you
need to pay for it and we don't have the funding to purchase much at the
moment.

I briefly looked at PHYLIP and EMBOSS, but they didn't look as
straightforward as I was hoping they would be.  Either that, or I was
getting frustrated looking for a simple solution.  

In the end, I found a molecular evolution book that talks about several
methods used for non-coding sequences so I went ahead and implemented
them.  They seem to work well.  
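
For readers wondering what such a calculation looks like, here is a minimal
sketch of two standard textbook distance corrections for aligned
non-coding sequences: Jukes-Cantor and Kimura's two-parameter model. The
message does not say which three methods were implemented, so the choice of
these two is an assumption, and all sub names below are made up for
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %IS_PURINE = (A => 1, G => 1);

# Count comparable sites, transitions and transversions between two
# aligned sequences. Columns with gaps or ambiguity codes are skipped.
sub count_differences {
    my ($seq1, $seq2) = map { uc $_ } @_;
    die "Sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);
    my ($sites, $transitions, $transversions) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = substr($seq1, $i, 1);
        my $y = substr($seq2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # purine <-> pyrimidine is a transversion, anything else a transition
        if ($IS_PURINE{$x} xor $IS_PURINE{$y}) { $transversions++ }
        else                                   { $transitions++ }
    }
    return ($sites, $transitions, $transversions);
}

# Jukes-Cantor: K = -(3/4) ln(1 - (4/3) p),
# where p is the proportion of differing sites.
sub jukes_cantor {
    my ($p) = @_;
    my $arg = 1 - 4 * $p / 3;
    return undef if $arg <= 0;    # too divergent for the correction
    return -0.75 * log($arg);
}

# Kimura two-parameter: K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q),
# where P and Q are the proportions of transitions and transversions.
sub kimura_2p {
    my ($P, $Q) = @_;
    my ($w1, $w2) = (1 - 2 * $P - $Q, 1 - 2 * $Q);
    return undef if $w1 <= 0 || $w2 <= 0;
    return -0.5 * log($w1) - 0.25 * log($w2);
}

# Toy alignment, not real data.
my ($sites, $ts, $tv) = count_differences('ACGTACGTAC', 'ACGTACGTAT');
printf "Jukes-Cantor K = %.4f\n", jukes_cantor(($ts + $tv) / $sites);
printf "Kimura 2P    K = %.4f\n", kimura_2p($ts / $sites, $tv / $sites);
```

Both subs return undef when the sequences are too divergent for the
logarithm to be defined, so callers should check for that before printing.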

Ryan


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 2:43 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?

--jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Sat Jun 24 18:37:15 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 18:37:15 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
Message-ID: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>

I'm using Bio::Tools::Run::Alignment::Clustalw to generate some
alignments and then parsing the results.

The ClustalW output is being sent to STDOUT.  Is there a way I can
redirect the output to STDERR instead?

Here's how I'm using it:

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

(Forgive me if it is in the docs - I've been coding for a week straight now,
including Saturday)

Thanks, Ryan
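
One generic way to approach this, independent of BioPerl, is to point the
STDOUT file descriptor at STDERR for the duration of the call, since child
processes inherit it. The following is a sketch under that assumption only;
it is not a documented feature of the Clustalw wrapper, the sub name
with_stdout_to_stderr is made up, and the module's own documentation may
offer a simpler switch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # for STDOUT->flush on older perls

# Run $code with STDOUT (file descriptor 1) temporarily pointing at STDERR.
# Child processes started inside $code inherit the redirected descriptor.
sub with_stdout_to_stderr {
    my ($code) = @_;
    STDOUT->flush;
    open(my $saved, '>&', \*STDOUT) or die "Cannot save STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Cannot redirect STDOUT: $!";
    my @result = eval { $code->() };    # $code is always called in list context
    my $error  = $@;
    STDOUT->flush;
    open(STDOUT, '>&', $saved) or die "Cannot restore STDOUT: $!";
    close $saved;
    die $error if $error;
    return wantarray ? @result : $result[0];
}

# Stand-in for the alignment call: anything printed here, including the
# output of an external program, ends up on STDERR.
my $answer = with_stdout_to_stderr(sub {
    print "noisy progress message\n";
    system($^X, '-e', 'print "output from a child process\n"');
    return 42;
});
print "result: $answer\n";    # written to the real STDOUT again

# With the code from the message above, the call would look like:
#   my $aa_aln = with_stdout_to_stderr(sub { $aln_factory->align(\@aa_seq) });
```

The original STDOUT is restored even if the wrapped code dies, and the
error is re-thrown afterwards.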


From cjfields at uiuc.edu  Sat Jun 24 20:16:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 19:16:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>

On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4 back in October of 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed". Give them instructions
> for patching the module code?

A BioPerl module version is not the same as the distribution  
version.  All the modules have different version numbers  
corresponding to CVS commits for various code changes.  If you want  
to see the version for the distribution, read this:

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

Many 'bug fixes', you'll find, have less to do with problems/bugs in  
BioPerl code than they do with outside code changes beyond our  
control.  By that I mean changes to other programs modify output so  
parsers break (BLAST, PAML, etc), or changes to API for remote  
databases that break queries (recent changes in EBI database  
concerning Swissprot, for example).  So, the code is considered  
'stable' at the time of release, but past that point issues beyond  
our control may break certain modules parsing output, accessing  
remote databases, and so on, at any time. This link:

http://www.bioperl.org/wiki/FAQ#BioPerl_in_General

should answer a few more questions you may have.  The FAQ is very  
helpful...

In general, if there are problems with code you could look at the  
latest developer's release (1.5.1, released in Oct 2005) to see if  
any bugs have been fixed.  They may be fixed post-1.5.1 and will be  
in CVS; you can always suggest using 1.5.1 (it's pretty stable) and  
updating only the fixed modules from CVS if needed.
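
One way to act on this when handing a script to other people is to have the script check the installed BioPerl version at startup and stop with a clear message. The following is only a minimal sketch: it assumes the distribution version is available as $Bio::Root::Version::VERSION and is written in plain dotted form such as 1.4 or 1.5.1, and the helper names compare_dotted and require_bioperl are made up for illustration.

```perl
use strict;
use warnings;

# Compare two dotted version strings (e.g. "1.4" and "1.5.1") one
# component at a time. Returns -1, 0 or 1, like the <=> operator.
# Assumption: every component is a plain integer.
sub compare_dotted {
    my ($have, $want) = @_;
    my @h = split /\./, $have;
    my @w = split /\./, $want;
    while (@h || @w) {
        my $x = @h ? shift @h : 0;   # missing components count as 0
        my $y = @w ? shift @w : 0;
        return $x <=> $y if $x != $y;
    }
    return 0;
}

# Die with a readable message unless BioPerl is installed and at least
# as new as $min. Returns the installed version string on success.
sub require_bioperl {
    my ($min) = @_;
    eval { require Bio::Root::Version; 1 }
        or die "BioPerl does not seem to be installed.\n";
    my $have = $Bio::Root::Version::VERSION;
    die "This script needs BioPerl $min or newer; found $have.\n"
        if compare_dotted($have, $min) < 0;
    return $have;
}

# Typical use near the top of a script:
#   require_bioperl('1.5.1');
```

The comparison is done by hand because a version like 1.4 and a version like 1.5.1 are not in the same notation, so a plain numeric comparison would not be reliable.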

> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
> would that be very difficult?

No, it's not that easy.  BioPerl isn't like most CPAN modules with  
one or two developers.  See the wiki page for details on planning  
releases to see why:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

It takes a lot of effort and coordination, much more so than the  
average CPAN module.  I believe some of the core developers are  
meeting this weekend; maybe something will come of that and we'll get  
an idea of a next release.

Chris

> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Jun 24 21:02:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 24 Jun 2006 21:02:36 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>


On Jun 24, 2006, at 3:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4 back in October of 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed". Give them instructions
> for patching the module code?

Either way. If the patch is trivial you could also provide the patch
as an option. Generally we don't support that though. (Not everything
that we don't support is unsupported because it doesn't work.
Sometimes it's just a statement along the lines of
'it-probably-works-but-don't-bug-us-if-it-doesn't'.)

>
> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
> would that be very difficult?

1.5.1 fixes a number of other problems too, so even if there were a
1.4.1 there wouldn't be much reason to upgrade to it instead of to
1.5.1. We therefore don't think that creating 1.4.1 is the best
investment of time we can make.

Our current goal is to release 1.5.2 and possibly more development
versions, all leading on a steady path to 1.6.0. There are very few
(but significant) stumbling blocks on this path. I believe they will
require some dedicated time from a couple of people, and after that
there shouldn't be any real obstacles. It's quite possible that at
BOSC, or as early as next week at the GMOD meeting, we could see a
leap forward; typically it's those meetings that pull the respective
people away from their daily obligations (short of an actual
hackathon).

Some time back in spring, 1.6 was put in proximity to BOSC. That's
probably not going to happen, but the release is quite possibly not
that much afterwards.

	-hilmar

>
> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Jun 24 21:21:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:21:56 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c697f5$c08d1150$15327e82@pyrimidine>

According to the docs ( ;> ) the default behaviour is to return "a BioPerl
Bio::SimpleAlign object which can then be printed and/or saved in multiple
formats using the AlignIO.pm module"; you should be able to do something
like:

use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
...
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

# new() is a class method, so it is called with ->, not ::
my $al_out = Bio::AlignIO->new(-format => 'clustalw',
                               -fh     => \*STDERR);

$al_out->write_aln($aa_aln);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Saturday, June 24, 2006 5:37 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
> 
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);
> 
> (Forgive me if it in the docs - I've been coding for a week straight now
> including saturday)
> 
> Thanks, Ryan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Sat Jun 24 21:38:06 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 21:38:06 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
References: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>

They make no assumption about coding sequence; where do you get that
impression?  The ka,ks are for coding but the tamura/nei, kimura, and
jukes-cantor methods are all for any type of sequence.

PHYLIP and EMBOSS are pretty straightforward IMHO - you give them
an alignment and you get out a matrix of pairwise numbers....

But whatever makes sense to you - we are using the same methods as
are in Li's book (that is where I took the equations from).
-j
On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:

> Hi Jason,
>
> It looks like DNAStatistics is only for coding sequences.  I'm trying to
> calculate the Ks of exons and the K (or Ki) of introns.  All the methods
> in bioperl are based on coding sequences.  Only the PAUP package (that
> I've found) does non-coding sequences.   I would have used it but you
> need to pay for it and we don't have the funding to purchase much at the
> moment.
>
> I briefly looked at PHYLIP and EMBOSS but it didn't look as
> straight-forward as I was hoping it would be.  Either that, or I was
> getting frustrated looking for a simple solution.
>
> In the end, I found a molecular evolution book that talks about  
> several
> methods used for non-coding sequences so I went ahead and implemented
> them.  They seem to work well.
>
> Ryan
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 2:43 PM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> You should look at the Align::DNAStatistics module if you just want
> pairwise DNA distance.  I put in several different distance methods.
> Or you can use the distance methods implemented in PHYLIP or EMBOSS
> programs -- I thought you wanted the somewhat more sophisticated ML
> approaches that are implemented in PAML?
>
> --jason
>
> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>
>> I've managed to code three methods to calculate K into a perl script
>> using the algorithms as described in "Molecular Evolution" by
>> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>> Jason
>> Stajich
>> Sent: Saturday, June 24, 2006 9:40 AM
>> To: golharam at umdnj.edu
>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>> PAML
>> package)
>>
>>
>> baseml is not well-supported to my knowledge - I think I started with
>> an attempt to capture a small amount of the data in the file.  There are
>> some people who have made modifications to possibly parse it in-house
>> but I know of no submitted patches.   Many of the knowledgeable
>> people are probably at the evolution meetings this week.
>>
>> I have no idea about the full set of information in the report files
>> without going back to the Yang papers first.   It depends on whether
>> you really want to capture much of that information or just the
>> substitution rates.
>>
>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>> work+PAML.
>>
>> -jason
>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>> baseml in the PAML package to measure the distances of some
>>> non-coding regions.
>>>
>>> I started with the coding regions, and used the script
>>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>> something similar for non-coding regions.  However, when I call
>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>> meaning matrix was never defined.
>>>
>>> I wanted to find out if anyone on here has done this before or
>>> knows a way to measure substitution frequencies of non-coding
>>> regions with the PAML package.  The documentation with PAML is
>>> sparse so I'm not sure how
>>> to interpret its output directly - that's why I'm using Bioperl.
>>>
>>> Hopefully someone can help me before I start digging into the
>>> code...Thanks.
>>>
>>> Ryan
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 21:40:49 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:40:49 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>
Message-ID: <000401c697f8$62d41e70$15327e82@pyrimidine>

...
> > I've written a script that relies on that bug being fixed. How should I
> > deal with this when I want to give the script to others to use? Just
> > tell them "You must have BioPerl 1.5 installed". Give them instructions
> > for patching the module code?
> 
> Either way. If the patch is trivial you could also provide the patch
> as an option. Generally we don't support that though. (Not everything
> that we don't support we don't support because it doesn't work.
> Sometimes it's just a statement along 'it-probably-works-but-don't-
> bug-us-if-it-doesn't'.)

The bug was fixed post-1.4 release according to the link, so Phillip should
use v1.5.1 or newer.

Hilmar's right.  It's hard to address every single complaint about code not
working or method not implemented w/o having patches or fixes submitted.
It's not my top priority to fix bugs in modules submitted by other authors
when I don't know the code.  I'll try if I have the free time, but that's
getting to be a precious commodity lately...

> > How long before the next "stable" release? Maybe a year? Should not a
> > BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
> > would that be very difficult?
> 
> 1.5.1 fixes a number of other problems too, so there isn't really
> much reason to upgrade to 1.4.1 if there was one instead of to 1.5.1,
> so investing time into creating 1.4.1 we think is not the best
> investment we can make.
> 
> Our current goal is to release 1.5.2 and possibly more development
> versions all leading on a steady path to 1.6.0. There's very few (but
> significant) stumbling blocks on this path that will require I
> believe some dedicated time from a couple  of people and after that
> there shouldn't be any real obstacles. It's quite possible that at
> BOSC or as early as next week at the GMOD meeting we could see a leap
> forward, typically it's those meetings that pull the respective
> people away from their daily obligations (short of an actual
> hackathons).
> 
> Some time back in spring 1.6 was put in proximity to BOSC, but that's
> probably not going to happen, but quite possibly not that much
> afterwards.
> 
> 	-hilmar
...

Nice to know.  I guess a Release Pumpkin will be picked as well.  BOSC is
right around the corner so I guess we can expect something announced soon as
to a possible roadmap (we can't talk about 'timelines' in the States, it's
not patriotic).  

Chris


From golharam at umdnj.edu  Sat Jun 24 23:03:01 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:03:01 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000301c697f5$c08d1150$15327e82@pyrimidine>
Message-ID: <000301c69803$df899f20$2f01a8c0@GOLHARMOBILE1>

Thanks Chris.  It is in fact when you call align() that clustalw
generates the output that you see on the console.  The alignment it
generates I'm parsing right away.  Here's the output (an example) of
what I'm referring to:

-- BEGIN --
 CLUSTAL W (1.83) Multiple Sequence Alignments


Sequence format is Pearson
Sequence 1: human           271 aa
Sequence 2: mouse           264 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  90
Guide tree        file created:   [/tmp/TX4yxP9uKQ/80W87TkT5Z.dnd]
Start of Multiple Alignment
There are 1 groups
Aligning...
Group 1: Sequences:   2      Score:5469
Alignment Score 1480
GCG-Alignment file created      [/tmp/TX4yxP9uKQ/xE4GNyY7Rc]
-- END --

How do I get this to go to stderr instead of stdout?

Ryan



From golharam at umdnj.edu  Sat Jun 24 23:05:41 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:05:41 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>
Message-ID: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>

>>they make no assumption about coding sequence,
>>where do you get that impression

I get that information from the 1.5 api docs:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/

It's documented under the description section.

Oh well, I have it coded and working...might as well use it.

Ryan


From bix at sendu.me.uk  Sun Jun 25 07:33:58 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Jun 2006 12:33:58 +0100
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
References: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <449E74A6.3020709@sendu.me.uk>

Ryan Golhar wrote:
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);

You can suppress the output completely using
$aln_factory->quiet(1);

(supplying quiet => 1 to new() should also work according to the docs, 
but doesn't seem to be implemented, though I could be wrong)

If you really want the messages on STDERR you could try redirecting 
STDOUT to STDERR before calling align():
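open(my $oldout, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
open(STDOUT, '>&', \*STDERR)     or die "Can't redirect STDOUT: $!";
my $aa_aln = $aln_factory->align(\@aa_seq);
open(STDOUT, '>&', $oldout)      or die "Can't restore STDOUT: $!";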

I haven't tested either of these ideas, but I think they should both 
work - try them out and let us know.

Ideally there would be a saner way of doing this, but it isn't readily 
apparent to me.
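
The redirect-and-restore idea can be wrapped in a small helper so that STDOUT is restored even if the wrapped code dies. This is only a sketch of the general Perl technique, not something BioPerl provides: the name with_stdout_to_stderr is made up, and it assumes the ClustalW chatter is written to the process's real STDOUT (directly or by a child process), which has not been verified here.

```perl
use strict;
use warnings;
use IO::Handle;

# Run $code with STDOUT temporarily pointing at STDERR, then put STDOUT
# back. Child processes started inside $code inherit the redirection.
sub with_stdout_to_stderr {
    my ($code) = @_;

    STDOUT->flush;                                   # don't carry old buffered output over
    open(my $saved, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Cannot redirect STDOUT: $!";

    my @result = eval { $code->() };                 # trap errors so we can restore first
    my $err    = $@;

    STDOUT->flush;                                   # push pending output to STDERR
    open(STDOUT, '>&', $saved)      or die "Cannot restore STDOUT: $!";
    close $saved;

    die $err if $err;                                # re-throw after restoring
    return wantarray ? @result : $result[0];
}

# Intended use with the factory from the original question:
#   my $aa_aln = with_stdout_to_stderr(sub { $aln_factory->align(\@aa_seq) });
```

The eval is there so that a failure inside align() does not leave the rest of the script printing to STDERR.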


From jason at bioperl.org  Sun Jun 25 08:37:11 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 25 Jun 2006 08:37:11 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with
	(baseml from PAML package)]
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>


On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:

>>> they make no assumption about coding sequence,
>>> where do you get that impression
>
> I get that information from the 1.5 api docs:
>
> http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
great - I would also always point people to the LIVE code
documentation, not the 1.5.0-RC1 docs, which are over a year old, but
nothing in particular has changed in this module since 1.5.0 that I know of.
Someday someone will put a new ball of docs up on the site, but I
hope that will come with the next development or stable release.

> Its documented under the description section.
>
I don't really see what you refer to since there is a lot of
documentation, but perhaps it should be clarified - I had hoped this
was a sufficient description:
"This object contains routines for calculating various statistics and  
distances for DNA alignments."

> Oh well, I have it coded and working...might as well use it.
>
Sounds like your best bet for your situation.

For the record and in the mailing list archives - as long as you  
don't call a method that contains "KaKs" it will work fine.  You can  
calculate distances using the currently implemented distance methods:

    JukesCantor
    Uncorrected
    F81
    Kimura
    Tamura
    F84 (Felsenstein 84)
    TajimaNei
    JinNei
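
To show why methods like these do not care whether the sequence is coding, here is a standalone sketch of two of the simplest distances, Jukes-Cantor and Kimura 2-parameter, written from the standard textbook formulas. It is not the Bio::Align::DNAStatistics code and has not been checked against it; the function names are made up for illustration, and columns with gaps or ambiguous bases are simply skipped.

```perl
use strict;
use warnings;

# Count comparable sites, transitions and transversions between two
# aligned sequences. Columns containing anything but A, C, G, T are skipped.
sub count_differences {
    my ($s1, $s2) = @_;
    die "sequences must be the same length\n"
        unless length($s1) == length($s2);
    my %purine = (A => 1, G => 1);
    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = uc substr($s1, $i, 1);
        my $y = uc substr($s2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # Purine<->purine or pyrimidine<->pyrimidine is a transition
        if (($purine{$x} ? 1 : 0) == ($purine{$y} ? 1 : 0)) { $ts++ }
        else                                                { $tv++ }
    }
    return ($sites, $ts, $tv);
}

# Jukes-Cantor: d = -3/4 * ln(1 - 4p/3), p = proportion of differing sites
sub jukes_cantor {
    my ($s1, $s2) = @_;
    my ($n, $ts, $tv) = count_differences($s1, $s2);
    die "no comparable sites\n" unless $n;
    my $p = ($ts + $tv) / $n;
    die "sequences too divergent for Jukes-Cantor\n" if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

# Kimura 2-parameter: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
# P = proportion of transitions, Q = proportion of transversions
sub kimura_2p {
    my ($s1, $s2) = @_;
    my ($n, $ts, $tv) = count_differences($s1, $s2);
    die "no comparable sites\n" unless $n;
    my ($P, $Q) = ($ts / $n, $tv / $n);
    my $w1 = 1 - 2 * $P - $Q;
    my $w2 = 1 - 2 * $Q;
    die "sequences too divergent for Kimura 2-parameter\n"
        if $w1 <= 0 || $w2 <= 0;
    return -0.5 * log($w1 * sqrt($w2));
}
```

Nothing in either formula refers to codons or reading frame; both only need the proportion of differing sites in the alignment.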


It will be more productive to just drop the discussion since you
seem to be fine without all of this anyway - if you decide you
would like to use it and contribute new distance methods or doc
fixes I am sure we'll enjoy your contributions.


-jason
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Sun Jun 25 13:05:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 12:05:34 -0500
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
	with(baseml from PAML package)]
In-Reply-To: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>
Message-ID: <000901c69879$97b7d5b0$15327e82@pyrimidine>

> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing
> particular has changed in this module since 1.5.0 that I know of.
> Someday someone will put a new ball of docs up on the site, but I
> hope that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as it's
the most up-to-date, that fact isn't really emphasized much on the docs
page; the link is along with the other toolkits at the bottom of the page
and is listed as Bioperl Core Code (some users don't seem to get that, in
general, bioperl=bioperl core).  Could be this is causing a bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or that I
should be able to!).  Maybe the best way to go is to have a wiki page for
this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or at
least link to the Release Pumpkin page which indicates the code release date
for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.  Work
is underway to correct them, but do not expect this code to give you the
right answer currently!  Use dnadist/distmat in the PHLYIP or EMBOSS
packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some point
relatively soon myself so I may be up to some testing on one/both of these
modules if needed.

Chris

....


From golharam at umdnj.edu  Sun Jun 25 13:20:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sun, 25 Jun 2006 13:20:12 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
 with(baseml from PAML package)]
In-Reply-To: <000901c69879$97b7d5b0$15327e82@pyrimidine>
Message-ID: <000801c6987b$9e65f840$2f01a8c0@GOLHARMOBILE1>

Exactly.  Also on the page it says (in the description for
Bio::Align::DNAStatistics):

In order to use these methods there are
several pre-requisites for the alignment.

   1. DNA alignment must be based on protein alignment. Use the subroutine
      aa_to_dna_aln in Bio::Align::Utilities to achieve this.

 Etc etc etc


The rest of the pre-reqs also mention that the sequences should be
coding sequences.  Because of this, I thought DNAStatistics was only for
coding sequences and could not be used for non-coding sequences...
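
For anyone reading this in the archives, "DNA alignment based on protein alignment" means the gaps of the protein alignment are copied onto the coding DNA, three columns per residue. The sketch below shows the idea for a single sequence. It is not the aa_to_dna_aln implementation from Bio::Align::Utilities; the name thread_dna_onto_protein is made up, and it assumes '-' is the only gap character and that the DNA starts in frame.

```perl
use strict;
use warnings;

# Given one row of a protein alignment (with '-' gaps) and the ungapped
# coding DNA for that protein, return the DNA with the same gaps, using
# one codon per residue and '---' per protein gap.
sub thread_dna_onto_protein {
    my ($aligned_protein, $dna) = @_;
    my $out = '';
    my $pos = 0;
    for my $residue (split //, $aligned_protein) {
        if ($residue eq '-') {
            $out .= '---';
            next;
        }
        die "DNA is too short for the protein\n"
            if $pos + 3 > length $dna;
        $out .= substr($dna, $pos, 3);
        $pos += 3;
    }
    return $out;
}

# Example: protein row "M-K" with DNA "ATGAAA"
print thread_dna_onto_protein('M-K', 'ATGAAA'), "\n";
```

This kind of codon-aware alignment matters for the Ka/Ks style methods, which is presumably why the prerequisites mention it.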

Anyway, I've gotten past my troubles and am on to finish this project.
I think the issues I ran into others might run into as well.  I'd be
happy to contribute what I can but need to finish this stuff first...
Thanks for all your help Jason, Chris, Sendu!

Ryan


-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Sunday, June 25, 2006 1:06 PM
To: 'Jason Stajich'; golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] DNA distance methods [was
Bio::Tools::Phylo::PAML with(baseml from PAML package)]


> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code 
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing 
> particular has changed in this module since 1.5.0 that I know of. 
> Someday someone will put a new ball of docs up on the site, but I hope

> that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as
it's the most up-to-date, that fact isn't really emphasized much on the
docs page; the link is along with the other toolkits at the bottom of
the page and is listed as Bioperl Core Code (some users don't seem to
get that, in general, bioperl=bioperl core).  Could be this is causing a
bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or
that I should be able to!).  Maybe the best way to go is to have a wiki
page for this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or
at least link to the Release Pumpkin page which indicates the code
release date for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.
Work is underway to correct them, but do not expect this code to give
you the right answer currently!  Use dnadist/distmat in the PHLYIP or
EMBOSS packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some
point relatively soon myself so I may be up to some testing on one/both
of these modules if needed.
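
For anyone wanting a quick sanity check on what the module returns, here is a
minimal sketch (not BioPerl code) of the Jukes-Cantor distance for one pair of
aligned sequences, K = -(3/4) ln(1 - 4p/3), where p is the fraction of
differing sites.  The function name jc_distance and the choice to skip any
column containing a gap or ambiguous base are assumptions made for this
sketch; DNAStatistics, dnadist and distmat may treat such columns differently,
so small discrepancies are possible.

```shell
#!/bin/sh
# jc_distance SEQ1 SEQ2
# Jukes-Cantor distance between two aligned sequences of equal length.
# Columns where either sequence is not A, C, G or T are skipped (assumption).
jc_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        a = toupper(a); b = toupper(b)
        if (length(a) != length(b)) {
            print "length mismatch" > "/dev/stderr"; exit 1
        }
        for (i = 1; i <= length(a); i++) {
            x = substr(a, i, 1); y = substr(b, i, 1)
            if (x !~ /[ACGT]/ || y !~ /[ACGT]/) continue
            n++                      # comparable sites
            if (x != y) d++          # differing sites
        }
        if (n == 0) { print "no comparable sites" > "/dev/stderr"; exit 1 }
        p = d / n
        if (p >= 0.75) { print "distance undefined (p >= 0.75)" > "/dev/stderr"; exit 1 }
        # 0.75 * ln(1 / (1 - 4p/3)) is the same as -0.75 * ln(1 - 4p/3)
        printf "%.4f\n", 0.75 * log(1 / (1 - 4 * p / 3))
    }'
}
```

For example, `jc_distance ACGTACGTAC ACGTACGTAT` (one difference in ten sites,
p = 0.1) prints 0.1073.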

Chris

....


From pmiguel at purdue.edu  Sun Jun 25 15:02:14 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 25 Jun 2006 15:02:14 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
Message-ID: <449EDDB6.8020401@purdue.edu>

Chris Fields wrote:
> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>
> [...]
>> How long before the next "stable" release? Maybe a year? Should not a
>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
>> would that be very difficult?
>
> No, it's not that easy.  BioPerl isn't like most CPAN modules with one 
> or two developers.  See the wiki page for details on planning releases 
> to see why:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> It takes a lot of effort and coordination, much more so than the 
> average CPAN module.  I believe some of the core developers are 
> meeting this weekend; maybe something will come of that and we'll get 
> an idea of a next release.
>
> Chris
Hi Chris,
   Thanks for the information--the key part being that a bug fix from a 
couple of years ago has not propagated into the current stable release. 
Below I'll try to convince you that this is a serious problem. (Not 
because it is your fault, of course. I'm just trying to deliver my take 
on the situation to the bioperl-programmer-warriors who happen to be 
listening...)
   It isn't a problem for me to edit the offending statement in the 
QualI.pm module on systems I generally use. Or even install a 
developer's release of bioperl. My problem is one of advocacy. Maybe I 
have a warped view of the world, but it seems that except for those 
directly involved in the bioperl or GMOD projects, everyone looks to 
CPAN when they install bioperl.
    I write scripts that I sometimes want to send to biologists even 
less programming-capable than I am. I can just barely envision those 
biologists pestering their sysadmin to do a CPAN install of bioperl 
modules so that my script will work. But installing a non-CPAN set of 
modules probably isn't going to happen.
    So, this being the case, how can I, with a clear conscience, advocate 
bioperl to the junior bioinformaticians with whom I happen to interact?
    My take, for what it is worth, is that 1.5 has become an unratified 
stable release. How hard would it be to take 1.5.1--as is--and deposit 
that in CPAN? What would be the downside?

Phillip SanMiguel
   

From hlapp at gmx.net  Sun Jun 25 15:42:20 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 25 Jun 2006 15:42:20 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>

We did not and will not deposit 1.5.1 into CPAN due to the API issues  
in some (rather central) interfaces. These issues are changes over  
the 1.4 API and some of those changes are going to go away. Once we  
deposit it into CPAN we would sanction the changed API as the new  
'official' API and would open a huge can of backward liability worms.  
If you just continue to use the 1.4 API on the 1.5.1 release you  
don't need to be concerned about an API method you're using going away.

As I said, the people from the core group of developers who have  
traditionally shepherded releases all think that doing a 1.4.1  
release wouldn't be the best investment of their time. You are most  
welcome to disagree and volunteer your time to coordinate the 1.4.1  
release, and a lot of people will appreciate your efforts - including  
the bioperl developers and 'core'. It shouldn't be much work  
theoretically.

	-hilmar

On Jun 25, 2006, at 3:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one
>> or two developers.  See the wiki page for details on planning  
>> releases
>> to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the
>> average CPAN module.  I believe some of the core developers are
>> meeting this weekend; maybe something will come of that and we'll get
>> an idea of a next release.
>>
>> Chris
> Hi Chris,
>    Thanks for the information--the key part being that a bug fix  
> from a
> couple of years ago has not propagated into the current stable  
> release.
> Below I'll try to convince you that this is a serious problem. (Not
> because it is your fault, of course. I'm just trying to deliver my  
> take
> on the situation to the bioperl-programmer-warriors who happen to be
> listening...)
>    It isn't a problem for me to edit the offending statement in the
> QualI.pm module on systems I generally use. Or even install a
> developer's release of bioperl. My problem is one of advocacy. Maybe I
> have a warped view of the world, but it seems that except for those
> directly involved in the bioperl or GMOD projects, everyone looks to
> CPAN when they install bioperl.
>     I write scripts that I sometimes want to send to biologists even
> less programming-capable than I am. I can just barely envision those
> biologists pestering their sysadmin to do a CPAN install of bioperl
> modules so that my script will work. But installing a non-CPAN set of
> modules probably isn't going to happen.
>     So, this being the case, how can I, with a clear conscience,  
> advocate
> bioperl to the junior bioinformaticians with whom I happen to  
> interact?
>     My take, for what it is worth, is that 1.5 has become an  
> unratified
> stable release. How hard would it be to take 1.5.1--as is--and deposit
> that in CPAN? What would be the downside?
>
> Phillip SanMiguel
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Jun 25 16:20:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 15:20:20 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <7C28EA28-031A-4B1C-9625-A643247445FD@uiuc.edu>


On Jun 25, 2006, at 2:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one or two developers.  See the wiki page for details on planning  
>> releases to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the  
>> average CPAN module.  I believe some of the core developers are  
>> meeting this weekend; maybe something will come of that and we'll  
>> get an idea of a next release.
>>
>> Chris
> Hi Chris,
>   Thanks for the information--the key part being that a bug fix  
> from a couple of years ago has not propagated into the current  
> stable release. Below I'll try to convince you that this is a  
> serious problem. (Not because it is your fault, of course. I'm just  
> trying to deliver my take on the situation to the bioperl- 
> programmer-warriors who happen to be listening...)
>   It isn't a problem for me to edit the offending statement in the  
> QualI.pm module on systems I generally use. Or even install a  
> developer's release of bioperl. My problem is one of advocacy.  
> Maybe I have a warped view of the world, but it seems that except  
> for those directly involved in the bioperl or GMOD projects,  
> everyone looks to CPAN when they install bioperl.

Again, it's not as easy as you make it seem.  The idea is to update
the CPAN version only with stable (even-numbered) releases; odd-numbered
releases are developer versions.  Yes, it has been a while since the
last stable version, and it could be a while until the next, as there
have been suggestions of an interim 1.5.x release or so before that
occurs (though he did say 1.6 could be soon after BOSC, which is in
August).  Hilmar has explained that there are some stumbling blocks to
get around before the next major release (if those 'stumbling blocks'
are what I think they are, I agree).  It's very likely that
implementing the changes he mentions may require refactoring code,
changing API, etc.  That's not easy in a project like this, with a
large core of contributors and with the developers scattered all over
the world, all with different priorities (we all have $jobs after all).

That's why we have a Release Pumpkin, akin to the Pumpkings that have  
ushered forth regular perl releases.  It requires a large,  
coordinated effort with one person acting as overseer, pushing  
everybody to meet deadlines.  Not easy and not, by a long shot, your  
typical CPAN module.
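
To make the numbering convention mentioned above concrete, here is a small
shell sketch that classifies a version string by its second (minor) number:
even means stable, odd means developer.  The function name release_kind is
made up for illustration and is not part of any BioPerl tooling.

```shell
#!/bin/sh
# release_kind VERSION
# Prints "stable" for an even minor number (1.4, 1.6), "developer" for an
# odd one (1.5.1), and "unknown" if no numeric minor number is present.
release_kind() {
    # -s suppresses input without a dot, so "1" yields an empty minor
    minor=$(printf '%s\n' "$1" | cut -s -d. -f2)
    case "$minor" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ $((minor % 2)) -eq 0 ]; then
        echo "stable"
    else
        echo "developer"
    fi
}
```

With this, `release_kind 1.4` prints stable and `release_kind 1.5.1` prints
developer.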

>    I write scripts that I sometimes want to send to biologists even  
> less programming-capable than I am. I can just barely envision  
> those biologists pestering their sysadmin to do a CPAN install of  
> bioperl modules so that my script will work. But installing a non- 
> CPAN set of modules probably isn't going to happen.
>    So, this being the case, how can I, with a clear conscience,  
> advocate bioperl to the junior bioinformaticians with whom I happen  
> to interact?

Give those biologists some credit. Quite frankly, I would expect any  
bioinformaticist or computational biologist, junior or otherwise, to  
know or at least learn how to install from CPAN or from CVS,  
otherwise they need to change their job title.  And, as a  
microbiologist myself (i.e. one of those biologists you mention) and  
as one who regularly interacts with biologists with little to no  
computer science experience, I believe I can speak from experience.   
I find the install documents that come with BioPerl and available on  
the wiki pretty much cover everything, from how to install to the  
dependencies required to problems one may encounter.   The web site  
has a ton of documentation, including the FAQ (*cough* which covers  
this ground *cough*).

If they are running perl scripts and using a system that requires  
sysadmin privileges they probably know what they are doing anyway.  If  
not they probably have students/employees that do know what's going  
on (and who may be the ones actually running the scripts).  You can't  
please everybody, so I think you can proceed with a clear conscience  
knowing you did the best that you can to help!

>    My take, for what it is worth, is that 1.5 has become an  
> unratified stable release. How hard would it be to take 1.5.1--as  
> is--and deposit that in CPAN? What would be the downside?

Ah I see Hilmar has responded.  I think he adequately answers this.   
API is everything; changing API suddenly is bad bad bad.

> Phillip SanMiguel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From akholloway at ucdavis.edu  Mon Jun 26 00:15:16 2006
From: akholloway at ucdavis.edu (Alisha Holloway)
Date: Sun, 25 Jun 2006 21:15:16 -0700
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
 package)
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <a06230932c0c50e71ad97@[10.0.1.2]>

Hi Ryan & Jason,

Sorry I didn't get back to you sooner.  I escaped the central valley 
heat (108!) and went to the coast for the weekend.  I do have a 
script that will call baseml and then parse the results.  Here it is 
and, Ryan, I can show you how to retrieve other parts of the data as 
well, but you may already know how to do this.  I know it's ugly, I 
got it working and didn't clean it up.  Just let me know if you need 
more info.

Alisha

At 11:05 PM -0400 6/24/06, Ryan Golhar wrote:
>  >>they make no assumption about coding sequence,
>>>where do you get that impression
>
>I get that information from the 1.5 api docs:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
>Its documented under the description section. 
>
>Oh well, I have it coded and working...might as well use it.
>
>Ryan
>-----Original Message-----
>From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
>Stajich
>Sent: Saturday, June 24, 2006 9:38 PM
>To: golharam at umdnj.edu
>Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
>package)
>
>
>they make no assumption about coding sequence,where do you get that 
>impression.  the ka,ks are for coding but the tamura/nei kimura, 
>jukes-cantor are all for any type of sequence.
>
>the phylip and emboss are pretty straightforward IMHO - you give it 
>an alignment and you get out a matrix of pairwise numbers....
>
>but whatever makes sense to you - we are using the same methods as 
>are in Li's book (that is where I took the equations from).
>-j
>On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:
>
>>  Hi Jason,
>>
>>  It looks like DNAStatistics is only for coding sequences.  I'm
>>  trying to
>>  calculate the Ks of exons and the K (or Ki) of introns.  All the 
>>  methods
>>  in bioperl are based on coding sequences.  Only the  PAUP package 
>>  (that
>>  I've found) does non-coding sequences.   I would have used it but you
>>  need to pay for it and we don't have the funding to purchase much 
>>  at the
>>  moment.
>>
>>  I briefly looked at PHYLIP and EMBOSS but it didn't look as
>>  straight-forward as I was hoping it would be.  Either that, or I was
>>  getting frustrated looking for a simple solution.
>>
>>  In the end, I found a molecular evolution book that talks about
>>  several
>>  methods used for non-coding sequences so I went ahead and implemented
>>  them.  They seem to work well.
>>
>>  Ryan
>>
>>
>>  -----Original Message-----
>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>  Jason
>>  Stajich
>>  Sent: Saturday, June 24, 2006 2:43 PM
>>  To: golharam at umdnj.edu
>>  Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from 
>>  PAML
>>  package)
>>
>>
>>  You should look at the Align::DNAStatistics module if you just want
>>  pairwise DNA distance.  I put in several different distance methods.
>>  Or you can use the distance methods implemented in PHYLIP or EMBOSS
>>  programs -- I thought you wanted the somewhat more sophisticated ML
>>  approaches that are implemented in PAML?
>>
>>  --jason
>>
>>  On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>>
>>>  I've managed to code three methods to calculate K into a perl script
>>>  using the algorithms as described in "Molecular Evolution" by Wen-
>>>  Hsuing
>>>  Li.   I'd be happy to contribute it as a script...
>>>
>>>
>>>
>>>  -----Original Message-----
>>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>>  Jason
>>>  Stajich
>>>  Sent: Saturday, June 24, 2006 9:40 AM
>>>  To: golharam at umdnj.edu
>>>  Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>>>  PAML
>>>  package)
>>>
>>>
>>>  baseml is not well-supported to my knowledge - I think I started with
>>>  attempt to capture a small amount of the data in the file.  There are
>  >> some people who have made modifications to possible parse it in-house
>>>  but I know of no submitted patches.   Many of the knowledgeable
>>>  people are probably at the evolution meetings  this week.
>>>
>>>  I have no idea about the full set of information in the report files
>>>  without going back to the Yang papers first.   It depends on how much
>>>  of that information you really want to capture of just the
>>>  substitution rates.
>>>
>>>  I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>>>  work+PAML.
>>>
>>>  -jason
>>>  On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>>
>>>>  Hi all,
>>>>
>>>>  I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>>>  baseml in the PAML package to measure the distances of some non-
>>>>  coding
>>>
>>>>  regions.
>>>>
>>>>  I started with the coding regions, and used the script
>>>>  bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>>>  something similar for non-coding regions.  However, when I call
>>>>  Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>>>  meaning matrix was never defined.
>>>>
>>>>  I wanted to find out if anyone on here has done this before or
>>>>  knows a
>>>
>>>>  way to measure substitution frequencies of non-coding regions with
>>>>  the
>>>
>>>>  PAML package.  The documentation with PAML is sparse so I'm not
>>>>  sure how
>>>>  to interpret its output directly - that's why I'm using Bioperl.
>>>>
>>>>  Hopefully someone can help me before I start digging into the
>>>>  code...Thanks.
>>>>
>>>>  Ryan
>>>>
>>>
>>>  --
>>>  Jason Stajich
>>>  Duke University
>>>  http://www.duke.edu/~jes12
>>>
>>
>>  --
>>  Jason Stajich
>>  Duke University
>>  http://www.duke.edu/~jes12
>>
>
>--
>Jason Stajich
>Duke University
>http://www.duke.edu/~jes12


-- 
Alisha Holloway

Postdoctoral Fellow
Section of Evolution & Ecology
3347 Storer Hall
University of California
Davis, CA  95616

530-754-9551 Office
512-297-3958 Cell
530-752-1449 Fax
-------------- next part --------------
A non-text attachment was scrubbed...
Name: batch_baseml_50nt.pl
Type: application/octet-stream
Size: 5395 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: baseml.ctl
Type: application/octet-stream
Size: 1699 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0007.obj>

From fernan at iib.unsam.edu.ar  Mon Jun 26 08:47:30 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 09:47:30 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
Message-ID: <20060626124730.GA53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 07:22):
|
| We did not and will not deposit 1.5.1 into CPAN due to the API issues  
| in some (rather central) interfaces. These issues are changes over  
| the 1.4 API and some of those changes are going to go away. Once we  
| deposit it into CPAN we would sanction the changed API as the new  
| 'official' API and would open a huge can of backward liability worms.  
| If you just continue to use the 1.4 API on the 1.5.1 release you  
| don't need to be concerned about an API method you're using going away.
| 
| As I said, the people from the core group of developers who have  
| traditionally shepherded releases all think that doing a 1.4.1  
| release wouldn't be the best investment of their time. You are most  
| welcome to disagree and volunteer your time to coordinate the 1.4.1  
| release, and a lot of people will appreciate your efforts - including  
| the bioperl developers and 'core'. It shouldn't be much work  
| theoretically.
| 
| 	-hilmar
|
+----]

I understand that, being a volunteer project, people can
decide where to best invest their time. If core developers
are no longer using 1.4 in their production setups, it is
reasonable to expect that they invest all of their time in
1.5 or any other bioperl version that they're using.

However, when viewed as an issue related to the setting of a
policy for the whole project, then it makes sense to have a
policy saying for how long a stable release will be
supported, and when and in which case bugfixes that are committed
to and tested in the development branch (as it should be)
will get merged back to stable. 

I'm not knowledgeable enough about the bioperl release
engineering process, nor about the internal development
process, but just guessing I'd expect that whenever anyone
submits a bugfix, it should be the responsibility of
the committer to check (against the project policy,
(written or implicit) or with the core developers in a
difficult case) whether the fix should be committed to more
than one branch.

A patch like the one that started this thread, should have
been committed to the 1.4 branch without too much thinking.
And it would have cost the committer only a few seconds more
of her/his time. 

But you only get this by setting and enforcing a policy.
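
As a rough illustration of the extra step for the committer, here is a
dry-run sketch of porting one trunk fix onto the 1.4 branch with CVS.  The
checkout directory, the file path and the two revision numbers are
placeholders chosen for this sketch, not values taken from the repository,
and it assumes CVSROOT is already set.  By default it only prints the
commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Dry-run sketch: merge the change between two trunk revisions of one file
# into a checkout of the 1.4 branch, then commit it there.
BRANCH=branch-1-4              # branch tag as used elsewhere in this thread
MODULE=bioperl-live
WORKDIR=bioperl-1-4-branch     # hypothetical checkout directory
FILE=Bio/Seq/QualI.pm          # assumed path of the module under discussion
OLD=1.1                        # placeholder: trunk revision before the fix
NEW=1.2                        # placeholder: trunk revision with the fix
DRY_RUN=${DRY_RUN:-1}          # default: only print the commands

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

backport_fix() {
    # Check out the branch, merge the OLD..NEW change into it, commit.
    run cvs checkout -r "$BRANCH" -d "$WORKDIR" "$MODULE" &&
    run sh -c "cd $WORKDIR && cvs update -j $OLD -j $NEW $FILE" &&
    run sh -c "cd $WORKDIR && cvs commit -m 'Merge fix from trunk' $FILE"
}

backport_fix
```

Running the tests on the branch checkout before the commit would be the
obvious addition.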

After a number of these fixes have accumulated, making a
new release shouldn't represent too much effort, nor should
it be expected that the tests that passed before would
break now. And in the worst case (no tarball release),
people can be directed to obtain the most current 'stable'
code from the repository, containing all bugfixes. 

I guess that this is what was meant by Phillip.

Fernan


From hlapp at gmx.net  Mon Jun 26 09:59:00 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 09:59:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>


On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.
>
> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.

Sure. But for some reason he or she forgot. So what do you suggest we  
do - and I mean as a community, because this is a community project.  
Come after the guy  until he commits it to the branch? Or post an  
email to the list saying what you think is the right way and then do  
it (yourself)?

>
> But you only get this by setting and enforcing a policy.

Man, this is not a company. Take a step back and think again. What do  
you suggest we - again we as a community - do to enforce a policy?  
Take increasing levels of disciplinary action if someone keeps  
forgetting to commit to the branch?

While there are clearly some rules everybody needs to follow and if  
you violate them deliberately and repeatedly you will get your CVS  
privileges withdrawn, by and large we as a community need to accept  
some responsibility for making the project what we think it should be  
- and do so not by invoking disciplinary action but by living by  
example and by taking action yourself when you think action is due.

If Bioperl were a company and you asked for a 1.4.1 release and the  
customer service rep told you nope there's a 1.5.1 that you should  
use instead and that will do just fine, what will you do? Argue with  
him about the company policies and whether they are properly enforced  
or not?

Obviously doing so will be a waste of your time. In Bioperl it is at  
the bottom of it no less waste of your time, because instead you now  
have the opportunity to make happen what you believe needs to happen.  
We have had a history of rapidly and un-bureaucratically putting  
people in power of what they wanted to do. We have also had a history  
of not listening much to people who don't want to put their feet  
where their mouth is.

I'm sorry if what I'm saying puts people off, but really this is an  
open-source project and if you ask me it's one with the least  
barriers of entry for new developers or 'activists' that you can find  
in the open source arena. This doesn't come without some degree of  
anarchy, but really IMHO that's more of an advantage than a  
disadvantage.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Mon Jun 26 10:13:00 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Jun 2006 10:13:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <BDC70861-52D3-4389-9073-07F456661B14@bioperl.org>

fair enough - we can certainly merge fixes onto the branch -  I am  
not sure why that is such a big deal.

once the changes are made to the branch, if someone then wants to  
update to the latest code on the 1.4 branch, they would need to volunteer  
to do the last step of:
cvs export -r branch-1-4 -d bioperl-1-4-1 bioperl-live

then validate it and make a tarball. We can then submit a 1.4.x to  
CPAN, but honestly a lot of other fixes have accumulated since the  
1.4 branch and I don't think we want to keep merging back to it; we'd  
rather move forward. The not-so-compatible changes that got checked  
in after the 1.4 branch (having to do with Annotatable) have been  
part of the problem, as this has not been fully fixed to make things  
backwards compatible.
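
The export/validate/tarball sequence could be sketched roughly as follows.
This is a dry-run outline, not an official release script: it assumes CVSROOT
is already set and that the exported tree builds with the usual
"perl Makefile.PL && make test" cycle.  By default it only prints the
commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Dry-run sketch of cutting a 1.4.x tarball from the branch.
TAG=branch-1-4          # branch tag from the cvs export command above
DIR=bioperl-1-4-1       # export directory, also from the command above
MODULE=bioperl-live
DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

make_release() {
    # 1. export a clean tree, 2. validate it, 3. roll the tarball
    run cvs export -r "$TAG" -d "$DIR" "$MODULE" &&
    run sh -c "cd $DIR && perl Makefile.PL && make test" &&
    run tar czf "$DIR.tar.gz" "$DIR"
}

make_release
```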

Nathan asked earlier on the list about how to get a list of modules  
added since 1.4 and I can only say how to generate a diff to the  
current version of the code which might be more than what he is  
asking for. read the docs on cvs diff where you specify the two tags  
you want to diff between.
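
One possible way to narrow such a diff down to newly added modules is to
filter the summary that "cvs rdiff -s" prints.  The sketch below assumes the
summary reports added files on lines of the form "File <path> is new; ...";
check the output of your own cvs version before relying on it.  The function
name only_new_pm is made up for this example.

```shell
#!/bin/sh
# only_new_pm: read a "cvs rdiff -s" summary on stdin and print the paths of
# .pm files that exist only in the newer of the two tags.
# Assumes added files appear as: File <path> is new; ...
only_new_pm() {
    awk '$1 == "File" && $3 == "is" && $4 ~ /^new/ && $2 ~ /\.pm$/ { print $2 }'
}

# Intended use (needs CVSROOT set; not run here):
#   cvs -q rdiff -s -r branch-1-4 -r HEAD bioperl-live | only_new_pm
```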


We certainly have a problem of meeting the needs of several different  
user groups - developers who need latest code, and users who want  
stable releases.  We either get funding to support stable releases  
more deliberately, things that don't seem to be on the main radar  
screen of primary developers or people who are tied to working with  
older stable releases.  Since most of us who are coding and making  
changes are just working from a CVS checkout we don't have a lot of  
pressure to make a release -- and we don't want to dump newly buggy  
(or broken interfaces) into CPAN on purpose.  It also seems like many  
reported bugs have already been fixed on the latest branch but people  
are less interested in back-fixing on the old branch.


Our hope is that 1.6 would be a good replacement for 1.4 - presumably  
API consistent for the most part, but we are suffering from lack of  
time of people willing to do the work to make this happen.

I have mentioned in the past that I cannot be the release master for  
the project and it is time for someone else to step up and make this  
happen.  Chris Fields has done a phenomenal job answering questions,  
fixing bugs, and helping run the project as some of us have started  
to have too busy of a schedule to keep daily tabs on Bioperl.  But he  
too will probably have to cycle off as his career responsibilities  
(and job search) takes more time.   I don't have a good answer for  
anyone on how to make this happen more smoothly, I am hopeful that  
the gmod mtg will spur some more commits and a roadplan for releasing  
the next dev release and seeing what can happen with 1.6.  If we  
funded a Bioperl coordinator I am sure that would help things more  
and manage the different sets of priorities of the user groups.

I think a dedicated hackathon to bioperl work could get 1.6 out after  
one week of solid work with some bug squashing followup.

Barring that we'll have to see what everyone else wants to see done  
to get the next release out.  The person leading the release doesn't  
have to really program things they just need to organize people  
around a time-frame, a set of features that need to be tested and  
fixed, and commitments from people of what they will do.

Much of the release process is documented on the bioperl wiki site,  
if this is not clear enough please make a note on the page/talk page  
and we can start .  My hope is that the wiki can be a good repository  
of the thought process behind the project.  right now too much of it  
is floating in the minds of former and current project coordinators.

...just some of my thoughts as I get ready to be off-line starting  
next week for 4 weeks...

-jason


On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:

> +----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 07:22):
> |
> | We did not and will not deposit 1.5.1 into CPAN due to the API  
> issues
> | in some (rather central) interfaces. These issues are changes over
> | the 1.4 API and some of those changes are going to go away. Once we
> | deposit it into CPAN we would sanction the changed API as the new
> | 'official' API and would open a huge can of backward liability  
> worms.
> | If you just continue to use the 1.4 API on the 1.5.1 release you
> | don't need to be concerned about an API method you're using going  
> away.
> |
> | As I said, the people from the core group of developers who have
> | traditionally shepherded releases all think that doing a 1.4.1
> | release wouldn't be the best investment of their time. You are most
> | welcome to disagree and volunteer your time to coordinate the 1.4.1
> | release, and a lot of people will appreciate your efforts -  
> including
> | the bioperl developers and 'core'. It shouldn't be much work
> | theoretically.
> |
> | 	-hilmar
> |
> +----]
>
> I understand that, being a volunteer project, people can
> decide where to best invest their time. If core developers
> are no longer using 1.4 in their production setups, it is
> reasonable to expect that they invest all of their time in
> 1.5 or any other bioperl version that they're using.
>
> However, when viewed as an issue related to the setting of a
> policy for the whole project, then it makes sense to have a
> policy saying for how long a stable release will be
> supported, and when and in which case bugfixes that are committed
> to and tested in the development branch (as it should be)
> will get merged back to stable.
>
> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.
>
> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.
>
> But you only get this by setting and enforcing a policy.
>
> After a number of these fixes has accumulated, then making a
> new release shouldn't represent too much effort, nor should
> it be expected that the tests that passed before would
> break now. And in the worst case (no tarball release),
> people can be directed to obtain the most current 'stable'
> code from the repository, containing all bugfixes.
>
> I guess that this is what was meant by Phillip.
>
> Fernan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From bix at sendu.me.uk  Mon Jun 26 10:44:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 15:44:55 +0100
Subject: [Bioperl-l] Tests
Message-ID: <449FF2E7.3040101@sendu.me.uk>

What level of testing is expected to be done in a test file? Is there 
such a thing as too many tests? Tests for every possible (documented) 
way of achieving a result with a module's method? Tests for every 
conceivable way of misusing a method?

If I come across a test for a module that doesn't test for everything 
the module can do, should I add tests as a matter of course? Would this 
be beneficial, or a waste of time (given that the module probably is 
bug-free already)?


From cjfields at uiuc.edu  Mon Jun 26 11:24:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:24:00 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <001301c69934$909f83c0$15327e82@pyrimidine>

...
> However, when viewed as an issue related to the setting of a
> policy for the whole project, then it makes sense to have a
> policy saying for how long a stable release will be
> supported, and when and in which case bugfixes that are committed
> to and tested in the development branch (as it should be)
> will get merged back to stable.

In a project this large which relies on a lot of outside resources
maintaining API and availability at all times, having a completely bug-free
fix for any reasonable length of time is impossible.  As a small example,
almost every time NCBI changes BLAST output, it breaks our text parsers, and
though we recommend using the BLAST XML format parser (which is much more
stable), almost everybody continues using text parsing and wants that fixed.
Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
notification, so remote BLAST parsing can break at any time.  Fold into that
any software changes that change output or API (PAML comes to mind).  Fold
into that remote database changes (EBI interface to Swissprot).  Oh, let's
not forget sequence format changes (recent SwissProt and GenBank changes).
And, worst of all, we can't expect them to maintain API or output b/c
they're updating based on user input/suggestions or bug fixes which require
them to make changes.  What's 'stable' about that?

It's very easy to say you want something and then not volunteer to do it; if
you want something then put forth the time and effort to get it done.  Put
your money where your mouth is (as they say in my home state).

Again (for the third or fourth time now), putting together a release takes
some time and effort.  I actually think it takes more effort than Hilmar
suggests; either way, it requires someone to act as the leader (release
pumpkin) to handle changes, and I don't see anybody stepping forward.
Personally, if I have the time, maybe I'll handle an interim release, but
I'm looking for a job starting in the fall as well as finishing up research
for publication so that will take up almost all the time I have.  As Hilmar
says, if you want to do it, fine.  Realize, though, many many changes have
been made since 1.4 and many more will likely be made on the road to 1.6

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.

This is a large open-source project with a ton of developers all over the
world.  Check out the AUTHORS file; it's at best incomplete and still has
about 100 contributors.  

(Hey, my name's not on there!!!)

> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.
> 
> But you only get this by setting and enforcing a policy.

You need to realize what this project is, what it is not, and how it
evolved.  A little history lesson might get you (and others) to understand
just how complex it all is (and how old some of the code is).

http://www.bioperl.org/wiki/FAQ#Can_you_explain_the_Object_Model_design_and_rationale.3F

explains a bit on the project design.

http://www.bioperl.org/wiki/History_of_BioPerl

explains how BioPerl came to be.  

This is not a job or a company but an open-source project; its origins are
based in the scientific community.  You're probably right about the person
not committing the change to the 1.4 branch.  We probably should have a
policy for commits to stable releases.  But how can we logically rationalize
doing so now for 1.4, almost three years hence?  We're post 1.5.1 and likely
going into 1.6 as we speak.  It's too late for 1.4 changes IMHO, frankly,
but you're welcome to try.  I don't think it's worth the effort.

As for policy enforcement, what would you want us to do?  This is a
volunteer effort.  Fire him/her?  Frankly they should be commended for
getting the fix committed in the first place, and if someone points out that
it should be committed to the 1.4 branch then fine; it shouldn't be hard to
do so even long after the commit to the main branch is made.  It just
requires someone to do so.

Again, this is NOT your typical CPAN module with one or two developers or a
project that relies on doing one thing very well.  This project has over 100
developers and is supposed to do everything adequately (and many things very
well). 

> After a number of these fixes has accumulated, then making a
> new release shouldn't represent too much effort, nor should
> it be expected that the tests that passed before would
> break now. And in the worst case (no tarball release),
> people can be directed to obtain the most current 'stable'
> code from the repository, containing all bugfixes.

You can download a tarball from the latest CVS code at any time.  There is a
link for doing just that at the bottom of the anonymous CVS page:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/

Chris


From hlapp at gmx.net  Mon Jun 26 11:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 11:30:05 -0400
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
References: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>


On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:

> What level of testing is expected to be done in a test file? Is there
> such a thing as too many tests?

No, not really.

> Tests for every possible (documented)
> way of achieving a result with a module's method?

Ideally that's the minimum.

> Tests for every conceivable way of misusing a method?

If some are known already (from reports) or you think can be  
anticipated, yes. Generally, if a method documents what are invalid  
values for its input it's a good idea to test what the method does if  
supplied with such values. The one thing it shouldn't do is silently  
ignore them, or produce a result anyway (which presumably would be a  
wrong result by definition).
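
A minimal sketch of that kind of test, assuming Test::More (a core Perl
module) rather than any particular Bioperl test setup. The set_strand()
sub is hypothetical and stands in for any method that documents its
invalid inputs; it is not a Bioperl API.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical method for illustration only (not a Bioperl API):
# accepts a strand of -1, 0 or 1 and dies on anything else.
sub set_strand {
    my ($value) = @_;
    die "invalid strand: undefined\n" unless defined $value;
    die "invalid strand: '$value'\n" unless $value =~ /^(?:-1|0|1)$/;
    return $value + 0;
}

# Documented valid values should round-trip.
is( set_strand($_), $_, "strand $_ accepted" ) for ( -1, 0, 1 );

# Documented invalid values must raise an error, not be silently ignored.
for my $bad ( 2, 'plus', '' ) {
    my $result = eval { set_strand($bad) };
    my $error  = $@;    # capture before anything else can reset $@
    ok( !defined $result, "no result returned for '$bad'" );
    like( $error, qr/invalid strand/, "error raised for '$bad'" );
}

done_testing();
```

The second loop is the part that matters here: it fails if the method
quietly accepts a bad value or returns a result anyway.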

>
> If I come across a test for a module that doesn't test for everything
> the module can do, should I add tests as a matter of course? Would this
> be beneficial, or a waste of time (given that the module probably is
> bug-free already)?

It would certainly be beneficial. It'd be great if you were willing  
to volunteer for this.

Note that a module being bug free now doesn't mean it always will be.  
The main point of tests is not only to weed out bugs at the time it  
is written, but also to make sure that future changes to the module  
itself, or to other modules it interacts with or inherits from, don't  
break it.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Mon Jun 26 11:39:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:39:25 +0100
Subject: [Bioperl-l] Tests
In-Reply-To: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
References: <449FF2E7.3040101@sendu.me.uk>
	<AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
Message-ID: <449FFFAD.40506@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:
>
>> If I come across a test for a module that doesn't test for everything
>> the module can do, should I add tests as a matter of course? Would this
>> be beneficial, or a waste of time (given that the module probably is
>> bug-free already)?
> 
> It would certainly be beneficial. It'd be great if you were willing to 
> volunteer for this.

I doubt I have time to do this on the global scale[*], but certainly I 
will for the modules I work on.


Cheers,
Sendu.

* Though... it would certainly be a good way of getting to know all of 
Bioperl intimately!


From bix at sendu.me.uk  Mon Jun 26 11:42:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:42:33 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00069.6010107@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).
> 
> The general tidy up will involve redoing and perhaps even removing 
> things.

Does anyone know what the intent behind the split Bio::Map::MappableI 
and Bio::Map::MarkerI was? I somehow get the impression these started as 
one interface but then became two. The split /seems/ to be MappableI as 
a map element with one position on one map, whilst MarkerI is a map 
element with multiple positions on multiple maps. But MarkerI has no 
synopsis or description, and MappableI says it does what MarkerI does 
(but doesn't). So I'm left guessing atm.

Do we want to keep the split? If yes, what exactly should be the 
difference between the two? If no, would it be ok to just get rid of 
MarkerI (folding it back into MappableI)?


From cjfields at uiuc.edu  Mon Jun 26 11:45:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:45:51 -0500
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <001a01c69937$9a1c1320$15327e82@pyrimidine>

My opinion: tests should cover methods and expected results and are based on
what the module actually accomplishes.  Some classes (like SeqIO, SearchIO)
are normally relatively easy to build tests for b/c the expected results are
in the file being parsed.  Tests which check calculated results from modules
(Bio::Align::DNAStatistics for instance) I would think are trickier since
you should confirm the calculations are correct through independent means.
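
As a rough illustration of checking a calculated result against an
independently derived value, here is a sketch using Test::More. The
jukes_cantor_distance() sub is a hypothetical stand-in written for this
example, not the Bio::Align::DNAStatistics implementation, and the
expected value was worked out by hand from the textbook formula.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a module method that computes a statistic:
# Jukes-Cantor distance from the proportion p of differing sites,
# d = -(3/4) * ln(1 - (4/3) * p).
sub jukes_cantor_distance {
    my ($p) = @_;
    die "p must be in [0, 0.75)\n" if $p < 0 || $p >= 0.75;
    return -0.75 * log( 1 - ( 4 / 3 ) * $p );
}

# Compare floating-point values within a tolerance, never with ==.
sub is_close {
    my ( $got, $expected, $name ) = @_;
    ok( abs( $got - $expected ) < 1e-6, $name )
        or diag("got $got, expected $expected");
}

# Expected values worked out by hand, independently of the code:
#   p = 0    gives d = 0
#   p = 0.25 gives d = -(3/4) * ln(2/3), about 0.304099
is_close( jukes_cantor_distance(0),    0,        'identical sequences' );
is_close( jukes_cantor_distance(0.25), 0.304099, 'p = 0.25' );

done_testing();
```

The expected numbers come from outside the code under test; copying them
from the module's own output would only prove the module agrees with
itself.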

Links:

http://www.bioperl.org/wiki/Advanced_BioPerl#Designing_Good_Tests

http://search.cpan.org/~mschwern/Test-Simple-0.62/lib/Test/Tutorial.pod

The link above uses Test::Simple or Test::More; we use Test (but have
considered moving to Test::More using Devel::Cover).

My 2c

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, June 26, 2006 9:45 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Tests
> 
> What level of testing is expected to be done in a test file? Is there
> such a thing as too many tests? Tests for every possible (documented)
> way of achieving a result with a module's method? Tests for every
> conceivable way of misusing a method?
> 
> If I come across a test for a module that doesn't test for everything
> the module can do, should I add tests as a matter of course? Would this
> be beneficial, or a waste of time (given that the module probably is
> bug-free already)?


From bix at sendu.me.uk  Mon Jun 26 12:15:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 17:15:32 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00824.20002@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).

To do this I actually need to make some slightly more significant API 
changes than I had hoped. To make Position central, all maps, mappables 
and markers need to be able to add and remove Positions (and similar 
things). As I see it, we can say that such methods are fundamental to 
the coordination required between Bio::Map modules. I feel that I'm 
therefore justified in implementing these kinds of methods in the 
interfaces (which would allow all the downstream modules that implement 
those interfaces to work in the new system without much/any alteration).

Am I justified? Should I try harder to do it without implementations in 
the interfaces?
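
To make the question concrete, here is a minimal sketch of an interface
package that carries shared implementations. All package and method
names (My::MappableI, My::Marker, positions, add_position,
remove_position) are hypothetical and chosen for illustration; this is
not the actual Bio::Map code.

```perl
use strict;
use warnings;

# Hypothetical interface package (names are illustrative, not Bio::Map's).
package My::MappableI;

# Abstract method: implementing classes must supply the storage.
sub positions {
    my ($self) = @_;
    die ref($self) . " must implement positions()\n";
}

# Concrete methods shared by every implementer, written purely in
# terms of the abstract positions() accessor.
sub add_position {
    my ( $self, $pos ) = @_;
    push @{ $self->positions }, $pos;
    return scalar @{ $self->positions };
}

sub remove_position {
    my ( $self, $pos ) = @_;
    @{ $self->positions } = grep { $_ ne $pos } @{ $self->positions };
    return scalar @{ $self->positions };
}

# An implementing class only needs to provide positions().
package My::Marker;
our @ISA = ('My::MappableI');

sub new {
    my ($class) = @_;
    return bless { positions => [] }, $class;
}

sub positions { return $_[0]{positions} }

package main;

my $marker = My::Marker->new;
$marker->add_position('chr1:100');
$marker->add_position('chr1:250');
$marker->remove_position('chr1:100');
print join( ',', @{ $marker->positions } ), "\n";    # prints "chr1:250"
```

The trade-off in this arrangement is that the interface is no longer a
pure contract, but downstream classes pick up the new methods without
being edited.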


From pmiguel at purdue.edu  Mon Jun 26 12:53:56 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon, 26 Jun 2006 12:53:56 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <001301c69934$909f83c0$15327e82@pyrimidine>
References: <001301c69934$909f83c0$15327e82@pyrimidine>
Message-ID: <44A01124.5040102@purdue.edu>

Chris Fields wrote:
> ...
>   
>> However, when viewed as an issue related to the setting of a
>> policy for the whole project, then it makes sense to have a
>> policy saying for how long a stable release will be
>> supported, and when and in which case bugfixes that are committed
>> to and tested in the development branch (as it should be)
>> will get merged back to stable.
>>     
>
> In a project this large which relies on a lot of outside resources
> maintaining API and availability at all times, having a completely bug-free
> fix for any reasonable length of time is impossible.  As a small example,
> almost every time NCBI changes BLAST output, it breaks our text parsers, and
> though we recommend using the BLAST XML format parser (which is much more
> stable), almost everybody continues using text parsing and wants that fixed.
> Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
> notification, so remote BLAST parsing can break at any time.  Fold into that
> any software changes that change output or API (PAML comes to mind).  Fold
> into that remote database changes (EBI interface to Swissprot).  Oh, let's
> not forget sequence format changes (recent SwissProt and GenBank changes).
> And, worst of all, we can't expect them to maintain API or output b/c
> they're updating based on user input/suggestions or bug fixes which require
> them to make changes.  What's 'stable' about that?
>
> It's very easy to say you want something and then not volunteer to do it; if
> you want something then put forth the time and effort to get it done.  Put
> your money where your mouth is (as they say in my home state).
>
> Again (for the third or fourth time now), putting together a release takes
> some time and effort.  I actually think it takes more effort than Hilmar
> suggests; either way, it requires someone to act as the leader (release
> pumpkin) to handle changes, and I don't see anybody stepping forward.
> Personally, if I have the time, maybe I'll handle an interim release, but
> I'm looking for a job starting in the fall as well as finishing up research
> for publication so that will take up almost all the time I have.  As Hilmar
> says, if you want to do it, fine.  Realize, though, many many changes have
> been made since 1.4 and many more will likely be made on the road to 1.6
>
>   
Hi Chris et al.,

    I was just reporting the situation from where I sit. I think this 
issue was important enough to bring to everyone's attention. I've done so 
and I'm more than satisfied with the response. I hope my emails were not 
too abrasive.
    I have now read the wiki about coordinating a release. You are 
right, that does sound hard. At least to me--I've never even used CVS, 
nor contributed a module to CPAN. I just don't see myself as being 
qualified to coordinate a 1.4.1 release. So since I'm not, for that 
reason, able to volunteer to do it myself, I'll withdraw my request for 
a new release to CPAN.
    That being said, I think Fernan's suggestion bears keeping in mind 
once 1.6 has been released and bug fixes are being committed. By that 
time, I hope I'll be savvy enough to help out in the process.
    Thanks for your attention,

Phillip SanMiguel
   

From fernan at iib.unsam.edu.ar  Mon Jun 26 15:24:51 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 16:24:51 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
	<ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
Message-ID: <20060626192451.GB53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 11:01):
| 
| On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:
| 
| >I'm not knowledgeable enough about the bioperl release
| >engineering process, nor about the internal development
| >process, but just guessing I'd expect that whenever anyone
| >submits a bugfix, it should be the responsibility of
| >the committer to check (against the project policy,
| >(written or implicit) or with the core developers in a
| >difficult case) whether the fix should be committed to more
| >than one branch.
| >
| >A patch like the one that started this thread, should have
| >been committed to the 1.4 branch without too much thinking.
| >And it would have cost the committer only a few seconds more
| >of her/his time.
| 
| Sure. But for some reason he or she forgot. So what do you suggest we  
| do - and I mean as a community, because this is a community project.  
| Come after the guy until he commits it to the branch? 

No, I never said or implied that.

| Or post an email to the list saying what you think is the
| right way and then do  it (yourself)?

Of course I could volunteer some of my time to
do that (that is, go over the commit history and see what
changes could be merged back to 1.4, if that seems to be
useful), provided I get a polite reply to my 'email
to the list saying what [I] think is the right way'.

I'm a volunteer in other open source, community projects,
and I do contribute regularly so I see no problem except the
obvious scarcity of free time in doing the same for bioperl.

| >But you only get this by setting and enforcing a policy.
| 
| Man, this is not a company. Take a step back and think again. What do  
| you suggest we - again we as a community - do to enforce a policy?  
| Take increasing levels of disciplinary action if someone keeps  
| forgetting to commit to the branch?

Seems like you were pissed off by what I said ...

What I was just trying to say is that merely by formulating
and communicating a policy you could be taking steps towards
making it a reality. Maybe 'enforcing' was an unfortunate
word to use here ... 

You don't have to punish anyone, just sending a polite email
to the list reminding people about the policy once in a
while, should be enough. It's OK if some committer doesn't
care, or just forgets about doing the right thing once in a
while ...

But of course, you might be pissed off by me talking about
something that I know nothing about (the development of
bioperl), given that I'm just a bioperl user.

Perhaps my mistake was to bring here ideas from
other projects (in which I do contribute regularly) without
realizing that, not being a contributor, I could be
punished for suggesting how things could be done better.

| While there are clearly some rules everybody needs to follow and if  
| you violate them deliberately and repeatedly you will get your CVS  
| privileges withdrawn, by and large we as a community need to accept  
| some responsibility for making the project what we think it should be  
| - and do so not by invoking disciplinary action but by living by  
| example and by taking action yourself when you think action is due.

I completely agree. When I said 'setting a policy' I just
meant something along the lines of clearly stating what are
those 'rules everybody needs to follow'. My suggestion was
to add a 'merge trivial fixes back to stable' rule to that
list.

I agree with Jason: why is that such a big deal?

| If Bioperl were a company and you asked for a 1.4.1 release and the  
| customer service rep told you nope there's a 1.5.1 that you should  
| use instead and that will do just fine, what will you do? Argue with  
| him about the company policies and whether they are properly enforced  
| or not?
| 
| Obviously doing so will be a waste of your time. In Bioperl it is at  
| the bottom of it no less waste of your time, because instead you now  
| have the opportunity to make happen what you believe needs to happen.

Right, but first I have to realize what needs to happen. I
realized it when I read your reply to Phillip's message.

I then proceeded to write my thoughts and send them to the
list, to see what kind of feedback I get. 

Hopefully, someone with commit privileges would think that
what I said makes sense and just proceed to doing it (saving
me from the task :)

Or perhaps, someone, as Jason did, would say that it's
not worth trying to merge things back to 1.4 and move
forward instead. In his message he even explained what the
problems and needs are (lack of man-time, need for
volunteers) and politely asked for help.

| We have had a history of rapidly and un-bureaucratically putting  
| people in power of what they wanted to do. We have also had a history  
| of not listening much to people who don't want to put their feet  
| where their mouth is.

I would call your reply (this message) a barrier of entry
for new developers. In the above paragraph I guess you are
referring to the bioperl motto: 'whoever codes it wins'.
That is true in any open source project. But at least to me,
that doesn't say that you should not listen to people just
because they haven't contributed a single line of code.

| I'm sorry if what I'm saying puts people off, but really this is an  
| open-source project and if you ask me it's one with the least  
| barriers of entry for new developers or 'activists' that you can find  
| in the open source arena. 


Let me disagree. The barriers of entry are not just the
giving away of developer accounts and/or repository write
privileges. 

I'm a regular contributor in another open source, community
project (FreeBSD) that has more and higher barriers of entry
with respect to giving away privileges (for example for
committing changes to the repository). Nonetheless FreeBSD
has historically shown to have few and low barriers of entry
for incorporating people to the project (without the need to
give away commit privileges, making them responsible for
parts of the FreeBSD source code/documentation/ports/etc).

IMO, that comes from a very good communication of the
direction of the project, what needs to be done, how to do
it, and a tendency of privileged and older members to listen
to people's suggestions, inviting and helping people
to jump the fence and become part of the project. It's not
an unplanned occurrence that FreeBSD has 'mentors' that
introduce new members, help them to get acquainted with how
the project works, policies, etc. and supervise their
actions.

| This doesn't come without some degree of  
| anarchy, but really IMHO that's more of an advantage than a  
| disadvantage.
| 	-hilmar
|
+----]

Fernan

PS: finally, let me just add that english is not my native
language. Although I'm quite familiar with it, once in a
while, an unfortunate choice of words might blur my intended
meaning or the strength I wanted to convey. In case that has
been the case, let me put clearly that it has not been my
intention to criticize the way the project does things, but
to suggest ideas for the future (merge back trivial changes
to a 'stable' branch as a policy) based on my experience
with other projects. Whether that fits bioperl or not was
what I would have expected as a reply.


From cjfields at uiuc.edu  Mon Jun 26 16:18:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 15:18:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626192451.GB53298@iib.unsam.edu.ar>
Message-ID: <002701c6995d$b738f790$15327e82@pyrimidine>


> | >A patch like the one that started this thread, should have
> | >been committed to the 1.4 branch without too much thinking.
> | >And it would have cost the committer only a few seconds more
> | >of her/his time.
> |
> | Sure. But for some reason he or she forgot. So what do you suggest we
> | do - and I mean as a community, because this is a community project.
> | Come after the guy until he commits it to the branch?
> 
> No, I never said or implied that.

Right, you didn't say that.  But you didn't clarify your statements either.
I think you're treading into dangerous waters when you come in and criticize
something w/o bothering to read up on how things have been done here.  As
you say yourself below, it's 'something that I know nothing about (the
development of bioperl), given that I'm just a bioperl user'.  It's akin to
"I don't think you're coding things correctly, here's the right way to do
it" w/o knowing what the code is used for.

> | Or post an email to the list saying what you think is the
> | right way and then do  it (yourself)?
> 
> Of course I could volunteer some of my time to
> do that (that is, go over the commit history and see what
> changes could be merged back to 1.4, if that seems to be
> useful), provided I get a polite reply to my 'email
> to the list saying what [I] think is the right way'.

You will get a polite email when you respond politely.  I actually agree
with many things you say, but you sure aren't making any friends here by the
way you consistently take the opposite stance and judge what other people
do.  I think you have a point about having a stable release be supported for
a period of time.  My point is, how long?  We didn't really get an idea of
that from you, did we?

> I'm a volunteer in other open source, community projects,
> and I do contribute regularly so I see no problem except the
> obvious scarcity of free time in doing the same for bioperl.

And others here also volunteer elsewhere (GMOD, DAS, Ensembl, etc).  Don't
presume we don't have experience in open-source.  That's being pretty
judgmental.  

> | >But you only get this by setting and enforcing a policy.
> |
> | Man, this is not a company. Take a step back and think again. What do
> | you suggest we - again we as a community - do to enforce a policy?
> | Take increasing levels of disciplinary action if someone keeps
> | forgetting to commit to the branch?
> 
> Seems like you were pissed off by what I said ...

????Ya think????  

You know, okay, forget it.  This is completely non-productive.  We'll all
agree to disagree, argue, whatever.  The points made here, as I see them:

1)  Commits should be made to stable releases (as well as to the main branch
in CVS) to fix bugs as long as that release is supported.  I agree with
this, but someone has to volunteer, and the length of time a release is
supported also needs to be worked out.  It would almost be better to go to a
regular release schedule (once every 3-6 months or so) where the code is
given as is to CPAN, whether it passes tests or not.

2)  More communication about the direction Bioperl is heading; personally I
haven't seen a problem with this as much as there is no information about a
roadmap.  That is being alleviated soon I believe, though people out there
need to be patient.

3)  Volunteer.  If you have something you believe needs to be done and you
believe so fervently, then put up or shut up.  Make (nice polite)
suggestions otherwise.  Don't judge code or "the way things are done" and
don't presume what kind of experience people have that you don't know and
haven't met.  End of story.

Chris


From torsten.seemann at infotech.monash.edu.au  Mon Jun 26 22:57:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 27 Jun 2006 12:57:47 +1000
Subject: [Bioperl-l] Comments on new PDOC documentation
Message-ID: <44A09EAB.2030401@infotech.monash.edu.au>

Hello all,

I am very happy to see the PDOC software has been improved, as I use the 
online web documentation frequently. Thanks to Jason, Raphael and 
Patrick for making this happen.

http://doc.bioperl.org/bioperl-live/

Now for some comments...

1. CSS

It uses CSS which is excellent, reducing HTML size and allowing easy 
tweaks to the design. However its current implementation has some issues:

A. it seems to only use ID, rather than CLASS, to specify styles.
    ID values must be unique in a page, and are for one-off styles.
    CLASS may be re-used throughout a page. eg "sub" and "subArea".
    Many browsers do not enforce this however...

B. it seems to be doing unusual, but possibly deliberate, things with
    the POD when determining what CSS ID to give it, but perhaps this is
    more to do with how Bioperl formats the POD on some subheadings
    eg.
    <a name="_pod_Reporting Bugs" id="_pod_Reporting Bugs">
    <a name="_pod_AUTHOR - Ewan Birney" id="_pod_AUTHOR - Ewan Birney">

C. the "Description" sections etc are in a proportional font, but
    I think it should be "font-family: monospace" as many authors have
    exploited the traditional monospace of most editors to format
    their comments, which are now lost
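
On point B: HTML 4 requires an id to start with a letter and contain no
spaces, so values like "_pod_Reporting Bugs" are not valid ids. One
possible way to derive a safe token from a POD heading is sketched
below. heading_to_id() is a hypothetical helper written for this
example, not PDOC's actual code, and it does not attempt to keep ids
unique within a page.

```perl
use strict;
use warnings;

# Hypothetical helper (not PDOC's actual code): turn a POD heading into
# a token that is safe to use as an HTML id or anchor name.
sub heading_to_id {
    my ($heading) = @_;
    my $id = $heading;
    $id =~ s/^\s+|\s+$//g;         # trim surrounding whitespace
    $id =~ s/[^A-Za-z0-9]+/_/g;    # collapse every other run to "_"
    $id =~ s/^_+|_+$//g;           # drop leading and trailing "_"
    return "pod_$id";              # prefix guarantees a leading letter
}

print heading_to_id('Reporting Bugs'), "\n";          # pod_Reporting_Bugs
print heading_to_id('AUTHOR - Ewan Birney'), "\n";    # pod_AUTHOR_Ewan_Birney
```

Two headings that differ only in punctuation would collide under this
scheme, so a real generator would also need a uniqueness check.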

2. FRAMES

I notice it still uses HTML Frames. Although this reduces code size 
also, it makes it impossible to LINK directly to a specific 
documentation page with all the frames intact. It may be better to use 3 
DIV elements which are part of each page, and they could be server-side 
included so there is no HTML duplication.

3. MERGING OF BIOPERL DOCS

One facet of the docs I find frustrating is that bioperl-live and 
bioperl-run (and the others) are separate! This means that you have to 
keep switching between them, and more importantly, links from class 
names to classes in other packages are not present; this is particularly 
bad when browsing bioperl-run.

Is there any chance of creating a "merged" bioperl-doc page somehow?

4. STYLE

Choice of colours and layouts is such a personal thing.
I guess people can download http://doc.bioperl.org/css/perl.css
and re-edit it, and get their Browser to over-ride the supplied CSS with 
their version.

5. CONCLUSION

Please don't get the wrong idea, I love the new PDOC, I would just like 
to love it more. And yes I understand the nightmare that is parsing 
Perl/POD and generating compatible CSS :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From bix at sendu.me.uk  Tue Jun 27 06:21:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 11:21:57 +0100
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <44A106C5.9040706@sendu.me.uk>

Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
Is the idea of a Bio::Score of interest? See bug, but basically an 
object that can handle multiple kinds of scores effectively.

I would like to use such a thing in Bioperl, but what standard needs to 
be met before Bioperl gets a new kind of object?


From hlapp at gmx.net  Tue Jun 27 08:24:16 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 08:24:16 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A106C5.9040706@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
Message-ID: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>

So you basically want to attach semantic information to a number, and  
type the number thereby?

If so, an ontology would be the more natural choice (and in the end  
more flexible one) for expressing this kind of information.

Have you looked at the concept of 'quantitation types', e.g. in MAGE  
(the XML [MAGE-ML] or the object model [MAGE-OM])?

There is no quantitation type ontology at a repository I know of. I  
have used my own ones in the past and they have been pretty useful.

	-hilmar

On Jun 27, 2006, at 6:21 AM, Sendu Bala wrote:

> Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
> Is the idea of a Bio::Score of interest? See bug, but basically an
> object that can handle multiple kinds of scores effectively.
>
> I would like to use such a thing in Bioperl, but what standard  
> needs to
> be met before Bioperl gets a new kind of object?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 08:52:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 13:52:05 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A129F5.3030500@sendu.me.uk>

Hilmar Lapp wrote:
> So you basically want to attach semantic information to a number, and 
> type the number thereby?

Basically, I want to be able to stick a bunch of (different kinds of) 
numbers into an object, and later get the 'best' one out (of a 
particular kind), or sort multiple of those objects.


> If so, an ontology would be the more natural choice (and in the end more 
> flexible one) for expressing this kind of information.

I'm not really sure I understand 'and type the number', or what (useful) 
flexibility doing it with an ontology would provide.


> Have you looked at the concept of 'quantitation types', e.g. in MAGE 
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

I had a quick look, but not really sure what you intended to suggest here.


> There is no quantitation type ontology at a repository I know of. I have 
> used my own ones in the past and they have been pretty useful.

Can you provide a brief example of what you mean?

If it would be appropriate to implement a Bio::Score with an ontology 
that's fine. Would we want a Bio::Score implemented though? Or are you 
suggesting each module make its own quantitation type ontology when it 
wants to deal with numerous scores?

I like the idea of a Bio::Score because then you can compare complex 
scores from multiple different unrelated modules.


From cjfields at uiuc.edu  Tue Jun 27 10:08:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 09:08:57 -0500
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
Message-ID: <001e01c699f3$3b6cda50$15327e82@pyrimidine>

> Hilmar Lapp wrote:
> > So you basically want to attach semantic information to a number, and
> > type the number thereby?
> 
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.

The 'best one' might be tricky when dealing with different kinds of scores,
esp. scores calculated different ways.  For instance, I run RNA motif
programs quite frequently (RNAMotif, ERPIN, Infernal), but all generate
'scores' based on different criteria (algorithms, different parameters, how
the author slept, and so on).  RNAMotif in particular is hard to deal with
(though a great program) b/c the scores are based on criteria in the
descriptor file (the file used to describe the motif), so aren't comparable
to other descriptors, which may have their own method of generating scores,
let alone output from other programs.  Which one would be 'the best?'  It's
a bit subjective since the scores are predictive based upon your input,
various program limitations, specific program parameter implementations,
etc.  

I do like the idea of grouping together scores for comparison, such as when
a particular region of DNA has multiple hits from different programs with
different scores.  It would at least suffice as a test on how various
programs or experimental data would compare with one another.

> > If so, an ontology would be the more natural choice (and in the end more
> > flexible one) for expressing this kind of information.
> 
> I'm not really sure I understand 'and type the number', or what (useful)
> flexibility doing it with an ontology would provide.

I'm not sure, but maybe something along the lines of what the number (the
score) actually means, especially when compared to other scores.  In other
words, how you could compare one score or number versus the other.  An
ontology would allow more complex information to be included along with the
score information so one could make more informed choices based on how the
score was obtained, the algorithm used, the program involved, etc.  Hence
flexible.  Is that close, Hilmar?

To use my RNA program example above, I could include the information about
how the scores were obtained, the programs involved, parameters used, the
various raw scores, the time it took to run the program, etc. (i.e. you
could make it as specific as you wanted).  This could also be extended to
other data types as well besides program, such as wet bench experimental
data and so on, which I deal with quite a bit.  I think there are a few XML
specs out there besides MAGE that do this as well but I can't think of any
off the top of my head.

> > Have you looked at the concept of 'quantitation types', e.g. in MAGE
> > (the XML [MAGE-ML] or the object model [MAGE-OM])?
> 
> I had a quick look, but not really sure what you intended to suggest here.

I think the idea is that MAGE, strictly as an example, deals with microarray
data from different sources or different data systems for comparison.
Sounds a little like what you want to do.

> > There is no quantitation type ontology at a repository I know of. I have
> > used my own ones in the past and they have been pretty useful.
> 
> Can you provide a brief example of what you mean?
> 
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make its own quantitation type ontology when it
> wants to deal with numerous scores?
> 
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.

Which is what MAGE does in a way, but more specifically, i.e. just
microarray data from different sources.  So the array data may be calculated
in different ways based upon the specs for different machines, the way array
slides were prepared, how the experimenter slept, etc.

Chris


From hlapp at gmx.net  Tue Jun 27 10:27:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 10:27:55 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
Message-ID: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>

I would have suggested initiating a quantitation type ontology, not  
one individual per module.

An ontology would capture all your semantic information (min/max or  
range, higher or lower is better, what is a reasonable default [not  
sure there would be one], etc) and you would have a hierarchical  
structure.

You type a score by associating it with an ontology term:

	BLAST_e-value is-a expectation_value
	expectation_value has-min-value 0
	expectation_value has-max-value positive_infinity
	BLAST_p-value is-a probability_value
	probability_value has-min-value 0
	probability_value has-max-value 1
	
etc and then something being an expectation_value for instance would  
imply several attributes laid down in the ontology (probably through  
has-a statements).

It seems to me that essentially what you are trying to do is  
capturing knowledge for particular types of scores, which you would  
then use in more general purpose programs to sort from more to less  
significant, and possibly filter? If so, then hard-coding this into  
objects (all over the place or in a single place) is typically not  
the best practice; rather, the usual best-practice approach is using  
(and if necessary, constructing) an ontology. This is also the most  
re-usable approach.
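
To make the typing idea above concrete, here is a minimal sketch in plain
Perl. It does not use Bio::Ontology. The term names come from the
statements above; the 'better' attribute, the %ontology layout and the
helper names attribute_of and sort_scores are hypothetical, invented only
for illustration.

```perl
use strict;
use warnings;

# Toy term table. Term names follow the statements above; the 'better'
# attribute and this hash layout are assumptions of the sketch.
my %ontology = (
    expectation_value => { min => 0, max => 'positive_infinity', better => 'lower' },
    probability_value => { min => 0, max => 1,                   better => 'lower' },
    'BLAST_e-value'   => { 'is-a' => 'expectation_value' },
    'BLAST_p-value'   => { 'is-a' => 'probability_value' },
);

# Look up an attribute on a term, walking up is-a links until it is found.
sub attribute_of {
    my ($term, $attribute) = @_;
    while (defined $term && exists $ontology{$term}) {
        my $entry = $ontology{$term};
        return $entry->{$attribute} if exists $entry->{$attribute};
        $term = $entry->{'is-a'};
    }
    return;
}

# Sort plain numbers of one score type from most to least significant.
sub sort_scores {
    my ($term, @values) = @_;
    my $better = attribute_of($term, 'better')
        or die "no ordering known for term '$term'";
    my @sorted = sort { $a <=> $b } @values;
    @sorted = reverse @sorted if $better eq 'higher';
    return @sorted;
}

print join(' ', sort_scores('BLAST_e-value', 1e-5, 3, 1e-30)), "\n";
```

The general-purpose sorting code never mentions BLAST; it only asks the
term table which direction is better, which is the re-use argument made
above.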

	-hilmar

On Jun 27, 2006, at 8:52 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> So you basically want to attach semantic information to a number, and
>> type the number thereby?
>
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.
>
>
>> If so, an ontology would be the more natural choice (and in the  
>> end more
>> flexible one) for expressing this kind of information.
>
> I'm not really sure I understand 'and type the number', or what  
> (useful)
> flexibility doing it with an ontology would provide.
>
>
>> Have you looked at the concept of 'quantitation types', e.g. in MAGE
>> (the XML [MAGE-ML] or the object model [MAGE-OM])?
>
> I had a quick look, but not really sure what you intended to  
> suggest here.
>
>
>> There is no quantitation type ontology at a repository I know of.  
>> I have
>> used my own ones in the past and they have been pretty useful.
>
> Can you provide a brief example of what you mean?
>
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make its own quantitation type ontology  
> when it
> wants to deal with numerous scores?
>
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 11:25:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 16:25:06 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
Message-ID: <44A14DD2.7000402@sendu.me.uk>

Hilmar Lapp wrote:
> I would have suggested initiating a quantitation type ontology, not one 
> individual per module.

Where would such a thing 'live'? Would it be some static file somewhere 
that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology 
that can be added to by a module when it needs extra terms to describe its 
particular kind of scores?


> An ontology would capture all your semantic information [snip]

Thanks, I agree that an ontology would be the way to do it...


> It seems to me that essentially what you are trying to do is capturing 
> knowledge for particular types of scores, which you would then use in 
> more general purpose programs to sort from more to less significant, and 
> possibly filter?

Yes.


> If so, then hard-coding this into objects (all over the 
> place or in a single place) is typically not the best practice; rather, 
> the usual best-practice approach is using (and if necessary, 
> constructing) an ontology. This is also the most re-usable approach.

Not having any experience with ontologies, I can't think how this would 
all be done in practice though. Don't we need some central module 
(Bio::Score) to create the ontology (or read it in) and then present 
some suitable interface to it? For example, modules that wanted to store 
some scores might just ask Bio::Score for the ontology and type their 
scores by associating with an available ontology term, creating new 
terms if necessary (or is that something you would never do; the 
ontology needed to have been set up to cover all possible terms?). Then 
when the user has a bunch of these typed scores, surely he doesn't want 
to deal with going through the ontology himself to work out what it all 
means? Well, he could if he needs that level of control, but also he 
just wants to say Bio::Score->sort(x y z) or something.


From bix at sendu.me.uk  Tue Jun 27 12:13:46 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 17:13:46 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
Message-ID: <44A1593A.809@sendu.me.uk>

Cook, Malcolm wrote:
>
> All this semantic cruft is overkill for a moving target and will never
> settle down until your analysis results are no longer relevant.

I'm not sure what you mean by that. What moves? An evalue will always be 
an evalue. Once you know that you are in fact dealing with an evalue, 
and once your sorting algorithm knows that lower evalues are better, 
nothing changes. Likewise for other kinds of scores.

Instead of having to discover that a particular program is giving you an 
evalue, and then writing code to deal with an evalue appropriately, I 
thought it would be nicer to have a single module that knew how to deal 
with it already.


From MEC at stowers-institute.org  Tue Jun 27 12:01:45 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 27 Jun 2006 11:01:45 -0500
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>

For the use case of TFBS analysis demonstrated in the attachment to the
bug, I would expect to find potentially three scores, ala, {evalue,
bitscore, and percentmatch}.  To deal with this in existing framework
(i.e. GFF/bioperl analysis modules/TFBS), I would try to make GFFx eat
scalars as scores and pack the three values into a string and unpack
them as needed for sorting, etc.  Else put the one score I know I'm
going to 'use' in a particular analysis into 'score' and adorn column 9
with the rest.
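
A rough sketch of the pack-into-a-string idea above, in plain Perl. The
comma separator, the field order and the helper names pack_scores and
unpack_scores are assumptions made for this example; nothing here is an
existing GFF or Bioperl API.

```perl
use strict;
use warnings;

# Field order inside the packed string (an assumption of this sketch).
my @FIELDS = qw(evalue bitscore percentmatch);

# Join the three scores into one scalar that fits a single score slot.
sub pack_scores {
    my (%score) = @_;
    return join ',', map { $score{$_} } @FIELDS;
}

# Split the packed scalar back into a hash keyed by field name.
sub unpack_scores {
    my ($packed) = @_;
    my %score;
    @score{@FIELDS} = split /,/, $packed;
    return \%score;
}

my @hits = (
    pack_scores(evalue => 1e-20, bitscore => 95.2, percentmatch => 88),
    pack_scores(evalue => 3e-05, bitscore => 41.0, percentmatch => 97),
);

# Sort by whichever score matters for this analysis, here percentmatch
# with the highest value first.
my @by_match = sort {
    unpack_scores($b)->{percentmatch} <=> unpack_scores($a)->{percentmatch}
} @hits;
print "$_\n" for @by_match;
```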

All this semantic cruft is overkill for a moving target and will never
settle down until your analysis results are no longer relevant.

my $.02

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>Sent: Tuesday, June 27, 2006 5:22 AM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::Score of interest?
>
>Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
>Is the idea of a Bio::Score of interest? See bug, but basically an 
>object that can handle multiple kinds of scores effectively.
>
>I would like to use such a thing in Bioperl, but what standard 
>needs to 
>be met before Bioperl gets a new kind of object?


From bix at sendu.me.uk  Tue Jun 27 14:07:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 19:07:44 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
References: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
Message-ID: <44A173F0.4040302@sendu.me.uk>

Chris Fields wrote:
>> Hilmar Lapp wrote:
>>> So you basically want to attach semantic information to a number, and
>>> type the number thereby?
>> Basically, I want to be able to stick a bunch of (different kinds of)
>> numbers into an object, and later get the 'best' one out (of a
>> particular kind), or sort multiple of those objects.
> 
> The 'best one' might be tricky when dealing with different kinds of scores,
> esp. scores calculated different ways.

I didn't make myself very clear, but you don't compare different kinds 
of scores. When you want to compare two different Score objects, each of 
which may contain multiple different kinds of scores, you pick the kind 
of score you're interested in, and for that kind of score ask which 
object has the 'best' score. I can't readily think of any exceptions to 
the rule that 'best' is either the higher score or the lower score, 
depending on what kind of score you've chosen.

I may not have made myself clear in another way. One of the ideas behind 
a Bio::Score is to have a container object for multiple different kinds 
of scores (and even multiple values per kind) all generated by one 
program in one analysis on one data set.
The container then lets you pick the kind of score you want to work with 
and compare its scores with those in other Bio::Score objects that 
contain the same kind of score (most probably, ones made by the same 
analysis program but on different data sets).

Furthermore, the kind of score you want to work with could have multiple 
values from that single analysis. So the container also lets you 
summarise these values (eg. average them) before trying to compare with 
another Score object. Often, it may be that for a certain kind of score 
it makes sense (it is intended by the score-generating program) to 
always summarise the values in a certain way. So the container needs to 
know about that and 'do the right thing' so the user can just compare 
things without having to trouble himself.

So this is why I feel that to just 'use an ontology' isn't enough. 
Certainly one ought to be used when defining the kinds, but you need 
some single interface with useful methods that lets you deal with the 
actual score values easily.
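
As a rough illustration of the container described above, here is a
sketch in plain Perl. The package name ScoreSet, its methods and the
%KIND table are all hypothetical; this is not the code attached to the
bug. The choice of "minimum" to summarise evalues and "mean" to summarise
bitscores is only an example of a per-kind rule.

```perl
use strict;
use warnings;

{
    package ScoreSet;    # hypothetical stand-in for the proposed Bio::Score
    use List::Util qw(min sum);

    # Per-kind knowledge: which direction is 'best', and how multiple
    # values from one analysis are summarised (both are assumptions).
    my %KIND = (
        evalue   => { best => 'lower',  summarise => sub { min(@_) } },
        bitscore => { best => 'higher', summarise => sub { sum(@_) / scalar(@_) } },
    );

    sub new { my ($class) = @_; return bless { values => {} }, $class }

    # Store one or more values of a known kind; returns $self for chaining.
    sub add {
        my ($self, $kind, @values) = @_;
        die "unknown kind of score '$kind'" unless $KIND{$kind};
        push @{ $self->{values}{$kind} }, @values;
        return $self;
    }

    # Collapse all values of one kind into a single number.
    sub summary {
        my ($self, $kind) = @_;
        my $values = $self->{values}{$kind} or return;
        return $KIND{$kind}{summarise}->(@$values);
    }

    # Sort-style comparison on one kind: negative means $self is better.
    sub compare {
        my ($self, $other, $kind) = @_;
        my ($x, $y) = ($self->summary($kind), $other->summary($kind));
        die "both objects need '$kind' scores" unless defined $x && defined $y;
        return $KIND{$kind}{best} eq 'lower' ? $x <=> $y : $y <=> $x;
    }
}

my $site1 = ScoreSet->new->add(evalue => 1e-10, 1e-4)->add(bitscore => 50, 70);
my $site2 = ScoreSet->new->add(evalue => 1e-6)->add(bitscore => 80);

my ($best) = sort { $a->compare($b, 'evalue') } ($site1, $site2);
printf "best evalue: %g\n", $best->summary('evalue');
```

The user picks a kind and compares; the direction and the summarising
rule stay inside the container. In a real design the %KIND table is where
an ontology lookup would plug in.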


From cjfields at uiuc.edu  Tue Jun 27 14:56:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 13:56:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060627181439.GD51742@iib.unsam.edu.ar>
Message-ID: <000a01c69a1b$6d0338c0$15327e82@pyrimidine>

> | 1)  Commits should be made to stable releases (as well as to the main
> branch
> | in CVS) to fix bugs as long as that release is supported.  I agree with
> | this, but someone has to volunteer, and the length of time a release is
> | supported also needs to be worked out.
> 
> I volunteer to do that (merge approved changes/fixes back to
> a stable branch), though as said by others, 1.4 may not be
> the most appropriate 'stable' branch, as too many changes
> have accumulated, and maybe it's not worth it. But I could
> do that for the next 'stable' release, 1.6 or 2.0 whichever
> comes next.
> 
> As per the length of time, I would say that a stable release
> should be supported at least until another 'stable' release
> is made. Or until it's no longer being used in production
> setups, which is only feasible to know in small
> communities.

I'm posting this to the mail list so that others can respond.

Kevin Brown (in a response to me) made some good points about updating and
maintaining stable releases in that only bug fixes are committed (i.e. no
refactoring, no new modules or features).  I personally wouldn't have a
problem with someone doing this, releasing periodic updates to stable or
developer releases to fix bugs only, but I may be in the minority here.  The
rest of the core guys and others need to also speak their thoughts.  I hate
forwarding this to Jason since he's in the middle of getting ready for a
move but I think this is important enough to do so.

I can say that I am unequivocally against updating 1.4.  Too much has
changed since then and I think it would be a mess trying to figure out what
bug fixes to include, etc.  

I also am very much against placing developer's releases in CPAN; those
releases are not intended to be completely stable as they may be
implementing new features that haven't been tested completely and may
contain various other bugs.  v 1.5.1 is remarkably stable for a developer's
release but several bug fixes have been made since.  If someone wants to try
out the developer's versions or bioperl-live they are most welcome to it;
the web site docs give all the instructions one needs to install from pretty
much any platform.

Beyond that, I'm spent on this thread.

Chris 


From lstein at cshl.edu  Tue Jun 27 18:35:08 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 27 Jun 2006 18:35:08 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <200606271835.09558.lstein@cshl.edu>

Hi All,

This is rather late, but just for future reference on the mailing list,  here 
is how I would do the task using Bio::DB::Fasta.

Script 1: index the file for future use:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	
	my $filename = shift;  # name of file to index on command line
	Bio::DB::Fasta->new($filename,-makeid=>\&make_my_id)
		or die "Indexing failed";
	print "Indexing succeeded!\n";
	exit 0;

	sub make_my_id {
		my $description_line = shift;
		$description_line =~ /(\d+_at)/ or die "malformed description line";
		return $1;
	}

Run this script once to create a reusable index of the file. The index will be 
stored in the same directory as the FASTA file.

Script 2: extract the sequences using the IDs stored in a second file:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	use Bio::SeqIO;
	use IO::File;

	my $indexed_fasta_file = shift;
	my $probe_id_file         = shift;

	# open up the indexed fasta file
	my $db = Bio::DB::Fasta->new($indexed_fasta_file) or die;
	# open up a FASTA writer
	my $out = Bio::SeqIO->new(-format=>'Fasta',-fh=>\*STDOUT) or die;
	# open the probe id file
	my $in   = IO::File->new($probe_id_file) or die;

	# do the work
	while (my $id = <$in>) {
		chomp $id;
		my $seq = $db->get_Seq_by_id($id) or die;
		$out->write_seq($seq);
	}

	exit 0;

Bio::Index::Fasta will work in almost exactly the same way. The only 
difference is that the Bio::DB::Fasta will allow you to retrieve subsequences 
efficiently.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From awitney at sgul.ac.uk  Tue Jun 27 10:08:20 2006
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 27 Jun 2006 15:08:20 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A13BD4.60802@sgul.ac.uk>


> Have you looked at the concept of 'quantitation types', e.g. in MAGE  
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

the MGED Ontology has a concept of quantitation type if that helps

http://mged.sourceforge.net/ontologies/MGEDontology.php#QuantitationType




From william.hsiao at gmail.com  Tue Jun 27 15:52:03 2006
From: william.hsiao at gmail.com (William Hsiao)
Date: Tue, 27 Jun 2006 12:52:03 -0700
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
Message-ID: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>

Hi all,
   I've encountered a strange problem while parsing a gff file from
NCBI using perl.  I'm hoping that someone on the list may have a
solution even though this is not a bioperl issue.  Maybe someone
familiar with gff3 parsing can help :)  Essentially, I'm parsing a gff
file into a nested hash structure using the following functions:

sub parse_gff {
    my $file = shift;
    my %hash_gff;
    open (INFILE, $file) or die "Cannot find file $file\n";
    while(<INFILE>){
	next if (/^\#/);
	chomp;
	my ($seqid, $source, $type, $start, $end, $score, $strand, $phase,
$attributes) = split /\t/;
	my $attri_ref = &process_attributes($attributes);
	my %record = ('seqid'     => $seqid,
		      'source'    => $source,
		      'type'      => $type,
		      'start'     => $start,
		      'end'       => $end,
		      'score'     => $score,
		      'strand'    => $strand,
		      'phase'     => $phase,
		      'attribute' => $attri_ref);
	push @{$hash_gff{$type}}, \%record;
    }
    close INFILE;
    print Dumper %hash_gff;
    return \%hash_gff;
}

sub process_attributes {
    my $attr_string = shift;
    my @attributes = split (/\;/, $attr_string);
    my %attr;
    foreach (@attributes){
	my ($key, $value) = split /=/;
	if ($value=~/\:/){
	    my ($subkey, $subvalue) = split (/:/, $value);
	    $attr{$key}{$subkey}=$subvalue;
	}
	else{
	    $attr{$key}=$value;
	}
    }
    return \%attr;
}

   It works for all the gff files we downloaded from NCBI's microbial
genomes refseq ftp repository.  However, 3 lines from one particular
file NC_005966.gff (of Acinetobacter_sp_ADP1) can not be parsed
properly.  These lines are:

NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	start_codon	636487	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	stop_codon	635833	635835	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

   They generate an error: Can't use string
("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
 The strange part is that all I have to do is replace the word
"function" in front of "=adaptation%20to%20stress;" with another word
or simply change it to functions or functio or Function, etc, then the
line parses properly.  If I retype the word "function", it doesn't
solve the problem.  For some strange reason, when the word "function"
is there, perl tried to use "adaptation%20to%20stress" as the hash key
and failed.  The word "function" is used in other lines as well so I
don't think the problem is caused by the word alone.
    Any suggestion on what might be happening would be greatly
appreciated.  Thank you.

Cheers,

Will

-- 
William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206 Fax: 604-291-5583


From bix at sendu.me.uk  Wed Jun 28 04:25:52 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 09:25:52 +0100
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
In-Reply-To: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
References: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
Message-ID: <44A23D10.1010308@sendu.me.uk>

William Hsiao wrote:
>
> sub process_attributes {
>     my $attr_string = shift;
>     my @attributes = split (/\;/, $attr_string);
>     my %attr;
>     foreach (@attributes){
> 	my ($key, $value) = split /=/;
> 	if ($value=~/\:/){
> 	    my ($subkey, $subvalue) = split (/:/, $value);
             # assign hashref to $key, assign key => value pair to that
> 	    $attr{$key}{$subkey}=$subvalue;
> 	}
> 	else{
             # assign scalar $key
> 	    $attr{$key}=$value;
> 	}
>     }
>     return \%attr;
> }

> NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

>    They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
>  The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.

The problem is that these lines contain function=x twice, and the 
second x contains a colon.
So your code first assigns $attr{function} = $scalar, and then tries to
do $attr{function}{before_colon} = "after_colon".

Normally the latter would autovivify $attr{function} as a hash 
reference: $attr{function} would become HASH(xyz), with before_colon => 
after_colon set as a key/value pair of that hash. But in this case 
$attr{function} already exists and holds the string 
"adaptation%20to%20stress", so Perl tries to use that string as a hash 
reference, which "strict refs" forbids. Renaming the first key (to 
"functions", say) avoids the collision, which is why the line then 
parses.

Basically, your data structure isn't so great, mixing scalars and hash 
references as values of %attr.

The solution may be to parse using Bioperl instead ;).
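
If you would rather keep a hand-rolled parser, here is a minimal sketch 
of one consistent structure: every key maps to an array reference of 
values, so a repeated key can never collide with an earlier scalar. The 
%XX decoding step and the decision to leave "sub:value" strings unsplit 
are my own assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Sketch: every key maps to an array ref of values, so a repeated key
# (e.g. two function= entries) never collides with an earlier scalar.
sub process_attributes {
    my ($attr_string) = @_;
    my %attr;
    foreach my $pair (split /;/, $attr_string) {
        # A limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        # Decode %XX escapes such as %20 (an added assumption).
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

my $attr = process_attributes(
    'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
  . 'function=protection%20%28MultiFun:5.5%29;db_xref=GI:50083879'
);
print "$_\n" for @{ $attr->{function} };
```

Values that carry a "sub:value" pair (db_xref=GI:50083879, for 
instance) stay as plain strings here; they can be split on the first 
colon at the point of use.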


From selvik at ufl.edu  Tue Jun 27 08:54:48 2006
From: selvik at ufl.edu (Kadirvel, Selvi)
Date: Tue, 27 Jun 2006 08:54:48 -0400 (EDT)
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
Message-ID: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>

All,

(I am new to Bioinformatics and Bioperl, so please excuse me if I 
get my terminology wrong)

I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
report consists of three sections, namely:

1. A ranked list of the best scoring HMMs
2. A list of the best scoring domains in order of their occurrence 
in the sequence
3. Alignments for all the best scoring domains.

Section 3 can be truncated to a specific number using the "-A" 
option when building the report.

Though the Bio::SearchIO::hmmer module parses through the entire 
HMMER report (Sections 1, 2 and 3), the set of values made 
available through Bio::Search::Result::ResultI seems to be using 
Section 3 alone. So when we use the -A option to truncate, we lose 
otherwise useful information in Section 1. This information is 
lost (only) for those models that do not have any of their domains 
in the top "A number of" best scoring domains. The fields that are 
not available are:

1.	Description of a model
2.	Score of a model
3.	Evalue of a model

If I use the older Bio::Tools::HMMER::Results module, neither 
Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allows me to 
retrieve the above listed values. Scores and Evalues are available 
for each domain but not for the model it belongs to.

I was wondering if there is any other method to access these 
values or do I have to write my own module to do this?

Any ideas/suggestions would be greatly appreciated.

Thank you!


Selvi Kadirvel

Graduate Research Assistant
High Performance Computing Center
University of Florida


From hlapp at gmx.net  Tue Jun 27 20:18:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 20:18:36 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A14DD2.7000402@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
	<44A14DD2.7000402@sendu.me.uk>
Message-ID: <E4565670-479B-4247-A3CB-3DA998AF8456@gmx.net>


On Jun 27, 2006, at 11:25 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> I would have suggested initiating a quantitation type ontology,  
>> not one
>> individual per module.
>
> Where would such a thing 'live'? Would it be some static file  
> somewhere
> that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology
> that can added to by a module when it needs extra terms to describe  
> its
> particular kind of scores?

For instance, yes. Once you read in an ontology (through  
Bio::OntologyIO indeed) it sits essentially in memory.

> [...]
> Not having any experience with ontolgies, I can't think how this would
> all be done in practice though. Don't we need some central module
> (Bio::Score) to create the ontology (or read it in) and then present
> some suitable interface to it?

Possibly - the problem is how to get the ontology-typed term given an  
analysis program and attribute name (e.g. 'score' of a feature  
object). There is no method for doing this on a feature object and  
bolting one on would be a bad idea I think.

So, the Bio::Score would be a little hybrid between an objectified  
score value that now doesn't just have a numeric value but also a  
type term, and a factory for creating the ontology (e.g., by reading  
it in from a specified or default location). I.e., you'd have

	my $value = $score->value();
	my $type = $score->type();
	# $type is-a Bio::Ontology::TermI
	my $quant_ont = $type->ontology();
	
	# see what type of score we have
	my @ancestors = $quant_ont->get_ancestor_terms($type);
	if (grep {$_->name eq 'expectation_value'} @ancestors) {
		# it's an e-value
	} elsif ( ...test for some other type...) {
		# etc
	}


> For example, modules that wanted to store
> some scores might just ask Bio::Score for the ontology and type their
> scores by associating with an available ontology term, creating new
> terms if necessary (or is that something you would never do; the
> ontology needed to have been set up to cover all possible terms?).

Yes. You'd extend it as you encounter types that aren't in the  
ontology yet, until the ontology fully captures the knowledge domain.

> Then
> when the user has a bunch of these typed scores, surely he doesn't  
> want
> to deal with going through the ontology himself to work out what it  
> all
> means? Well, he could if he needs that level of control, but also he
> just wants to say Bio::Score->sort(x y z) or something.

See above for a quick example of the logic. I'd separate that into  
its own module, like Bio::Score::Utils.
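
As a rough sketch of what such a utility could look like (everything 
here is hypothetical: Bio::Score does not exist yet, and the helper 
names score_is_a and sort_evalues are made up for illustration). It 
only assumes score objects with value() and type(), a type term with 
name() and ontology(), and an ontology with get_ancestor_terms(), as in 
the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helpers in the spirit of the proposed Bio::Score::Utils.
# Any objects offering the methods used below will work.

# True if the score's type term has an ancestor with the given name.
sub score_is_a {
    my ($score, $ancestor_name) = @_;
    my $type      = $score->type;
    my @ancestors = $type->ontology->get_ancestor_terms($type);
    my $matches   = grep { $_->name eq $ancestor_name } @ancestors;
    return $matches;
}

# Keep only e-value-like scores and sort them ascending
# (a smaller e-value is the better one).
sub sort_evalues {
    my @scores = @_;
    return sort { $a->value <=> $b->value }
           grep { score_is_a($_, 'expectation_value') } @scores;
}
```

The user then calls sort_evalues(@scores) and never walks the ontology 
himself; other score families would get their own comparison rule.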

	-hilmar

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 28 10:29:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:29:17 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>

Selvi, 

Can you send me the report you are trying to parse as an attachment?  I'll
give it a look.

Judging by the pdoc this is mapped for the event handler so it should be
there.  From the %MAPPING hash:

                 'HMMER_program'   => 'RESULT-algorithm_name',
                 'HMMER_version'   => 'RESULT-algorithm_version',
                 'HMMER_query-def' => 'RESULT-query_name',
                 'HMMER_query-len' => 'RESULT-query_length',
                 'HMMER_query-acc' => 'RESULT-query_accession',
                 'HMMER_querydesc' => 'RESULT-query_description',
                 'HMMER_hmm'       => 'RESULT-hmm_name',                 
                 'HMMER_seqfile'   => 'RESULT-sequence_file',
	           'HMMER_db'        => 'RESULT-database_name',

Chris



From cjfields at uiuc.edu  Wed Jun 28 10:55:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:55:31 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>
Message-ID: <003501c69ac2$e70623b0$15327e82@pyrimidine>

I hate responding to myself!!  Forgot to add that there is also
Bio::Tools::Hmmpfam :

http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam

I'll check if Bio::SearchIO catches this data and let you know what I find
out.  It should catch at least some of it, according to the mapping.

Chris



From bix at sendu.me.uk  Wed Jun 28 11:04:29 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 16:04:29 +0100
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <44A29A7D.7020602@sendu.me.uk>

Kadirvel, Selvi wrote:
> All,
> 
> (I am new to Bioinformatics and Bioperl, so please apologize if I 
> get my terminology wrong)
> 
> I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
> report consists of three sections namely;
> 
> 1. A ranked list of the best scoring HMMs
> 2. A list of the best scoring domains in order of their occurrence 
> in the sequence
> 3. Alignments for all the best scoring domains.
> 
> Section 3 can be truncated to a specific number using the ??A? 
> option when building the report.

What do you mean by this? What is ??A? ?
Is this an option you're supplying to hmmpfam or a bioperl module?


> Though the Bio::SearchIO::hmmer module parses through the entire 
> HMMER report (Section 1, 2 and 3), the set of values made 
> available through Bio::Search::Result::ResultI seem to be using 
> Section 3 alone. So when we use the ?A option to truncate, we lose 
> otherwise useful information in Section 1. This information is 
> lost (only) for those models that do not have any of their domains 
> in the top ?A number of? best scoring domains. The fields that are 
> not available are:
> 
> 1.	Description of a model
> 2.	Score of a model
> 3.	Evalue of a model

Each hit you get back from each result of the SearchIO is a 
Bio::Search::Hit::HMMERHit and represents the results of a particular 
model (you can also say $result->next_model).

So you can say:
print $hit->name, " ", $hit->description, " ", $hit->significance, " ", 
$hit->score, "\n";

to get the information you want.
General information about the result can be had like so:
print $result->query_name, " ", $result->algorithm, " ", 
$result->hmm_name, "\n";

I have another problem (or the same one as you? I can't tell...) in 
that I can only get a single result, hit and hsp from my hmmpfam file!
It is doing my head in, but I might be doing something wrong so will 
look into it further before posting a bug report.


From bix at sendu.me.uk  Wed Jun 28 12:46:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 17:46:57 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A29A7D.7020602@sendu.me.uk>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
	<44A29A7D.7020602@sendu.me.uk>
Message-ID: <44A2B281.7030806@sendu.me.uk>

Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue, 
description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I'm can't tell...) in 
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will 
> look into it further before posting a bug report.

I was just doing something wrong, but...

Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report 
a single HSP per Hit so domains with multiple alignments get separate 
Hits (more FASTA like) since they aren't really HSPs'

Strangely 1.25 (Bioperl 1.4) seems to behave like that already.

In any case, this is extremely counter-intuitive, especially given that 
next_domain is a synonym of next_hsp. I think either the synonym 
relationship remains and hits have multiple hsps (and there is only one 
hit per model), or next_domain goes off and finds the hsp that is the 
next domain of the current model. But that would be incredibly broken in 
the current model since it would be found in a different hit object...

What hmmpfam does is take a database of models which can be thought of 
as database sequences. Then it aligns each one against your query 
sequences. A model could align in multiple locations along a query 
sequence. Each one of these locations is called a domain of the model. A 
user of hmmpfam is model-centric (wants to know which models are on his 
query), and so you want to know all about how well the model did in one 
go. So you should be able to get the results for a model ($hit = 
$result->next_model), get overall info about it ($hit->score etc.), then 
get more detailed information about each domain of it (while ($hsp = 
$hit->next_domain) {...}). But right now you only get one domain and you 
have to go searching through all your other hits to find a hit with the 
same ->name() as your model of interest to get the next domain of your 
model.

In my view this is less than ideal. What do people think? Should it be 
changed?
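
In the meantime the regrouping can be done by hand. A minimal sketch, 
assuming only that hits answer name() and next_hsp() and that the 
result answers next_hit(), as the SearchIO objects do; the helper name 
domains_by_model and the file name in the usage comment are made up.

```perl
use strict;
use warnings;

# Regroup hits that share a model name, so that all domains of one
# model end up in a single list.
sub domains_by_model {
    my ($result) = @_;
    my (%domains, @order);
    while (my $hit = $result->next_hit) {
        my $name = $hit->name;
        push @order, $name unless exists $domains{$name};
        $domains{$name} ||= [];
        while (my $hsp = $hit->next_hsp) {
            push @{ $domains{$name} }, $hsp;
        }
    }
    # Model names in first-seen order, plus the name => [hsps] map.
    return (\@order, \%domains);
}

# Usage with BioPerl (report.hmmpfam is a placeholder file name):
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new(-format => 'hmmer',
#                               -file   => 'report.hmmpfam');
#   while (my $result = $in->next_result) {
#       my ($order, $domains) = domains_by_model($result);
#       printf "%s: %d domain(s)\n", $_, scalar @{ $domains->{$_} }
#           for @$order;
#   }
```

This only papers over the problem: the per-model score still has to be 
taken from one of the hits that share the name.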


From selvik at ufl.edu  Wed Jun 28 11:21:37 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 11:21:37 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <003501c69ac2$e70623b0$15327e82@pyrimidine>
References: <003501c69ac2$e70623b0$15327e82@pyrimidine>
Message-ID: <2679E8D1-E225-4414-8925-1EB73B83523B@ufl.edu>

Thanks for your reply Chris.

I am attaching a part of the report I am trying to parse.

Also I see that, Bio::SearchIO::hmmer.pm is parsing all three  
sections. I am not sure how (or whether) fields from Section 1 are  
actually being made available through Bio::SearchIO or Bio::Search:: 
[Hit | Hsp | Result].

I'll look into Bio::Tools::Hmmpfam and let you know if that works for  
me.

-Selvi


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ManyQueries.hmmer
Type: application/octet-stream
Size: 3684451 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060628/53dcc875/attachment-0003.obj>
-------------- next part --------------




From akarger at CGR.Harvard.edu  Wed Jun 28 15:49:54 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 28 Jun 2006 15:49:54 -0400
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
Message-ID: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>

>perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";'
-e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: acc (P09651) does not exist
---------------------------------------------------
>perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN",
$format="fasta";' -e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: id (ROA1_HUMAN) does not exist
---------------------------------------------------

But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN
Same error for a couple other proteins.
Works for a GenBank protein.

perl 5.8.6
Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp

This worked a few months ago.
What's going on?

-Amir Karger


From cjfields at uiuc.edu  Wed Jun 28 16:27:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 15:27:15 -0500
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>
Message-ID: <006901c69af1$412c3590$15327e82@pyrimidine>

This is a recent bug caused by changes to EBI's remote database: they
changed the name of the database from 'swall' to 'uniprot'.  Updating to
bioperl-live from CVS (or just updating Bio::DB::SwissProt) should fix it.

Chris



From Kevin.M.Brown at asu.edu  Wed Jun 28 16:39:43 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 28 Jun 2006 13:39:43 -0700
Subject: [Bioperl-l] FW:  How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <1A4207F8295607498283FE9E93B775B4019719A4@EX02.asurite.ad.asu.edu>

This was supposed to go to the list...  Still not used to Outlook...

> The points made here, as I see them:
> 
> 1)  Commits should be made to stable releases (as well as to the main
> branch in CVS) to fix bugs as long as that release is supported.  I
> agree with this, but someone has to volunteer, and the length of time a
> release is supported also has to be worked out.  It would almost be
> better to go to a regular release schedule (once every 3-6 months or so)
> where the code is given as is to CPAN, whether it passes tests or not.

What I've seen in other projects is that stable is supported and bug
patched up till the next stable release.  After that support is dropped.
Once a branch was tagged stable the ONLY thing that went into it was
fixes for bugs based on the code already present.  No new features, no
refactoring of any code or modules.  I'm not certain how often things
like a stable patch release happened since most of the bugs were worked
on long before while it was still tagged as dev.  I could see, worst
case a .x release to stable every 6 months to a year until the next
stable came out if there were patches to it.  It looks like the wiki has
most of this kind of stuff documented in the previously posted link:
http://www.bioperl.org/wiki/Making_a_BioPerl_release.  I guess it would
just need a pumpkin/monkey/whatever to step up to keep things rolling...

> 2)  More communication about the direction Bioperl is heading;
> personally I haven't seen a problem with this as much as there is no
> information about a roadmap.  That is being alleviated soon I believe,
> though people out there need to be patient.
> 
> 3)  Volunteer.  If you have something you believe needs to be done and
> you believe so fervently, then put up or shut up.  Make (nice polite)
> suggestions otherwise.  Don't judge code or "the way things are done"
> and don't presume what kind of experience people have that you don't
> know and haven't met.  End of story.
> 
> Chris
> 
> 


From cjfields at uiuc.edu  Wed Jun 28 18:14:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:14:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A2B281.7030806@sendu.me.uk>
Message-ID: <007e01c69b00$2e091410$15327e82@pyrimidine>

> Sendu Bala wrote:
> [ from thread Bio::SearchIO - Accessing Model parameters (score, evalue,
> description) ]
> [ concerning hmmpfam output ]
> > I have another problem (or the same one as you? I'm can't tell...) in
> > that I can only get a single result, hit and hsp from my hmmpfam file!
> > It is doing my head in, but I might be doing something wrong so will
> > look into it further before posting a bug report.
> 
> I was just doing something wrong, but...
> 
> Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report
> a single HSP per Hit so domains with multiple alignments get separate
> Hits (more FASTA like) since they aren't really HSPs'
> 
> Strangely 1.25 (Bioperl 1.4) seems to behave like that already.
> 
> In any case, this is extremely counter-intuitive, especially given that
> next_domain is a synonym of next_hsp. I think either the synonym
> relationship remains and hits have multiple hsps (and there is only one
> hit per model), or next_domain goes off and finds the hsp that is the
> next domain of the current model. But that would be incredibly broken in
> the current model since it would be found in a different hit object...
>
> What hmmpfam does is take a database of models which can be thought of
> as database sequences. Then it aligns each one against your query
> sequences. A model could align in multiple locations along a query
> sequence. Each one of these locations is called a domain of the model. A
> user of hmmpfam is model-centric (wants to know which models are on his
> query), and so you want to know all about how well the model did in one
> go. So you should be able to get the results for a model ($hit =
> $result->next_model), get overall info about it ($hit->score etc.), then
> get more detailed information about each domain of it (while ($hsp =
> $hit->next_domain) {...}). But right now you only get one domain and you
> have to go searching through all your other hits to find a hit with the
> same ->name() as your model of interest to get the next domain of your
> model.
> 
> In my view this is less than ideal. What do people think? Should it be
> changed?

The model (hit-like) table scores are retained and can be retrieved via
$model->significance, and the individual domain (hsp-like) evalues via
$domain->evalue.  The reason you don't get all the individual domain evalues
is that only five alignments are returned by default.  You might try
changing the 'A' parameter to see if you can get more alignments; that may
work around the problem of missing domains for now.  You'll note that the
Model/Domain results returned are not based on top score but what looks like
the position of the domain in the sequence (seq-t in the last table); that's
what is stated in the hmmpfam docs.  Anyway, I tried this loop with the
reports Selvi sent and it works, but only for the ones that return
alignments:

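use strict;
use warnings;
use Bio::SearchIO;

# ManyQueries.hmmer is the report Selvi attached; adjust the path as needed.
my $searchio = Bio::SearchIO->new(-format => 'hmmer',
                                  -file   => 'ManyQueries.hmmer');

my $result_count = 1;
while ( my $result = $searchio->next_result() ) {
    print "Result $result_count : ", $result->query_name, "\n";
    print "Result models: ", $result->num_hits, "\n";
    # each hit is one model; each hsp is one domain of that model
    while ( my $model = $result->next_hit ) {
        print "\tModel : ", $model->name, "\n";
        print "\tSignif: ", $model->significance, "\n";
        while ( my $domain = $model->next_hsp ) {
            print "\t\tDomain : ", $domain->name, "\n";
            print "\t\tEvalue : ", $domain->evalue, "\n";
        }
    }
    $result_count++;
}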

To quote the HMMER docs: "Say you have a new sequence that, according to a BLAST
analysis, shows a slew of hits to receptor tyrosine kinases. Before you
decide to call your sequence an RTK homologue, you suspiciously recall that
RTK's are, like many proteins, composed of multiple functional domains, and
these domains are often found promiscuously in proteins with a wide variety
of functions. Is your sequence really an RTK? Or is it a novel sequence that
just happens to have a protein kinase catalytic domain or fibronectin type
III domain?"

Model/domain pairs really aren't Hits/HSPs by definition, like the CVS
commit from Jason states.  The way Pfam is set up, you actually have your
query(ies) scanned using a database of Pfam domains (HMM's, built from
protein alignments for various protein families), hence the alignment in the
report is not an HSP since HSPs come from pairwise sequence alignments.  An
HSP (high-scoring segment pair) is a pair of sequence segments whose alignment
score meets or exceeds a cutoff.  The hmmpfam report has alignments of the
sequence and the consensus
for the alignment the HMM is based on (not another sequence, so not an HSP).
This is also the same reason you can't get alignments from
Bio::Search::HSP::HMMERHSP objects since the model 'sequence' isn't a true
sequence but a consensus of sequences, so it's 'inappropriate' to use that
as an actual alignment.  Bad Bioperl user!  Bad!

I think the reasoning for keeping single model-domain pairs is that you
should consider each domain's location in the sequence as well as the number
of times they appear, regardless of whether they belong to the same model or
not.  One protein could have three ATP-binding domains and another two, and
they could be located in different positions on the sequence.  But where
they are on the sequence in relation to other domains and to each other
(i.e. positional information) is just as important, maybe more so, than how
many times that domain appears.  

Well, that and SearchIO is set up as a SAX-like parser, so I believe it
processes the model-domain alignments as the file is parsed.

My 2c: there should be a way to get all model-domain pairs in the "parsed
for domains" table (which is like a list of HSPs).  Seems the last few w/o
alignments are not retained; this may be the way the parser is set up.  I
would try getting the handler to return just evalues and similar stuff for
those and leave out sequence/alignment info, if that's possible.  Not sure
how this is handled with BLAST reports where there are more hits reported
than alignments...

Chris
_____________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed Jun 28 18:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:16:38 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
Message-ID: <000001c69b00$86adcc00$15327e82@pyrimidine>

Arghhhh!  Made a mistake:

> my $result_count = 1;
> while ( my $result = $searchio->next_result() ) {
>   print "Result $result_count : ",$result->query_name,"\n";
>   print "Result models: ",$result->num_hits,"\n";
>   while (my $model = $result->next_hit) {
> 	print "\tModel : ",$model->name,"\n";
> 	print "\tSignif: ",$model->significance,"\n";
> 	while (my $domain = $model->next_hsp) {
> 	  print "\t\tDomain : ",$domain->name,"\n";
                              ^^^^^^^
Should be:                    $model

> 	  print "\t\tEvalue : ",$domain->evalue,"\n";
> 	}
>   }
>   $result_count++;
> }

My bad!

Chris


From bix at sendu.me.uk  Wed Jun 28 19:00:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 00:00:11 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <007e01c69b00$2e091410$15327e82@pyrimidine>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
Message-ID: <44A309FB.2050009@sendu.me.uk>

Chris Fields wrote:
>> Sendu Bala wrote:
[snip]
>> In any case, this is extremely counter-intuitive, especially given
>> that next_domain is a synonym of next_hsp. I think either the
>> synonym relationship remains and hits have multiple hsps (and there
>> is only one hit per model)
[snip]

> The model (hit-like) table scores are retained and can be retrieved
> via $model->significance and the individual domain (hsp-like) evalues
> via $model->evalue.

I know, see my earlier post.

> The reason you don't get all the individual domain evalues is that
> only five alignments are returned by default.  You might try changing
> the 'A' parameter to see if you can get more alignments; that may 
> work around the problem of missing domains for now.

[I'm using my own data, not the OP's]
No, I have all the alignments: 'A' isn't a problem. And I can get all
the domains. The problem is I have to check multiple different hits to
find them all.


> You'll note that the Model/Domain results returned are not based on 
> top score but what looks like the position of the domain in the
> sequence (seq-t in the last table); that's what is stated in the
> hmmpfam docs.
[...]
> Well, that and SearchIO is set up as a SAX-like parser, so I believe 
> it processes the model-domain alignments as the file is parsed.

Yes, this is the problem. The parser does the obvious thing, but in my 
view it does not do the correct thing.


> Model/domain pairs really aren't Hits/HSPs by definition, like the
> CVS commit from Jason states.  The way Pfam is set up, you actually
> have your query(ies) scanned using a database of Pfam domains (HMM's,
> built from protein alignments for various protein families), hence
> the alignment in the report is not a HSP since HSPs come from
> pairwise sequence alignments.  An HSP is a pair of sequences which,
> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> has alignments of the sequence and the consensus for the alignment
> the HMM is based on (not another sequence, so not an HSP).

But this is just semantics. It doesn't /matter/ that it's not really 
truly a sequence that's being aligned. The parser needs to present to 
the user the information in the file. As we see in the OP's example, it 
simply fails to do this because the parser isn't model-centric while the 
file it is parsing /is/.

And in any case, your argument doesn't hold because even the current 
parser /does/ store domains in hsp objects! It just only stores one hsp 
per hit, repeatedly, which is nonsensical.

[to avoid confusion, in the following the use of 'model' is in the 
programming sense, whilst 'Model' refers to the things generated by hmmer]

The correct model to describe the file being parsed is one that is able 
to provide to the user all the available results for all Models that hit a 
query sequence, even when there are no alignments in the file. To make 
this fit the SearchIO scheme, we must have one hit per Model. The hit 
has hsps which are the domains. This perfectly matches the information 
in the file. It matches something like a Blast, where you have one hit 
per database sequence/query sequence combo.

A hit could end up with no hsps (no domains), but we may not even care. 
Sometimes you really do just want to know if a particular model hit at 
all, and with what evalue/score. The current parsing model isn't 
guaranteed to tell you this even when you can read it yourself in the 
file being parsed.
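
To make the one-hit-per-Model layout concrete, here is a minimal sketch in
plain Perl with no BioPerl dependency. The hash layout and the helper name
build_hits are hypothetical and not part of SearchIO. The numbers are the
ones reported elsewhere in this thread for nbd27e02.y1 (assuming the two
"Signif" columns there are the Model E-value and score), and the NoDom row
is made up purely to show a hit that keeps zero domains.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SearchIO: build one hit per Model from
# the per-Model table and attach every domain row that names that Model.
sub build_hits {
    my ($models, $domains) = @_;
    my (%hit_for, @hits);
    for my $m (@$models) {
        my $hit = { %$m, domains => [] };   # a hit may keep zero domains
        $hit_for{ $m->{name} } = $hit;
        push @hits, $hit;                   # preserve model-table order
    }
    for my $d (@$domains) {
        my $hit = $hit_for{ $d->{model} } or next;   # skip orphan rows
        push @{ $hit->{domains} }, $d;
    }
    return \@hits;
}

my @models = (
    { name => 'IBB',   score => 157.3, evalue => 2.6e-43 },
    { name => 'Arm',   score => 139.5, evalue => 6e-38   },
    { name => 'HEAT',  score => 52.1,  evalue => 1.2e-11 },
    { name => 'IBN_N', score => -3.6,  evalue => 2.1     },
    { name => 'NoDom', score => -9.9,  evalue => 10      },  # made-up row
);
my @domains = (   # sequence order, as in the report
    { model => 'IBB',   evalue => 2.6e-43 },
    { model => 'HEAT',  evalue => 40      },
    { model => 'IBN_N', evalue => 2.1     },
    { model => 'Arm',   evalue => 3.5e-12 },
    { model => 'HEAT',  evalue => 0.0096  },
    { model => 'Arm',   evalue => 2.2e-13 },
    { model => 'HEAT',  evalue => 0.0032  },
    { model => 'Arm',   evalue => 0.00019 },
);

my $hits = build_hits(\@models, \@domains);
for my $hit (@$hits) {
    printf "%-6s score=%s evalue=%s domains=%d\n",
        $hit->{name}, $hit->{score}, $hit->{evalue},
        scalar @{ $hit->{domains} };
}
```

Each Model appears exactly once, its domains stay in sequence order, and a
Model with no domain rows still shows up with its score and evalue.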

You can guess at the intent of the original authors, I think, just by 
looking at those method synonyms. next_hit == next_model. next_hsp == 
next_domain. This makes perfect sense. This is the way to correctly 
model the information in the file. The problem is that next_model 
doesn't give you the next Model (because each Model has multiple hits), 
and next_domain doesn't give you the next domain (because each hit only 
has one domain).


> I think the reasoning for keeping single model-domain pairs is that
> you should consider each domain's location in the sequence as well as
> the number of times they appear, regardless of whether they belong to
> the same model or not.  One protein could have three ATP-binding
> domains and another two, and they could be located in different
> positions on the sequence.  But where they are on the sequence in
> relation to other domains and to each other (i.e. positional
> information) is just as important, maybe more so, than how many times
> that domain appears.

Well, that's for the user to decide. But the way the results are 
presented needs to make sense. If blast results came back with all hsps 
listed out in sequence position order, would you have multiple hits per 
database sequence each with one hsp? No, because the meaning is 
completely wrong. The 'hit' is the collection of alignments of a 
particular database sequence hitting a query sequence. The alignments 
are stored in a bunch of hsps. It is absurd to have more than one hit 
object for a database+query sequence combo, because then we have 
multiple hit objects duplicating the exact same information, and 'hit' 
no longer has any meaning - it is a collection of /some/ of the 
alignments? Yet this is exactly what we have with hmmpfam result parsing.


From selvik at ufl.edu  Wed Jun 28 16:11:56 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 16:11:56 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
References: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
Message-ID: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>

Sendu,

>
> What do you mean by this? What is 'A'?
> Is this an option you're supplying to hmmpfam or a bioperl module?

I was referring to the '-A' option when running hmmpfam. So if I were  
to use  '-A 5', Section 3 will have only the top scoring (first) five  
HSPs.

>
> So you can say:
> $hit->name, " ", $hit->description, " ", $hit->significance, " ",
> $hit->score;
>
> To get the information you want.
> General information about the result can be had like so:
> print $result->query_name, " ", $result->algorithm, " ",
> $result->hmm_name, "\n";

I do use the same methods that you have suggested. Let me try to  
explain my problem in detail. Let's say I have a report that was  
generated using this "-A 5" option. I want to get the description,  
score, evalue of a model that *does not* have a domain in the top 5  
high scoring HSPs. This information *exists* in the report in Section  
1 but neither $result->next_hit nor $hit->next_hsp can see it.

Details of ALL domains  are available through:

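     foreach my $domain ($result->each_Domain)
     {
            # accessors available on each domain object
            print join("\t", $domain->hmmname, $domain->hmmacc,
                       $domain->start, $domain->end,
                       $domain->hstart, $domain->hend,
                       $domain->evalue), "\n";
     }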

where $result is a Bio::Tools::HMMER::Results object. But this again  
represents information in Section 2. It gives us domain scores and  
evalues (and not model scores and evalues.)

I am working around this by finding the sum of scores (evalues) of  
all domains in a model. But there seems to be no work-around to  
retrieve the description. $domain->hmmacc contains only the first  
string of the description.

-Selvi


From jason at bioperl.org  Wed Jun 28 22:53:25 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 28 Jun 2006 22:53:25 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
	<44A309FB.2050009@sendu.me.uk>
Message-ID: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>

I don't have any time to really debate this sadly - I definitely went  
back and forth on how to solve this and not many people ever spoke up  
about what they WANTED.  So glad to hear there are opinions out there  
now.

I think the bug fix you refer to had to do with not returning things  
ordered by E-value -- the creation machinery only builds Hit  
objects when there are HSP objects being built.  Basically the  
parsing is linear in terms of the file: we read "Model" (Hit) data  
first and store them in a hash keyed by the name of the domain, but  
we only >>build<< the "Hits" when HSPs are seen, hence the problem when  
the -A option limits alignments but reports Hits that don't have  
individual alignments.  This has to do with the order of things not  
syncing up and/or dealing with the -A option when there is leftover  
Hit data but no HSPs to populate them.  We also had this problem in  
BLAST reports and had to work around that, but I never bothered  
solving it in HMMER I guess.  Glad there are other people who are  
going to fix the problems!

The one "alignment" (HSP) per hit was a workaround to the problem  
that Hits were being returned in the order the HSPs came in (Sequence  
order) -- because that is the order they were being built in -- not  
in the sorted order of the Hits as seen in the report.

Feel free to propose an alternative implementation for the parser as you  
see fit as long as the API is preserved.  You can contribute a new  
SearchIO plugin and HMMERSearchResultListener to deal with it - or I  
guess do what I also do and just run hmmer2table and deal with things  
in a tab-delimited format.

Personally my interests lie in the actual domains so the Hit objects  
are superfluous in my own work so it never bothered me to have one  
per Hit and it flows more naturally to things like GFF, etc.  You can  
aggregate them however you like after the fact pretty simply so I  
don't find this too hard to deal with, but if this is a major deterrent  
for people I guess have at it (I think the speed of object creation  
is a larger problem that I hope that someone will work on soon).
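
A rough sketch of that after-the-fact aggregation, in plain Perl so it runs
without BioPerl. The array of triples is a stand-in for the one-domain-per-hit
objects (with real SearchIO objects the values would presumably come from
$hit->name, $hit->significance and the evalue of the hit's single HSP); the
variable names are illustrative only, and the five pairs are the ones reported
elsewhere in this thread for nbd27e02.y1.

```perl
use strict;
use warnings;

# Stand-ins for the one-domain-per-hit objects the parser returns today;
# each entry is [ model name, model significance, domain evalue ].
my @pairs = (
    [ 'IBB',   2.6e-43, 2.6e-43 ],
    [ 'HEAT',  1.2e-11, 40      ],
    [ 'IBN_N', 2.1,     2.1     ],
    [ 'Arm',   6e-38,   3.5e-12 ],
    [ 'HEAT',  1.2e-11, 0.0096  ],
);

# Collapse repeated model names into one record holding all its domains.
my %by_model;
for my $pair (@pairs) {
    my ($name, $signif, $evalue) = @$pair;
    $by_model{$name}{signif} = $signif;
    push @{ $by_model{$name}{evalues} }, $evalue;
}

# Report models best-first (smallest significance), not in sequence order.
my @ordered = sort { $by_model{$a}{signif} <=> $by_model{$b}{signif} }
              keys %by_model;
for my $name (@ordered) {
    print "$name\t$by_model{$name}{signif}\t",
          join(',', @{ $by_model{$name}{evalues} }), "\n";
}
```

The grouping step is independent of the sort, so the same hash can be walked
in sequence order instead if position matters more than score.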

I'd appreciate you including the salient points of how the report is  
interpreted on the wiki at some point (with 8X10 glossy pictures and  
circles and arrows on the back...
http://en.wikipedia.org/wiki/Alice%27s_Restaurant) so the debate can be
archived too.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Wed Jun 28 23:40:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 22:40:28 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>
Message-ID: <000301c69b2d$c3fdc6a0$15327e82@pyrimidine>

According to CVS, using -A0 (no alignments) is supposed to work since v.
1.5.1 and (I'm guessing here) should return HMMERHit/HMMERHSP objects with
no sequences, just the values from the table.  By this reasoning using -A5
should work but the first five Hit/HSP pairs will give you sequences and any
remaining should give nothing, just the Sequence Model combined evalue
(which you can get by $model->significance) and individual Domain (HSP-like)
evalues ($domain->evalue).  I don't get these either (I only get a max of 5
model/domain pairs). 

So, I tried a little experiment using the first single result output for
this query from your combined file (nbd27e02.y1  716 69 831 ; translated),
which was the first one I came across with more than five model/domain
pairs, and this scripted loop:

while ( my $result = $searchio->next_result() ) {
  print "Query: ",$result->query_name,"\n";
  while (my $model = $result->next_model) {
	print "\tModel : ",$model->name,"\n";
	print "\tSignif: ",$model->significance,"\n";
	while (my $domain = $model->next_domain) {
	  print "\t\tEvalue : ",$domain->evalue,"\n";
	}
  }
}

I get this with the file containing the alignments.  For anyone following,
I'm using bioperl-live, perl 5.8, WinXP:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 2.6e-43
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 40
        Model : IBN_N
        Signif: 2.1
                Evalue : 2.1
        Model : Arm
        Signif: 6e-38
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 0.0096

If I manually delete the alignments (make it like -A0 output) I get this:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 157.3
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 52.1
                Evalue : 40
        Model : IBN_N
        Signif: -3.6
                Evalue : 2.1
        Model : Arm
        Signif: 139.5
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0096
        Model : Arm
        Signif: 139.5
                Evalue : 2.2e-13
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0032
        Model : Arm
        Signif: 139.5
                Evalue : 0.00019

i.e. all the model/domain pairs!  So I think it's safe to say that this is a
bug; the last few don't get processed but should.  I'll drop a bug report
into Bugzilla along with the test files and script so it can be confirmed.
This shouldn't be too hard to fix but it may take a few days; I'm pretty
busy here until Saturday.
 
Chris


From cjfields at uiuc.edu  Thu Jun 29 01:20:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:20:10 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
Message-ID: <000d01c69b3b$b17776d0$15327e82@pyrimidine>

> I know, see my earlier post.
...
> [I'm using my own data, not the OP's]
...

Sorry, I was typing that one up over a three-hour period in between
experiments, so I didn't go back and check everything before I sent it.
Pretty much the entire file Selvi sent me (and the entire group, grrr) shows
that the domains in the domain table are not completely parsed, and the
number of reported hits correlates with the number of alignments present.
In other words, only five or fewer hits are reported based on the alignments
and the default max alignments reported per result is five.  I figured out
that it is a bug and plan on submitting it to Bugzilla.

What you are talking about and what Selvi describes are two separate issues.
I dealt with Selvi's for the moment; let's deal with yours.

> > Well, that and SearchIO is set up as a SAX-like parser, so I believe
...
> Yes, this is the problem. The parser does the obvious thing, but in my
> view it does not do the correct thing.

Yes, and that's your opinion.  To tell the truth I'm quite neutral on this;
I'm trying to reason along the lines the contributors for the module
intended.  The fact of the matter is the parser is set up to do it this way,
and it was set up this way by others (not you or I); modifying it to suit
one's personal wants and needs is not our job here.  I don't have issues
while I'm running it so I really don't see what the problem is, well,
besides the reported bug I found along with Selvi's help.

My view on all this before I quit for the night:

I really don't want to get into what I consider nit-picky issues (the
'semantics' you mention; it's a simple difference in opinion and a small one
at that).  We can agree to disagree, whatever.  The issue immediately at
hand, what I consider the most important, is that Selvi has uncovered a bug
with the code, as is.  But I'm going to vent here a bit.  It's late, I'm
tired, and this whole thing irks me.  It irks me a great deal. 

Personally, I don't think right now is the time to think about refactoring
this particular module, esp. since I find it essentially works.  I believe
that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
instance, or refactoring SearchIO::blast etc to use hashes instead of
objects to speed things up.  Or creating something yourself.  Or doing what
you currently are doing (Bio::Map).  In other words, areas where use is
high, code is aging, and refactoring is more productive.

I'll add that I'm not trying to dissuade you from trying to build your own
variation of a SearchIO HMMER parser; by all means go ahead.  The above is
how I feel.  You can build your own parser to do what you want; you can even
base it off the current SearchIO HMMER parser and see if you can set it up
to give you the results you want, using a different handler and so on.  Just
don't break the API or modify the current code based strictly on what your
opinion of how it should work is.  It was probably set up this way for a
particular reason.

According to the SearchIO HOWTO the intent for SearchIO was to 'genericize'
parsing reports with 'similar' styles, like BLAST, FASTA, HMMER, and so on.
The most prevalently parsed reports, by a long stretch, are BLAST reports,
which is what the system is based on: 

http://www.bioperl.org/wiki/HOWTO:SearchIO#Design

So the SearchIO system is based on the >assumption< that these reports can
be divi'd up with the data mapped into categories (Results, Hits, HSPs), so
similar objects should be able to handle them.  Domain data are currently
stored in HSP objects (HMMERHSP), but that's nothing more than a convenient
way to store HMMER report data in my opinion; the alignment matches,
strictly speaking, are not HSPs.  You could rename HMMERHit to HMMERModel and
HMMERHSP to HMMERDomain, but they would still, if they fit into SearchIO and
used the current event handlers, implement HitI/HSPI by inheriting from
GenericHit/GenericHSP.  Ergo, any easy way you go about it here, HMMERHit
is-a HitI and HMMERHSP is-a HSPI.  You could probably work around it by
building the 'correct' object hierarchy by setting up your own handler and
SearchIO plugin, but that risks changing API.  And, really, if you decide to
go down that path, consider what Jason is talking about when he mentions
using "under-the-hood" hashes.

> A hit could end up with no hsps (no domains), but we may not even care.
> Sometimes you really do just want to know if a particular model hit at
> all, and with what evalue/score. The current parsing model isn't
> guaranteed to tell you this even when you can read it yourself in the
> file being parsed.

For every model (hit) you should have a corresponding domain (HSP) or more
depending on your view of how the parser works, even if the domain (HSP) is
only present in the table and not in an alignment.  You shouldn't have
models w/o domains from your query (hits w/o hsps); that doesn't make any
sense.  If hmmpfam output has this then it's a serious issue, but, again,
that doesn't make sense.  All that information is in the tables in the
hmmpfam output; you can even build objects w/o alignments present (-A0)
straight from the tables.

If you wanted to know whether a particular model hit at all, grab all the
model objects ($result->models) and run through them to see if your expected
model (Annexin, Phosphoribosyl, or whatever) is there using a map/grep
block, regex, or whatever; you could autovivify a hash or similar data
structure indicating that a particular sequence has x domains of y type.  Or
iterate through them like you would for a BLAST report.  I don't see what's
difficult about this; I do it for BLAST sequences, SeqFeatures, and many
other BioPerl objects all the time!  Yes, it can be slow; that's an issue
with object instantiation and Perl and there is no easy way around it
besides refactoring the SearchIO parsers/eventhandlers to send back hashes,
as Jason has suggested.
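
Something along these lines, as a bare-bones sketch in plain Perl. The list
of names stands in for what you would presumably collect from $model->name
while iterating over the result; the variable names are made up for the
example, and the names themselves are the model/domain pairs from the
nbd27e02.y1 output posted earlier in this thread.

```perl
use strict;
use warnings;

# Model names in the order the hits come back (one entry per domain).
my @model_names = qw(IBB HEAT IBN_N Arm HEAT Arm HEAT Arm);

# Autovivify a per-model domain count.
my %domain_count;
$domain_count{$_}++ for @model_names;

# Did a particular model hit at all?
my $wanted   = 'Arm';
my $has_arm  = grep { $_ eq $wanted } @model_names;   # count in scalar context
my @heatlike = grep { /^HEAT/ } keys %domain_count;   # regex match on names

print "$wanted hit $has_arm time(s)\n" if $has_arm;
print "$_: $domain_count{$_} domain(s)\n" for sort keys %domain_count;
```

The hash gives the "x domains of y type" summary in one pass, and grep in
scalar context doubles as a yes/no test for a single model.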

> You can guess at the intent of the original authors, I think, just by
> looking at those method synonyms. next_hit == next_model. next_hsp ==
> next_domain. This makes perfect sense. This is the way to correctly
> model the information in the file. The problem is that next_model
> doesn't give you the next Model (because each Model has multiple hits),
> and next_domain doesn't give you the next domain (because each hit only
> has one domain).

....

> Well, that's for the user to decide. But the way the results are
> presented needs to make sense. If blast results came back with all hsps
> listed out in sequence position order, would you have multiple hits per
> database sequence each with one hsp? No, because the meaning is
> completely wrong. The 'hit' is the collection of alignments of a
> particular database sequence hitting a query sequence. The alignments
> are stored in a bunch of hsps. It is absurd to have more than one hit
> object for a database+query sequence combo, because then we have
> multiple hit objects duplicating the exact same information, and 'hit'
> no longer has any meaning - it is a collection of /some/ of the
> alignments? Yet this is exactly what we have with hmmpfam result parsing.

The problem is that the module is geared to parse the output as simply as
possible, so it does it by sequence order, just like the output.  And, as
is, it makes sense to me why Eddy and Co. set it that way, not that I
completely agree with it.  Hmmpfam output is designed for annotating
sequences using Pfam HMM's, so the results are hard-coded to appear in
sequence order, not based on score or evalue.  That's the way it is; not
necessarily the best way IMHO (I would have a way to sort by evalue or model
myself as an option), but it's the only way that's currently available.
Yes, each Model can match more than one domain on a query sequence.  Again,
that this is the 'correct way' to set up this parser is your opinion; if you
want, design your own SearchIO parser.  Like I said, I don't have a problem
with using this module myself.  And I'm a bit reluctant to spend the energy
overhaulin' this module when I could spend my time working on something else
I consider more constructive (or destructive, depending on your view).  

And, frankly, it's not up to the user when using code they didn't create.
You have to deal with it.  Or code something yourself to do things the way
you want.  You have the power to do that; most BioPerl users don't, simply
because they probably don't understand the class structure and OO nature of
BioPerl.  It's just a matter of where you want to spend your energy: dealing
with something that interests you or fixing other people's broken code.


Chris


From cjfields at uiuc.edu  Thu Jun 29 01:23:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:23:03 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <000e01c69b3c$18d58fb0$15327e82@pyrimidine>

...
 
> I think the bug fix you refer to had to do with not returning things
> ordered by E-value -- the creation machinery only builds Hit
> objects when there are HSP objects being built.  Basically the
> parsing is linear in terms of the file, we read "Model" (Hit) data
> first and store them in a hash keyed by the name of the domain, but
> we only >>build<< the "Hits" when seen HSPs, hence the problem when
> the -A option limits alignments but reports Hits that don't have
> individual alignments.  This has to do with the order of things not
> syncing up and/or dealing with the -A option when there is leftover
> Hit data but no HSPs to populate them.  We also had this problem in
> BLAST reports and had to work around that, but I never bothered
> solving it in HMMER I guess.  Glad there are other people who are
> going to fix the problems!

Yeah, just figured that one out.  I see the two tables are parsed into two
arrays, so it is feasible to have the leftover (Hit/HSP|Model/Domain)
data converted into the proper objects even without any alignments (e.g. -A0
output).  I plan on reporting this in Bugzilla and will work on it,
but can't get to it immediately (probably not 'til Friday-Saturday at the
earliest).  If Sendu wants to tackle it I don't have a problem.
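
To make the idea concrete, here is a minimal sketch of merging the two tables
so that every model yields a hit, with or without domain data attached. The
two arrays, their field names and the sample values are invented stand-ins;
this is not the actual SearchIO::hmmer code:

```perl
use strict;
use warnings;

# Stand-in for the per-model score table (one row per model).
my @model_table = (
    { name => 'Annexin', score => 120.5, evalue => 1e-30 },
    { name => 'SH3_1',   score => 15.2,  evalue => 0.002 },
);

# Stand-in for the domain data that actually came with alignments.
# SH3_1 has none here, as could happen when -A limits the output.
my @domain_table = (
    { model => 'Annexin', start => 10, end => 75  },
    { model => 'Annexin', start => 82, end => 148 },
);

# Build one hit per model, in model-table order. A model with no
# domain rows still gets a hit, just with an empty domain list.
sub build_hits {
    my ($models, $domains) = @_;
    my %domains_for;
    push @{ $domains_for{ $_->{model} } }, $_ for @$domains;

    my @hits;
    for my $m (@$models) {
        push @hits, { %$m, domains => $domains_for{ $m->{name} } || [] };
    }
    return \@hits;
}

my $hits = build_hits(\@model_table, \@domain_table);
printf "%s: evalue %g, %d domain(s)\n",
    $_->{name}, $_->{evalue}, scalar @{ $_->{domains} } for @$hits;
```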

> The one "alignment" (HSP) per hit was a workaround to the problem
> that Hits were being returned in the order the HSPs came in (Sequence
> order) -- because that is the order they were being built in -- not
> in the sorted order of the Hits as seen in the report.

The SAX-like approach, I gather, is getting in the way.

> Feel free to propose an alternative implementation for the parser as you see
> fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or I
> guess do what I also do and just run hmmer2table and deal with things
> in a tab-delimited format.

Or set it up as hashes, which you have mentioned before for BLAST.

> Personally my interests lie in the actual domains so the Hit objects
> are superfluous in my own work so it never bothered me to have one
> per Hit and it flows more naturally to things like GFF, etc.  You can
> aggregate them however you like after the fact pretty simply so I
> don't find this too hard to deal with, but if this a major deterrent
> for people I guess have at it ( I think the speed of object creation
> is a larger problem that I hope that someone will work on soon).

Agreed, though now it's finding the time....


Chris 

> I'd appreciate you including the salient points of how the report is
> interpreted on the wiki at some point (with 8X10 glossy pictures and
> circles and arrows on the back...
> http://en.wikipedia.org/wiki/Alice%27s_Restaurant) so the debate can be
> archived too.
> 
> -jason
> 
> On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote:
> 
> > Chris Fields wrote:
> >>> Sendu Bala wrote:
> > [snip]
> >>> In any case, this is extremely counter-intuitive, especially given
> >>> that next_domain is a synonym of next_hsp. I think either the
> >>> synonym relationship remains and hits have multiple hsps (and there
> >>> is only one hit per model)
> > [snip]
> >
> >> The model (hit-like) table scores are retained and can be retrieved
> >> via $model->significance and the individual domain (hsp-like) evalues
> >> via $model->evalue.
> >
> > I know, see my earlier post.
> >
> >> The reason you don't get all the individual domain evalues is that
> >> only five alignments are returned by default.  You might try changing
> >> the 'A' parameter to see if you can get more alignments; that may
> >> work around the problem of missing domains for now.
> >
> > [I'm using my own data, not the OP's]
> > No, I have all the alignments: 'A' isn't a problem. And I can get all
> > the domains. The problem is I have to check multiple different hits to
> > find them all.
> >
> >
> >> You'll note that the Model/Domain results returned are not based on
> >> top score but what looks like the position of the domain in the
> >> sequence (seq-t in the last table); that's what is stated in the
> >> hmmpfam docs.
> > [...]
> >> Well, that and SearchIO is set up as a SAX-like parser, so I believe
> >> it processes the model-domain alignments as the file is parsed.
> >
> > Yes, this is the problem. The parser does the obvious thing, but in my
> > view it does not do the correct thing.
> >
> >
> >> Model/domain pairs really aren't Hits/HSPs by definition, like the
> >> CVS commit from Jason states.  The way Pfam is set up, you actually
> >> have your query(ies) scanned using a database of Pfam domains (HMM's,
> >> built from protein alignments for various protein families), hence
> >> the alignment in the report is not a HSP since HSPs come from
> >> pairwise sequence alignments.  An HSP is a pair of sequences which,
> >> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> >> has alignments of the sequence and the consensus for the alignment
> >> the HMM is based on (not another sequence, so not an HSP).
> >
> > But this is just semantics. It doesn't /matter/ that it's not really
> > truly a sequence that's being aligned. The parser needs to present to
> > the user the information in the file. As we see in the OP's example, it
> > simply fails to do this because the parser isn't model-centric while the
> > file it is parsing /is/.
> >
> > And in any case, your argument doesn't hold because even the current
> > parser /does/ store domains in hsp objects! It just only stores one hsp
> > per hit, repeatedly, which is nonsensical.
> >
> > [to avoid confusion, in the following the use of 'model' is in the
> > programming sense, whilst 'Model' refers to the things generated by
> > hmmer]
> >
> > The correct model to describe the file being parsed is one that is able
> > to provide to the user all the available results for all Models that hit a
> > query sequence, even when there are no alignments in the file. To make
> > this fit the SearchIO scheme, we must have one hit per Model. The hit
> > has hsps which are the domains. This perfectly matches the information
> > in the file. It matches something like a Blast, where you have one hit
> > per database sequence/query sequence combo.
> >
> > A hit could end up with no hsps (no domains), but we may not even care.
> > Sometimes you really do just want to know if a particular model hit at
> > all, and with what evalue/score. The current parsing model isn't
> > guaranteed to tell you this even when you can read it yourself in the
> > file being parsed.
> >
> > You can guess at the intent of the original authors, I think, just by
> > looking at those method synonyms. next_hit == next_model. next_hsp ==
> > next_domain. This makes perfect sense. This is the way to correctly
> > model the information in the file. The problem is that next_model
> > doesn't give you the next Model (because each Model has multiple hits),
> > and next_domain doesn't give you the next domain (because each hit only
> > has one domain).
> >
> >
> >> I think the reasoning for keeping single model-domain pairs is that
> >> you should consider each domain's location in the sequence as well as
> >> the number of times they appear, regardless of whether they belong to
> >> the same model or not.  One protein could have three ATP-binding
> >> domains and another two, and they could be located in different
> >> positions on the sequence.  But where they are on the sequence in
> >> relation to other domains and to each other (i.e. positional
> >> information) is just as important, maybe more so, than how many times
> >> that domain appears.
> >
> > Well, that's for the user to decide. But the way the results are
> > presented needs to make sense. If blast results came back with all hsps
> > listed out in sequence position order, would you have multiple hits per
> > database sequence each with one hsp? No, because the meaning is
> > completely wrong. The 'hit' is the collection of alignments of a
> > particular database sequence hitting a query sequence. The alignments
> > are stored in a bunch of hsps. It is absurd to have more than one hit
> > object for a database+query sequence combo, because then we have
> > multiple hit objects duplicating the exact same information, and 'hit'
> > no longer has any meaning - it is a collection of /some/ of the
> > alignments? Yet this is exactly what we have with hmmpfam result parsing.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


From bix at sendu.me.uk  Thu Jun 29 03:02:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:02:49 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
Message-ID: <44A37B19.7030908@sendu.me.uk>

Chris Fields wrote:
>
> Personally, I don't think right now is the time to think about refactoring
> this particular module, esp. since I find it essentially works.  I believe
> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
> instance, or refactoring SearchIO::blast etc to use hashes instead of
> objects to speed things up.  Or creating something yourself.  Or doing what
> you currently are doing (Bio::Map).  In other words, areas where use is
> high, code is aging, and refactoring is more productive.

Hmmer parsing happens to be important to me, in fact vital for my work. 
I've been using my own parser up till now, so didn't know what the 
Bioperl one was like. I'd like to use Bioperl for more things, 
preferably everything.


> I'll add that I'm not trying to dissuade you from trying to build your own
> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
> how I feel.  You can build your own parser to do what you want; you can even
> base it off the current SearchIO HMMER parser and see if you can set it up
> to give you the results you want, using a different handler and so on.  Just
> don't break the API or modify the current code based strictly on what your
> opinion of how it should work is.  It was probably set up this way for a
> particular reason.

Well, I don't like the idea of there being multiple SearchIO parsers for 
the same thing.

[...]
> And, frankly, it's not up to the user when using code they didn't create.
> You have to deal with it.  Or code something yourself to do things the way
> you want.  You have the power to do that; most BioPerl users don't, simply
> because they probably don't understand the class structure and OO nature of
> BioPerl.  It's just a matter of where you want to spend your energy: dealing
> with something that interests you or fixing other people's broken code.

My original question was essentially: does doing it my way make sense? 
And implicitly: would doing it my way be of any harm? Ie. can I go ahead 
and change how the parser reports results and groups them together? I 
don't think it will involve an API change, but the results it generates 
will obviously be very different.


From bix at sendu.me.uk  Thu Jun 29 03:54:50 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:54:50 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <44A3874A.9040803@sendu.me.uk>

Jason Stajich wrote:
>
> Feel free to propose an alternative implementation for the parser as you see
> fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or [snip]

What's the thinking behind the way SearchIOs work? Is it necessary or 
desirable to always do it with events and listeners? Or is it enough to 
simply return a ResultI regardless of how you made it?


From cjfields at uiuc.edu  Thu Jun 29 09:27:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 08:27:00 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <A1A48284-6FD6-4898-9438-DEEB105496EC@uiuc.edu>


On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about refactoring
>> this particular module, esp. since I find it essentially works.  I believe
>> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
>> instance, or refactoring SearchIO::blast etc to use hashes instead of
>> objects to speed things up.  Or creating something yourself.  Or doing what
>> you currently are doing (Bio::Map).  In other words, areas where use is
>> high, code is aging, and refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.

We're not deterring you from setting up your own parser, something  
both Jason and I suggested.  I just don't see what the major issue  
is; hmmpfam results never really contain the same number of hits
per query that BLAST does (I get at the very most 30-40 and that is  
usually based on repeats).  I believe the best place to spend this  
energy first and foremost is fixing the bug.

>> I'll add that I'm not trying to dissuade you from trying to build your own
>> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
>> how I feel.  You can build your own parser to do what you want; you can even
>> base it off the current SearchIO HMMER parser and see if you can set it up
>> to give you the results you want, using a different handler and so on.  Just
>> don't break the API or modify the current code based strictly on what your
>> opinion of how it should work is.  It was probably set up this way for a
>> particular reason.
>
> Well, I don't like the idea of there being multiple SearchIO parsers for
> the same thing.

See, here's the thing: if the community-at-large decides to use your  
version of the parser then, by default, it will become the only HMMER
SearchIO parser and we'll deprecate the old one.  I just don't think  
this is the way I would go about it.  Jason has mentioned that object  
instantiation is a bigger issue with parsing (speed) than anything  
else; why not, if you plan on doing this, set up a Handler to return  
hashes, or do it completely under-the-hood?  Have it be the 'new,  
faster way to run SearchIO.'  Don't rehash (pardon the bad pun) the  
way things were esp. when proposals are out there to improve the  
toolkit.

> [...]
>> And, frankly, it's not up to the user when using code they didn't create.
>> You have to deal with it.  Or code something yourself to do things the way
>> you want.  You have the power to do that; most BioPerl users don't, simply
>> because they probably don't understand the class structure and OO nature of
>> BioPerl.  It's just a matter of where you want to spend your energy: dealing
>> with something that interests you or fixing other people's broken code.
>
> My original question was essentially: does doing it my way make sense?
> And implicitly: would doing it my way be of any harm? Ie. can I go ahead
> and change how the parser reports results and groups them together? I
> don't think it will involve an API change, but the results it generates
> will obviously be very different.

And my point is that both ways make sense, at least to me (and it  
sounds like to Jason though I could be wrong).  Again, create a new  
version of the parser based on what you want to do and accomplish.   
Don't just modify something the community at-large uses based on your  
whims. Make the changes to a new module and let the community  
decide.  As an example, BioPerl, for the longest time, had several  
BLAST parsers; we directed everybody over to SearchIO and most people  
seem to like it; hence the others are deprecated.

And changing the results returned could be considered by some to be
changing the API, or introducing a bug.  If someone using this module has an
automated pipeline set up for annotation using Pfam, hmmpfam,  
Bioperl, and a database, and their setup expects single model/domain  
pairs, yeah, your changes will break that.  Maybe small,  
inconsequential even, but it's possible (and even true; many genome  
annotation pipelines are set up exactly how I describe).

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ClarkeW at AGR.GC.CA  Thu Jun 29 10:31:14 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 10:31:14 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>

Hi all, 

 
Recently I was working on a project which required some manipulation of
Quality files. I may be wrong in this, but I don't believe that there is
a Quality format for Bio::SeqIO. If there is, could someone point me in
the right direction, as I could write a much nicer script than what I
currently have? If not, I was wondering if anyone here has any use for
such a thing. I am pretty new to developing but would be willing to give
it a shot, as I feel that for all the use I get out of BioPerl with no
thanks to anyone who spent time on writing something I used, I could try
and contribute my limited amount. Any comments would be appreciated, and
don't be afraid to tell me this is a lost cause. I realize that quality
files tend to be less important than FASTA sequence files. I will give
you a little information on me so that you know what to expect/what I am
working with.

I am a fourth year bioinformatics student, and am currently working as a
summer student. I have some limited experience with writing perl modules
and test scripts. Mostly I write perl to do specific jobs, that I or
someone else has come up with to fill some immediate need of the
company. I am interested in most things bioinformatics/computer
sci/biology and am hoping to do Graduate studies when I finish my
degree.

Well that's enough for now, if you have any comments/suggestions I would
appreciate it.

 
Cheers, Wayne


From cjfields at uiuc.edu  Thu Jun 29 10:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 09:55:16 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <001601c69b8c$08cdce70$15327e82@pyrimidine>

> Recently I was working on a project which required some manipulation of
> Quality files. I may be wrong in this, but I don't believe that there is
> a Quality format for Bio::SeqIO. If there is, could someone point me in
> the right direction, as I could write a much nicer script than what I
> currently have? If not, I was wondering if anyone here has any use for
> such a thing. I am pretty new to developing but would be willing to give
> it a shot, as I feel that for all the use I get out of BioPerl with no
> thanks to anyone who spent time on writing something I used, I could try
> and contribute my limited amount. Any comments would be appreciated, and
> don't be afraid to tell me this is a lost cause. I realize that quality
> files tend to be less important than FASTA sequence files. I will give
> you a little information on me so that you know what to expect/what I am
> working with.

Here's a list I dredged up when looking for Bio::Seq::Quality in BioPerl,
which is the sequence implementation for sequences with quality data and/or
trace values:

Instances: 2    Module : Bio::Assembly::Contig
Instances: 2    Module : Bio::Assembly::IO::ace
Instances: 1    Module : Bio::Assembly::Singlet
Instances: 1    Module : Bio::Index::Fastq
Instances: 2    Module : Bio::Seq::Meta::Array
Instances: 1    Module : Bio::Seq::MetaI
Instances: 8    Module : Bio::Seq::Quality
Instances: 1    Module : Bio::Seq::SeqWithQuality
Instances: 6    Module : Bio::Seq::SequenceTrace
Instances: 1    Module : Bio::Seq::TraceI
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 5    Module : Bio::SeqIO::qual
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

Does that help?

> I am a fourth year bioinformatics student, and am currently working as a
> summer student. I have some limited experience with writing perl modules
> and test scripts. Mostly I write perl to do specific jobs, that I or
> someone else has come up with to fill some immediate need of the
> company. I am interested in most things bioinformatics/computer
> sci/biology and am hoping to do Graduate studies when I finish my
> degree.
> 
> Well that's enough for now, if you have any comments/suggestions I would
> appreciate it.

We can always use an extra hand!

Chris

> 
> 
> Cheers, Wayne
> 
> 


From ClarkeW at AGR.GC.CA  Thu Jun 29 11:01:52 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 11:01:52 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>


Thanks Chris, 

I don't know how I didn't come up with this before. Can I use
Bio::SeqIO::qual as follows?

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

Cheers, Wayne



From cjfields at uiuc.edu  Thu Jun 29 11:21:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:21:21 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>
Message-ID: <002001c69b8f$ad754450$15327e82@pyrimidine>

It should work that way, yes:  

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

# the below should return a Bio::Seq::Quality object
my $seq = $in->next_seq; 
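
As far as I know, the quality values then come back from $seq->qual as an
array reference, so summarising them is plain Perl. A small sketch, using
made-up scores in place of what $seq->qual would return (the sub name is
hypothetical, not part of BioPerl):

```perl
use strict;
use warnings;
use List::Util qw(sum min);

# Mean and minimum of a list of quality scores passed as an array ref.
# Returns undef for an empty list.
sub qual_summary {
    my ($scores) = @_;
    return unless @$scores;
    return { mean => sum(@$scores) / @$scores, min => min(@$scores) };
}

my $scores  = [ 20, 30, 40, 10 ];    # made-up values for illustration
my $summary = qual_summary($scores);
printf "mean %.1f, min %d\n", $summary->{mean}, $summary->{min};
# prints "mean 25.0, min 10"
```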

You might want to check the other SeqIO modules as well depending on your
format:

...

Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

...

Chris

> Thanks Chris,
> 
> I don't know how I didn't come up with this before. Can I use
> Bio::SeqIO::qual as follows?
> 
> my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');
> 
> Cheers, Wayne
...


From cjfields at uiuc.edu  Thu Jun 29 11:23:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:23:20 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
Message-ID: <002101c69b8f$f48bd070$15327e82@pyrimidine>

Sendu, 

The HOWTO explains everything:

http://www.bioperl.org/wiki/HOWTO:SearchIO

under "Implementation."  I learned this the hard way when I started working
on SearchIO::blast and wondered why it had so many *_element methods.  

Yes, you will need an EventHandler if you implement SearchIO; the
EventHandler should implement the Bio::SearchIO::EventHandlerI interface.  You
might not need one that returns objects though (i.e. it could return
hashes).  And you could possibly get around the event handler somehow,
though if you plan on doing that, why not just work on Bio::Tools::Hmmpfam
as an alternative parser?  We've had other BLAST parsers before
(Bio::Tools::BPLite comes to mind); if they aren't maintained and there is a
viable alternative they can be deprecated.  Hence the reason I mentioned
working on your own version of SearchIO::hmmer; if that module becomes the
most widely used we can deprecate the older version.

The idea that a SearchIO plugin should act like a SAX parser is based on the
fact that many files being parsed are quite large, so it would be nice to
have everything parsed as a stream (on-the-go) as opposed to preprocessing
everything into an object hierarchy (which can be very memory intensive for
large files).  Whether this is done in practice in all SearchIO modules is
another thing; it may be based upon what particular fixes were made over
time or the contributor's intentions.  
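
To illustrate the streaming idea, here is a toy event-driven parser. The
input format, the event names and the handler class are invented for the
example and are far simpler than the real Bio::SearchIO::EventHandlerI
interface; the handler here collects plain hashes rather than objects:

```perl
use strict;
use warnings;

# Toy handler: collects hits as plain hashes. Any object offering these
# three methods could be swapped in.
package ToyHandler;
sub new { return bless { hits => [], current => undef }, shift }
sub start_hit {
    my ($self, $name) = @_;
    $self->{current} = { name => $name, hsps => [] };
}
sub hsp {
    my ($self, $score) = @_;
    push @{ $self->{current}{hsps} }, $score;
}
sub end_hit {
    my $self = shift;
    push @{ $self->{hits} }, $self->{current};
    $self->{current} = undef;
}

package main;

# Reads the report line by line and fires events as it goes; the parser
# itself never holds more than the current line.
sub parse_stream {
    my ($fh, $handler) = @_;
    my $open = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^HIT\s+(\S+)/) {
            $handler->end_hit if $open;
            $handler->start_hit($1);
            $open = 1;
        }
        elsif ($open && $line =~ /^HSP\s+(\S+)/) {
            $handler->hsp($1);
        }
    }
    $handler->end_hit if $open;
    return $handler;
}

# An in-memory "report" in a made-up format, read through a filehandle.
my $report = "HIT modelA\nHSP 50\nHSP 42\nHIT modelB\nHSP 12\n";
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $handler = parse_stream($fh, ToyHandler->new);
printf "%s: %d hsp(s)\n", $_->{name}, scalar @{ $_->{hsps} }
    for @{ $handler->{hits} };
```

What gets kept in memory is entirely the handler's decision, which is where
the memory saving for large files would come from.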

Chris 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 29, 2006 2:55 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
> 
> Jason Stajich wrote:
> >
> > Feel free to propose an alternative implementation for the parser as you
> > see fit as long as the API is preserved.  You can contribute a new
> > SearchIO plugin and HMMERSearchResultListener to deal with it - or
> [snip]
> 
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it enough to
> simply return a ResultI regardless of how you made it?


From roy at colibase.bham.ac.uk  Thu Jun 29 11:05:54 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 29 Jun 2006 16:05:54 +0100
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <44A3EC52.7030502@colibase.bham.ac.uk>

Hi Wayne.

I think Bio::SeqIO::qual is what you are looking for.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From jason at bioperl.org  Thu Jun 29 14:04:12 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 29 Jun 2006 14:04:12 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
Message-ID: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>

However you want. The idea of listeners at the time was to make it
more SAX like so we could throw away events we didn't want and speed  
up the whole system when there was some idea of how you wanted the  
data filtered.  That may have been too much wishful thinking and I  
just couldn't do it alone.
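
For readers unfamiliar with the SAX-style idea, here is a minimal sketch
in plain Perl. Everything in it is an assumption made for illustration:
the event names ('hit', 'hsp'), the tab-separated record layout, and the
subs parse_report and make_hit_collector are hypothetical and are not the
Bio::SearchIO event API. It only shows how a listener can discard events
it does not want, so that nothing gets built for them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parser: it only announces what it saw and builds no objects.
sub parse_report {
    my ($lines, $listener) = @_;
    for my $line (@$lines) {
        my ($type, $name, $score) = split /\t/, $line;
        $listener->($type, { name => $name, score => $score });
    }
    return;
}

# Hypothetical listener factory: keeps 'hit' events at or above a score
# cutoff and throws every other event away.
sub make_hit_collector {
    my ($min_score) = @_;
    my @kept;
    my $listener = sub {
        my ($type, $data) = @_;
        return unless $type eq 'hit';                  # ignore other events
        return unless $data->{score} >= $min_score;    # filter early
        push @kept, $data->{name};
        return;
    };
    return ($listener, \@kept);
}

# Made-up event stream standing in for a parsed report.
my @report = (
    "hit\tPF00001\t120.5",
    "hsp\tPF00001\t60.2",
    "hit\tPF00002\t8.1",
    "hit\tPF00003\t45.0",
);

my ($listener, $kept) = make_hit_collector(40);
parse_report(\@report, $listener);
print join(',', @$kept), "\n";    # prints PF00001,PF00003
```

A parser that simply returns a finished result object would instead build
everything first and leave the filtering to the caller.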


On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> Feel free to propose an alternative implementation for the parser as
>> you see fit, as long as the API is preserved.  You can contribute a new
>> SearchIO plugin and HMMERSearchResultListener to deal with it - or  
>> [snip]
>
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it enough to
> simply return a ResultI regardless of how you made it?

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From prettyblondegirl222 at yahoo.com  Thu Jun 29 14:23:56 2006
From: prettyblondegirl222 at yahoo.com (S S)
Date: Thu, 29 Jun 2006 11:23:56 -0700 (PDT)
Subject: [Bioperl-l] TAKE ME OFF
Message-ID: <20060629182356.93810.qmail@web51305.mail.yahoo.com>

  


From cjfields at uiuc.edu  Thu Jun 29 23:53:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 22:53:22 -0500
Subject: [Bioperl-l] SearchIO::blast, was Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
	<166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
Message-ID: <7511BE75-3A87-4E78-BFEA-2B38210BAD85@uiuc.edu>

If we can work around the listener/handler that'll definitely speed  
things up.  I was thinking about tackling the SearchIO::blast parser  
next, refactoring it to use hashes as a separate plugin module; if I  
don't need the handler for that then it'll speed things up a bit.

Chris

On Jun 29, 2006, at 1:04 PM, Jason Stajich wrote:

> however you want - the idea of listeners at the time was to make it
> more SAX like so we could throw away events we didn't want and speed
> up the whole system when there was some idea of how you wanted the
> data filtered.  That may have been too much wishful thinking and I
> just couldn't do it alone.
>
>
> On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:
>
>> Jason Stajich wrote:
>>>
>>> Feel free to propose an alternative implementation for the parser as
>>> you see fit, as long as the API is preserved.  You can contribute a new
>>> SearchIO plugin and HMMERSearchResultListener to deal with it - or
>>> [snip]
>>
>> What's the thinking behind the way SearchIOs work? Is it necessary or
>> desirable to always do it with events and listeners? Or is it enough to
>> simply return a ResultI regardless of how you made it?
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Fri Jun 30 08:45:15 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 30 Jun 2006 14:45:15 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>

Hi,

>My original question was essentially: does doing it my way make sense?
With respect to Sendu's points, I can only say that a colleague
(developer) and I were surprised that the HMMer parser did not group
the hits as the blast parser does, in "Hit" and "Hsp".
When we realized how hmmer parsing worked we continued to use it,
but added a check for multiple hits of one domain on 1 query sequence
(e.g. in hmmpfam).
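
A check of that kind could look roughly like the sketch below. It is plain
Perl and makes several assumptions: the @domains records (query, model,
start, end) are made-up stand-ins for whatever is collected while looping
over a parsed hmmpfam result, and the accession-like model names are only
placeholders. It is not the check we actually used.

```perl
use strict;
use warnings;

# Hypothetical flattened records, one per reported domain.
my @domains = (
    { query => 'seq1', model => 'PF00069', start => 10,  end => 250 },
    { query => 'seq1', model => 'PF00017', start => 300, end => 380 },
    { query => 'seq1', model => 'PF00069', start => 400, end => 640 },
    { query => 'seq2', model => 'PF00069', start => 5,   end => 240 },
);

# Group by query, then by model, much as a BLAST-style Hit/Hsp layout would.
my %by_model;
for my $d (@domains) {
    push @{ $by_model{ $d->{query} }{ $d->{model} } }, $d;
}

# Collect the models that occur more than once on the same query.
my @repeated;
for my $query (sort keys %by_model) {
    for my $model (sort keys %{ $by_model{$query} }) {
        my $n = scalar @{ $by_model{$query}{$model} };
        push @repeated, "$query:$model:$n" if $n > 1;
    }
}
print "$_\n" for @repeated;    # prints seq1:PF00069:2
```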

Regards,
Bernd


From jason at bioperl.org  Fri Jun 30 10:05:01 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Jun 2006 10:05:01 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
Message-ID: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>

I understand the confusion. The intention initially was to have HSPs
grouped together under the same Hit, just like BLAST reports, but
somehow in the bug-fix cycle the way of dealing with the fact that
"HSPs" aren't ordered by the overall Hit table led to this design
decision. The problem before was something to do with the ordering, but
I must admit I can't remember what specifically the problem was or why
I changed things to do this.  Does 1.4 actually do it the way you expect?

Again, more user feedback is definitely critical to make these tools
useful to everyone, so please don't be bashful about reporting your
preferences.

-j

On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:

> Hi,
>
>> My original question was essentially: does doing it my way make sense?
> With respect to Sendu's points, I can only say that a colleague
> (developer) and I were surprised that the HMMer parser did not group
> the hits as the blast parser does, in "Hit" and "Hsp".
> When we realized how hmmer parsing worked we continued to use it
> but used a check for multiple hits of one domain on 1 query sequence
> (e.g. in hmmpfam).
>
> Regards,
> Bernd

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Fri Jun 30 11:56:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 10:56:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
Message-ID: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>

It may have been just simpler to have it be one HSP (domain) per Hit  
(model) as that's how the reports are generated.  My reasoning was  
that using the one domain per model made sense based on what you are  
actually trying to do, which is annotate the sequence based on the  
order the domain appears.  Most others may not view it that way,  
which is fine.  One can always gather the relevant HSPs, convert them to
seqfeatures, then sort them if order is important, I suppose.
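
As a rough sketch of the "gather, then sort" step, assuming the domains
have already been pulled out into simple hashes (the @domains records and
the placeholder model names are hypothetical, and real code would more
likely hold seqfeature objects):

```perl
use strict;
use warnings;

# Hypothetical per-domain records for one query sequence, in report order.
my @domains = (
    { model => 'PF00069', start => 400, end => 640 },
    { model => 'PF00017', start => 300, end => 380 },
    { model => 'PF00069', start => 10,  end => 250 },
);

# Order the domains along the sequence: by start, then by end.
my @ordered = sort {
    $a->{start} <=> $b->{start} || $a->{end} <=> $b->{end}
} @domains;

# The domain order along the sequence, as a simple string.
my $architecture = join '-', map { $_->{model} } @ordered;
print "$architecture\n";    # prints PF00069-PF00017-PF00069
```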

I would say, if the overall consensus is to modify it to have  
multiple domain hits per model (similar to BLAST) then Sendu should  
go ahead and make those changes then announce it on the list so no  
one can gripe about it later.  My main concern was not changing  
things so dramatically that it'll break for someone, but seeing as  
we've had a lengthy discussion about it already they should have  
piped up by now!   Well, that and trying to return everything as  
hashes as Jason suggested.  From looking at SearchIO::hmmer we need  
to make sure that both hmmsearch and hmmpfam work the same way (looks  
like they have different sections) and that the reported bug about  
missing hits (Bug 2036) is fixed as well.

Chris

On Jun 30, 2006, at 9:05 AM, Jason Stajich wrote:

> I understand the confusion. The intention initially was to have HSPs
> grouped together under the same Hit, just like BLAST reports, but
> somehow in the bug-fix cycle the way of dealing with the fact that
> "HSPs" aren't ordered by the overall Hit table led to this design
> decision. The problem before was something to do with the ordering, but
> I must admit I can't remember what specifically the problem was or why
> I changed things to do this.  Does 1.4 actually do it the way you expect?
>
> Again, more user feedback is definitely critical to make these tools
> useful to everyone, so please don't be bashful about reporting your
> preferences.
>
> -j
>
> On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:
>
>> Hi,
>>
>>> My original question was essentially: does doing it my way make sense?
>> With respect to Sendu's points, I can only say that a colleague
>> (developer) and I were surprised that the HMMer parser did not group
>> the hits as the blast parser does, in "Hit" and "Hsp".
>> When we realized how hmmer parsing worked we continued to use it
>> but used a check for multiple hits of one domain on 1 query sequence
>> (e.g. in hmmpfam).
>>
>> Regards,
>> Bernd
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Jun 30 12:14:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Jun 2006 17:14:05 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
Message-ID: <44A54DCD.3050708@sendu.me.uk>

Chris Fields wrote:
> It may have been just simpler to have it be one HSP (domain) per Hit 
> (model) as that's how the reports are generated.  My reasoning was that 
> using the one domain per model made sense based on what you are actually 
> trying to do, which is annotate the sequence based on the order the 
> domain appears.  Most others may not view it that way, which is fine.  
> One can always gather the relevant HSP's, convert to seqfeatures, then 
> sort them if order is important, I suppose.
> 
> I would say, if the overall consensus is to modify it to have multiple 
> domain hits per model (similar to BLAST) then Sendu should go ahead and 
> make those changes then announce it on the list so no one can gripe 
> about it later.  My main concern was not changing things so dramatically 
> that it'll break for someone

Going on your earlier suggestion, I was thinking about making 
SearchIO::hmmpfam instead, which would get used if you set the format to 
'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I 
suppose I would make a SearchIO::hmmsearch as well, if necessary.


[...]
> that the reported bug about missing hits (Bug 2036) is fixed as well.

However, having never made a SearchIO plugin before, it will be some 
time before I get my head around it. I'll want to make one the current 
HOWTO:SearchIO way before I can think about doing it a better way 
(hashes) as well. So I can say I'll make a move on this at some point in 
the future, but if someone wants to fix Bug 2036 in the meantime, they
are welcome to. Again, as suggested, my priority is Bio::Map right now.


From rmb32 at cornell.edu  Fri Jun 30 13:01:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 10:01:38 -0700
Subject: [Bioperl-l] parser for GeneSeqer
Message-ID: <44A558F2.2050304@cornell.edu>

Hi all,

I find myself needing a parser for GeneSeqer output, so I'm writing one 
(which I will submit for your consideration when it's working).  In a 
nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of 
ESTs to genomic sequence, then using those alignments to predict where 
in the genomic sequence the genes are.  So really what you get from this 
is a bunch of hierarchical features.

I don't really know where I should put it in the bioperl hierarchy 
though.  Probably FeatureIO?

And what's the current fashion for objects it should emit?  
Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 30 13:43:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:43:56 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A54DCD.3050708@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
	<44A54DCD.3050708@sendu.me.uk>
Message-ID: <E2C6F66F-9B85-42D3-B2A0-BD7C8B222572@uiuc.edu>

I'll try looking at it this weekend.  A suggested workaround is to
either try setting -A for no alignments or setting it to a high
number to retrieve all of them.  It's pretty serious, as the error
silently drops those domains, so those using automated annotation
pipelines would miss it unless they are also checking the raw output.

You could design a SearchIO::hmmpfam parser then expand it to take in  
hmmsearch output at a later point, or keep them separate.  I like the  
idea of having modules that are more specific about what they parse;  
seems at some point you reach serious code bloat and maintenance  
becomes an issue.  Look at SearchIO::blast; it parses various text  
BLAST output very well but with some serious obfuscation.  Just don't  
know how productive it would be to separate out the PSI-BLAST and  
bl2seq stuff since they are pretty close to a standard BLAST  
report... oh well.

To Jason: good luck on your move.  Drop us a line here to let us
know everything went well.

Chris

On Jun 30, 2006, at 11:14 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> It may have been just simpler to have it be one HSP (domain) per Hit
>> (model) as that's how the reports are generated.  My reasoning was that
>> using the one domain per model made sense based on what you are actually
>> trying to do, which is annotate the sequence based on the order the
>> domain appears.  Most others may not view it that way, which is fine.
>> One can always gather the relevant HSP's, convert to seqfeatures, then
>> sort them if order is important, I suppose.
>>
>> I would say, if the overall consensus is to modify it to have multiple
>> domain hits per model (similar to BLAST) then Sendu should go ahead and
>> make those changes then announce it on the list so no one can gripe
>> about it later.  My main concern was not changing things so dramatically
>> that it'll break for someone
>
> Going on your earlier suggestion, I was thinking about making
> SearchIO::hmmpfam instead, which would get used if you set the  
> format to
> 'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I
> suppose I would make a SearchIO::hmmsearch as well, if necessary.
>
>
> [...]
>> that the reported bug about missing hits (Bug 2036) is fixed as well.
>
> However, having never made a SearchIO plugin before, it will be some
> time before I get my head around it. I'll want to make one the current
> HOWTO:SearchIO way before I can think about doing it a better way
> (hashes) as well. So I can say I'll make a move on this at some  
> point in
> the future, but if someone wants to fix Bug 2036 in the mean time,  
> they
> are welcome to. Again as suggested, my priority is Bio::Map right now.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Jun 30 13:54:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:54:23 -0500
Subject: [Bioperl-l] parser for GeneSeqer
In-Reply-To: <44A558F2.2050304@cornell.edu>
References: <44A558F2.2050304@cornell.edu>
Message-ID: <2FB066C7-12E6-46D8-8F4A-BD096BE2A0CA@uiuc.edu>

If you plan on generating seqfeatures from this output you could
check out the Bio::Tools core modules for examples.  There are a few
there that take program output and convert it to
Bio::SeqFeature::Generic objects, including Bio::Tools::RNAMotif and
Bio::Tools::tRNAscanSE.  If alignments are involved you might want
something like Bio::SeqFeature::FeaturePair.  Not sure about using
SeqFeature::Annotated or others; I thought that some of the
Annotation/Annotatable stuff might be changing soon but I may be wrong.

Chris

On Jun 30, 2006, at 12:01 PM, Robert Buels wrote:

> Hi all,
>
> I find myself needing a parser for GeneSeqer output, so I'm writing one
> (which I will submit for your consideration when it's working).  In a
> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> ESTs to genomic sequence, then using those alignments to predict where
> in the genomic sequence the genes are.  So really what you get from this
> is a bunch of hierarchical features.
>
> I don't really know where I should put it in the bioperl hierarchy
> though.  Probably FeatureIO?
>
> And what's the current fashion for objects it should emit?
> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>
> Rob
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From rmb32 at cornell.edu  Fri Jun 30 15:32:11 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 12:32:11 -0700
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
In-Reply-To: <29201430510651801@webmail.iastate.edu>
References: <29201430510651801@webmail.iastate.edu>
Message-ID: <44A57C3B.8040808@cornell.edu>

Aha!  Isn't it amazing what gets revealed when you just get off your 
butt and ask on the mailing list.

I'll look at that code straightaway.  The concept is quite attractive to 
me, since GenomeThreader is the next program that I'm going to be 
integrating into my analysis stuff.  Unfortunately, (I am under the 
impression that) my GeneSeqer parser is almost finished.

This brings us to the next question, what about parsing the 
GenomeThreader XML?  Would be lovely to have a Bioperl interface for 
that.  Is there some code floating about for that too?

Rob

Michael E Sparks wrote:
> Hi Rob,
>
> For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
>  You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/
>
> There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
> an XML format also used by the GenomeThreader spliced alignment program, whose
> schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
> 0README in the above directory will give you an overview of what tools I've made
> available.  Hope you find it useful!
>
> Regards,
> Michael
>
> --
> Thanks,
> Michael E Sparks
> Graduate Assistant, Brendel Lab
> 2128 Molecular Biology Building
> Iowa State University
> Ames, IA 50011-3260
> 1-515-294-4063
> http://www.public.iastate.edu/~mespar1/
>
>
> Forwarded Message:
>   
>> To: <plantgdb at iastate.edu>
>> From: "Shannon D Schlueter" <sds at iastate.edu>
>> Subject: FW: [Bioperl-l] parser for GeneSeqer
>> Date: Fri, 30 Jun 2006 13:01:46 -0500
>> -----
>>     
>>> Date: Fri, 30 Jun 2006 10:01:38 -0700
>>> From: Robert Buels <rmb32 at cornell.edu>
>>> User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] parser for GeneSeqer
>>> Sender: bioperl-l-bounces at lists.open-bio.org
>>>
>>> Hi all,
>>>
>>> I find myself needing a parser for GeneSeqer output, so I'm writing one
>>> (which I will submit for your consideration when it's working).  In a
>>> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
>>> ESTs to genomic sequence, then using those alignments to predict where
>>> in the genomic sequence the genes are.  So really what you get from this
>>> is a bunch of hierarchical features.
>>>
>>> I don't really know where I should put it in the bioperl hierarchy
>>> though.  Probably FeatureIO?
>>>
>>> And what's the current fashion for objects it should emit? 
>>> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> SGN Bioinformatics Analyst
>>> 252A Emerson Hall, Cornell University
>>> Ithaca, NY  14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>>
>>>
>>>       
>
>
>
>   

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From mespar1 at iastate.edu  Fri Jun 30 15:20:29 2006
From: mespar1 at iastate.edu (Michael E Sparks)
Date: Fri, 30 Jun 2006 14:20:29 -0500 (CDT)
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
Message-ID: <29201430510651801@webmail.iastate.edu>

Hi Rob,

For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
 You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/

There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
an XML format also used by the GenomeThreader spliced alignment program, whose
schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
0README in the above directory will give you an overview of what tools I've made
available.  Hope you find it useful!

Regards,
Michael

--
Thanks,
Michael E Sparks
Graduate Assistant, Brendel Lab
2128 Molecular Biology Building
Iowa State University
Ames, IA 50011-3260
1-515-294-4063
http://www.public.iastate.edu/~mespar1/


Forwarded Message:
> To: <plantgdb at iastate.edu>
> From: "Shannon D Schlueter" <sds at iastate.edu>
> Subject: FW: [Bioperl-l] parser for GeneSeqer
> Date: Fri, 30 Jun 2006 13:01:46 -0500
> -----
> >Date: Fri, 30 Jun 2006 10:01:38 -0700
> >From: Robert Buels <rmb32 at cornell.edu>
> >User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
> >To: bioperl-l at bioperl.org
> >Subject: [Bioperl-l] parser for GeneSeqer
> >Sender: bioperl-l-bounces at lists.open-bio.org
> >
> >Hi all,
> >
> >I find myself needing a parser for GeneSeqer output, so I'm writing one
> >(which I will submit for your consideration when it's working).  In a
> >nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> >ESTs to genomic sequence, then using those alignments to predict where
> >in the genomic sequence the genes are.  So really what you get from this
> >is a bunch of hierarchical features.
> >
> >I don't really know where I should put it in the bioperl hierarchy
> >though.  Probably FeatureIO?
> >
> >And what's the current fashion for objects it should emit? 
> >Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
> >
> >Rob
> >
> >--
> >Robert Buels
> >SGN Bioinformatics Analyst
> >252A Emerson Hall, Cornell University
> >Ithaca, NY  14853
> >Tel: 503-889-8539
> >rmb32 at cornell.edu
> >http://www.sgn.cornell.edu
> >
> >
> 


From cjfields at uiuc.edu  Thu Jun  1 00:43:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:43:48 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <200605311707.08196.lstein@cshl.edu>
Message-ID: <002801c68514$72f11480$15327e82@pyrimidine>


> -----Original Message-----
> From: Lincoln Stein [mailto:lstein at cshl.edu]
> Sent: Wednesday, May 31, 2006 4:07 PM
> To: Chris Fields
> Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> 
> > Instances: 17	Module : Bio::DB::SeqFeature::Store
> 
> This is intentional. Bio::DB::SeqFeature::Store is intended to be a
> virtual base class. The throw_not_implemented() calls are there to force
> developers to override the needed interface methods.
> 
> If this is not the right way to do it, let me know and I'll fix it.

That's the right way, though I don't really know what the 'right way' is.
Sorry Lincoln, I didn't mean to direct anything at you specifically; I
responded to your last post to stay in the thread, so to speak.  It was
meant to be a general statement that some classes haven't implemented
methods specified by their abstract base or interface class.  This is just
output from a quickie script I wrote up to check on this and see how many of
these statements are out there, and since there isn't a foolproof method to
know what an abstract base class is, it pulls in a few abstract classes
(such as yours) along with all the others.  At least there aren't as many
hits as Torsten's ~400-500 for 'return undef'!
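
For anyone who wants to run a similar count, a check along these lines
might look like the sketch below. This is not the script that produced
the numbers in this thread; the sub count_not_implemented and the
My::Abstract::Store source text are hypothetical, and the count is naive
(it skips comment lines but not POD or strings).

```perl
use strict;
use warnings;

# Count mentions of throw_not_implemented in a chunk of Perl source.
# Whole-line comments are skipped; POD and string literals are not.
sub count_not_implemented {
    my ($source) = @_;
    my $count = 0;
    for my $line (split /\n/, $source) {
        next if $line =~ /^\s*#/;
        $count++ while $line =~ /\bthrow_not_implemented\b/g;
    }
    return $count;
}

# Hypothetical module text standing in for a file read from disk.
my $source = <<'END';
package My::Abstract::Store;
# subclasses must override these; throw_not_implemented marks them
sub fetch { shift->throw_not_implemented }
sub store { shift->throw_not_implemented }
sub name  { return 'store' }
END

my $instances = count_not_implemented($source);
printf "Instances: %d\tModule : %s\n", $instances, 'My::Abstract::Store';
```

As noted, a count like this cannot tell an intentionally abstract base
class from a class that simply forgot to implement something.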

Anyway, I'm not sure what would be the best place to address code problems
or issues like the unimplemented methods issue or Torsten's audits (list,
wiki, etc); it's a delicate issue b/c it's bordering on code critiquing and
what constitutes good vs. bad code.  I remember some pretty heated arguments
about the 'proper' way to do things a while back involving AUTOLOAD'ing
methods, which I think is summarized somewhere in the wiki.  Myself, I'm a
microbiologist and not a programmer, so I'm prone to bouts of hackery, but I
try to have the code at least do what the docs state.
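
On the "do what the docs state" point, the 'return undef' pitfall under
discussion comes down to context. A minimal sketch, with two hypothetical
subs made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub fails_with_undef { return undef }   # always returns exactly one value
sub fails_with_bare  { return }         # empty list in list context

# List context: "return undef" yields a one-element list (undef),
# so the array tests true even though the call "failed".
my @from_undef = fails_with_undef();
my @from_bare  = fails_with_bare();
print scalar(@from_undef), "\n";    # prints 1
print scalar(@from_bare),  "\n";    # prints 0

# Scalar context: both forms give undef.
my $s1 = fails_with_undef();
my $s2 = fails_with_bare();
my $both_undef = (!defined($s1) && !defined($s2)) ? 1 : 0;
print "$both_undef\n";              # prints 1
```

So a caller writing "die unless @result" only gets the documented
behaviour from the bare return.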

Chris

> Lincoln
> 
> 
> > Instances: 2	Module : Bio::DB::SeqVersion
> > Instances: 3	Module : Bio::DB::Taxonomy
> > Instances: 1	Module : Bio::FeatureIO::bed
> > Instances: 1	Module : Bio::Map::Marker
> > Instances: 1	Module : Bio::MapIO::fpc
> > Instances: 1	Module : Bio::MapIO::mapmaker
> > Instances: 1	Module : Bio::Restriction::IO::bairoch
> > Instances: 1	Module : Bio::Restriction::IO::itype2
> > Instances: 1	Module : Bio::Restriction::IO::withrefm
> > Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
> > Instances: 3	Module : Bio::Tools::Run::WrapperBase
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> > > Sent: Wednesday, May 31, 2006 1:15 PM
> > > To: Hilmar Lapp
> > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> > > Subject: Re: [Bioperl-l] For CVS developers - potential
> > > pitfallwith"returnundef"
> > >
> > > If the documentation says "returns false" then I expect to be able to
> > > do this:
> > >
> > > 	@result = foo();
> > > 	die "foo() failed" unless @result;
> > >
> > > If the documentation says "returns undef" then I expect this:
> > >
> > > 	@result = foo();
> > > 	die "foo() failed" unless $result[0];
> > >
> > > Lincoln
> > >
> > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > > > If the subroutine is documented to return "false" on failure, then
> > > > > one must call
> > > > > return (or "return ()" ).
> > > >
> > > > The problem seems to be that 'a value that evaluates to either true
> > > > or false' and 'a [meaningful] value or undef' and 'a value or
> > > > false' ('a value or no value) are not the same in perl. And what
> > > > would/should one expect if the doc states 'true on success and false
> > > > otherwise'?
> > > >
> > > > Maybe the documentation should also be fixed to avoid any ambiguity.
> > > > I.e., avoid documenting 'a value or false' because it may be
> > > > ambiguous (not only) to the less proficient. 'True or false' should
> > > > imply a value being returned.
> > > >
> > > > Comments?
> > > >
> > > > 	-hilmar
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Thu Jun  1 00:56:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:56:12 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine>

Mauricio et al,

Sounds good, except that there are a few issues with the formatting done by
Pod::Simple::Wiki, such as changing some things to <code> tags when they
obviously aren't code; I don't know if there is a workaround for that
(Jay?).  It may not be anything too serious though.

There was a similar issue with the INSTALL doc conversion to wiki that I ran
into, in that I don't think it will be easy converting one way or the other
(POD->wiki or wiki->POD or text), so syncing updates with wiki and CVS docs
could be an issue we'll have to face in the future.

We could strip the POD out of the script and have the docs on the wiki
(Brian's idea), or have minimal POD in the tutorial and keep the wiki
updated, just to simplify things, but this may not appeal to those who use
perldoc frequently (I personally use browsable prettified HTML).

cjf

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Wednesday, May 31, 2006 5:49 PM
> To: Chris Fields
> Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Brian, Jay, Chris,
> 
> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.
> 
> I don't think it should be a lot of problem to maintain both tutorials,
> as long as the 'main' one is the one in the CVS tree. By reading what
> Jay did in order to convert it into mediawiki format, I suppose this can
> be easily done again for each new change to the script (again, this is
> just my guessing). Besides, as far as I've seen, there aren't frequent
> commits to the script at all.
> 
> I've added a link in the left menu of the wiki. If you think it should
> point to the Tutorials page instead of the Bptutorial.pl page please let
> me know.
> 
> Regards,
> Mauricio.
> 
> Chris Fields wrote:
> > Brian, Jay,
> >
> > I think it would be nice to have the tutorial prominently displayed
> somehow
> > (Jay's suggestion), with a link provided via the tutorials page.
> Hopefully
> > this will help with the bioperl newbies.
> >
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> >> Sent: Wednesday, May 31, 2006 8:58 AM
> >> To: Jay Hannah; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> >>
> >> Jay,
> >>
> >> Excellent! Now we need to answer a few more questions for ourselves:
> >>
> >> - Do we remove the file bptutorial.pl from the package now? I'd say
> yes,
> >> we
> >> don't want to have to maintain two bptutorials.
> >>
> >> - What do we do with the script part of bptutorial.pl? It certainly
> could
> >> be
> >> excised and put into the examples/ directory, for example, but this
> would
> >> break a few of the paths that are being used.
> >>
> >> - A link to bptutorial? Or a link to the existing tutorials page?
> >> http://www.bioperl.org/wiki/Tutorials.
> >>
> >> Any thoughts on these?
> >>
> >>
> >> Brian O.
> >>
> >>
> >> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> >>
> >>> http://www.bioperl.org/wiki/Bptutorial.pl
> >>>
> >>> I think I just partially fulfilled this TODO:
> >>>
> >>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >>>
> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> >> the
> >>> wiki page via my web browser. (Is that proper procedure? Is the plan
> to
> >> just
> >>> do that manually from time to time as the document changes?)
> >>>
> >>> Now what?
> >>>
> >>> Should there be a new link on the far left of bioperl.org called
> >> "Tutorial"?
> >>> It's an amazing document. IMHO it should be listed prominently on
> >> bioperl.org.
> >>> HTH,
> >>>
> >>> j
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From osborne1 at optonline.net  Thu Jun  1 01:37:15 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 21:37:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <C0A3BD0B.8A2C%osborne1@optonline.net>

Mauricio,

Bernd didn't say he wanted the _script_ in the package, he said he wanted
bptutorial.pl in the package, not indicating whether it was the
documentation or the script that was important. It's my suspicion that the
documentation is more important than the script, and this is what my last
letter was asking, in part: is the script important? Or can we focus on the
text/POD part?

Brian O.


On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
<arareko at campus.iztacala.unam.mx> wrote:

> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.


From cjfields at uiuc.edu  Thu Jun  1 01:42:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 20:42:54 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu>
Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk
> Sent: Wednesday, May 31, 2006 4:52 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> Isn't it fairly standard in OO schemes/languages to have an exception
> thrown if a method
> can't be found at the
> end of a search up the class hierarchy? I recall being very mad at
> Smalltalk because "method
> not found" kept
> biting me. C++ has pure virtual base classes that do not allow objects to
> be instantiated
> directly; they are
> meant to be inherited and then implemented.

Perl will throw an error if it can't find a method in a class hierarchy.
It will do a few things first before dying, like looking for AUTOLOAD, etc.
AUTOLOAD has its supporters and detractors; I try to stay away from it as
much as possible.

Not sure about C++-like pure virtual classes in Perl5, i.e. not allowing
direct object instantiation, but Perl6 is supposed to have them, at least
according to Apocalypse 12.  From what Mr. Wall says about OOP in Perl5,
it's essentially 'bolted on' but works with caveats (is 'private' really
'private'?).  Perl6 is rebuilt from scratch (internals are OO).
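
To make the Perl5 side of this concrete, here is a minimal sketch of how an
"abstract" method is usually emulated: the base class defines a stub that
dies, and subclasses are expected to override it. The class names (Shape,
Square, Blob) are hypothetical, and a plain die stands in for Bioperl's
throw_not_implemented; this is an illustration, not the Bio::Root code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "abstract" base class: area() exists only to die with a
# clear message when a subclass forgets to override it.
package Shape;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub area {
    my $self = shift;
    die ref($self) . " must implement area()\n";
}

# Concrete subclass that overrides the stub.
package Square;
our @ISA = ('Shape');
sub area { my $self = shift; return $self->{side} ** 2 }

# Subclass that forgets to override it.
package Blob;
our @ISA = ('Shape');

package main;

print Square->new(side => 3)->area, "\n";

# The base-class stub fires for the forgetful subclass.
eval { Blob->new->area };
print "Blob: $@";

# A method defined nowhere in the hierarchy (and with no AUTOLOAD to catch
# it) is a run-time error raised by Perl itself.
eval { Blob->new->perimeter };
print "Perl: $@";
```

Note that nothing here stops Shape->new from being called directly; the
check only happens when the unimplemented method is actually invoked.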

> Perl 6 was mentioned a bit back. Is this issue addressed there? Should it
> be? Do the Bioperl
> people feed their
> needs into Perl 6 so that all the code effort to make Bio::Root is handled
> for them in the next
> effort by Perl 6
> itself. Make the Perl 6 people solve these issues with your input, then
> you will not have to
> deal with
> implementing it yourselves. I'll just bet that you are not the only
> potential users of Perl 6 who
> will have to solve
> these issues eventually.

I think Perl6 will solve most (if not all) these problems since it's a
complete rebuild.  In fact, it's pretty much a new language altogether from
what I have seen (and the little I have played around with using Pugs).
Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be
necessary to immediately convert all of bioperl to Perl6.  Though I have
also heard of a Perl5->6 converter in the works as well...  

From an OO standpoint, I believe everything is considered an object in
Perl6, though it's not supposed to force you into using objects according to
the Apocalypses that I have read.  I actually see a lot there that reminds
me of C++ (but in a Perl-ish way, of course).  Apocalypse 12 is a good
primer, though you may want to go through the others first, they're heavy
slogging:

http://dev.perl.org/perl6/doc/design/apo/A12.html

Not sure what you mean by 'feeding our needs into Perl6'.  I have
periodically checked on perl6 progress and they seem to have everything well
under control.

Chris
 
> ----- Original Message -----
> From: Hilmar Lapp <hlapp at gmx.net>
> Date: Wednesday, May 31, 2006 5:21 pm
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> >
> > On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> >
> > > What about modules that have 'throw_not_implemented' statements
> > > present?
> >
> > Those are often if not always legitimate - the problem are those
> > that
> > don't have them but fail to override an inherited interface or
> > abstract method.
> >
> > If something is not implemented what is the better way to express
> > this other than throwing an exception? (and if it's not an
> > interface
> > or abstract base class, saying so in the documentation)
> >
> > 	-hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >


From jay at jays.net  Thu Jun  1 01:54:01 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 20:54:01 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <447E48B9.4080503@jays.net>

Brian Osborne wrote:
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.

We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki."

Just a thought.

> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.

/README says this:

 scripts/    - Useful production-quality scripts with POD documentation
 examples/   - Scripts demonstrating the many uses of Bioperl

I'm personally not clear on the difference. Little stuff should start in examples/ and graduate to scripts/ once they've matured? 

Is the doc/ tree being abandoned?

doc/faq        (empty?)
doc/howto      
doc/howto/examples
doc/howto/figs (empty?)
doc/howto/html (empty?)
doc/howto/pdf  (empty?)
doc/howto/sgml (empty?)
doc/howto/txt  (empty?)
doc/howto/xml  (empty?)

Does all that stuff officially live in and is being changed in the wiki, never to return to the distro?

Any reason those empty dirs aren't nuked out of CVS?

Chris Fields wrote:
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  

Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :)
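
One way to find the spots Chris describes, sketched in plain Perl: scan the
converted text and report every non-blank line that begins with a space,
since MediaWiki renders such lines as preformatted text. The helper name
indented_lines and the sample text are made up for illustration; the real
input would be whatever Pod::Simple::Wiki produced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [line_number, text] pairs for every non-blank line that starts
# with one or more spaces.
sub indented_lines {
    my ($text) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $text) {
        $n++;
        push @hits, [$n, $line] if $line =~ /^ +\S/;
    }
    return @hits;
}

# Hypothetical sample standing in for the converted tutorial text.
my $wiki = join "\n",
    '== Installation ==',
    'Run the following command:',
    '  perl Makefile.PL',
    '   This sentence was indented by accident.',
    '';

printf "line %d: %s\n", @$_ for indented_lines($wiki);
```

The script only flags candidates; deciding which of them are real code and
which are stray indentation still has to be done by eye.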

Mauricio Herrera Cuadra wrote:
> I've added a link in the left menu of the wiki. If you think it should 
> point to the Tutorials page instead of the Bptutorial.pl page please let 
> me know.

Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?

Documentation  (linked on the left menu)
- Quick start
- FAQ
- HOWTOs
- Tutorials

(What's the conceptual difference between a HOWTO and a tutorial?)

It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?)  Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static?

Why isn't the short "Current events" just listed on the top of the "News" page?

Sick of my endless questions yet? -grin-

j


From cjfields at uiuc.edu  Thu Jun  1 03:09:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 22:09:38 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine>


...

> We certainly wouldn't want to try to maintain two copies, one POD one in
> wiki. That would be the worst of all options. One option that hasn't been
> mentioned yet is to keep maintenance of that in POD in the distro (leaving
> the cool runability alone), and then flag that document as unchangeable in
> the wiki with a note on top "Maintenance of this document is done in POD
> in the distro. Submit POD patches to bioperl-l and we'll re-post an
> updated copy to this wiki."
> 
> Just a thought.

There are probably three schools of thought on docs: those who like nice
docs with links within and beyond BioPerl (hence the wiki), those who like
including docs with the distribution, and those who would like both.  The
last would be nice but isn't realistic unless we can come up with a way to
sync changes between the wiki and CVS, without too much trouble, for those
docs we want to include with the distribution.  I'm in the first school of
thought since rich text with links is better and more informative than plain
text any day.  It might be a very small school though...

> > - What do we do with the script part of bptutorial.pl? It certainly
> could be
> > excised and put into the examples/ directory, for example, but this
> would
> > break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?

Most docs have been moved over to the wiki, which generates nicely formatted
docs for printing.
...

> Does all that stuff officially live in and is being changed in the wiki,
> never to return to the distro?

It's easier to add changes in the wiki and add markup, links, etc.  Much
richer text, so on.
 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know
> nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing
> then ran off to work. :)

No big deal.  

> Mauricio Herrera Cuadra wrote:
> > I've added a link in the left menu of the wiki. If you think it should
> > point to the Tutorials page instead of the Bptutorial.pl page please let
> > me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Okay, though Mauricio may know a bit more on how/if this can be done.
Mauricio?

> (What's the conceptual difference between a HOWTO and a tutorial?)

I believe the reasoning is along these lines: HOWTOs focus on specific
areas (graphics, trees, BLAST report parsing, etc.) and thus usually have
greater detail. The tutorials are more broadly based (sort of a general
bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
has additional information over the tutorial (at least it did the last time
I looked at the tutorial, which has been a while).

> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on
> the wiki but static?

Hence the problem we face now.  It is something we need to really look into
before adding too much more to the wiki.  IMHO, I think we should have very
little information directly in the distribution itself since it's already
quite large.  It's almost as easy to have a bare-bones INSTALL file, which
would point to the wiki for additional information.  But I may be very much
alone in that train of thought ; >

> Why isn't the short "Current events" just listed on the top of the "News"
> page?

Don't know.
 
> Sick of my endless questions yet? -grin-

Not really.

cjf

> j
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jay at jays.net  Thu Jun  1 04:58:29 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 23:58:29 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000001c68528$d1b6ec10$15327e82@pyrimidine>
References: <000001c68528$d1b6ec10$15327e82@pyrimidine>
Message-ID: <447E73F5.40403@jays.net>

Chris Fields wrote:
>> Is the doc/ tree being abandoned?
> 
> Most docs have been moved over to the wiki, which generates nicely formatted
> docs for printing.

Oh. Well, if we've already jumped off that cliff I say we just go for it. Move everything to the wiki, nuke the empty CVS dirs, and call it good.

I hereby volunteer to strip the code out of bptutorial.pl and put it wherever. Where should I put it when I'm done? (examples/tutorial.pl?)

>> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> I believe the reasoning is along these lines: HOWTO's are focused in on
> specific areas (graphics, trees, BLAST report parsing, etc) and thus usually
> has greater detail. The tutorials are more broadly based (sort of a general
> bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
> has additional information over the tutorial (at least it did the last time
> I looked at the tutorial, which has been a while).

Huh. Sounds like a subtle line. I might suggest picking one name or the other and shuffling everything into one list on the wiki. 

>> It's hard for me to dive into a wiki lifestyle for the huge documentation
>> pillars since it can't ever get back into the distro... (can it?)  Small,
>> throw away stuff is great for the wiki, but huge, established, thoughtful,
>> long documents should be left in the distro? Present (and searchable) on
>> the wiki but static?
> 
> Hence the problem we face now.  It is something we need to really look into
> before adding too much more to the wiki.  IMHO, I think we should have very
> little information directly in the distribution itself since it's already
> quite large.  It's almost as easy to have a bare-bones INSTALL file, which
> would point to the wiki for additional information.  But I may be very much
> alone in that train of thought ; >

If the doc/ tree has already moved then I guess I just joined the all-wiki camp. I assume it stores full revision history and we have backups in case somebody blows something up. Any system is better than multiple systems breeding inconsistencies. Keep the spammers/clueless out and/or quickly remove their nonsense and I'm pro-wiki. Revisions email reviewers?

>> Sick of my endless questions yet? -grin-
> 
> Not really.

Give it a few more posts. It'll come. :)

j
Current toy: http://openlab.jays.net/


From ULNJUJERYDIX at spammotel.com  Thu Jun  1 06:53:46 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 1 Jun 2006 14:53:46 +0800
Subject: [Bioperl-l] **Fwd: Re: SOLVED ver2 Bio::Graphics::Panel make
	ruler have neg values
Message-ID: <5b6410e0605312353l1fbf8256hc8a2b85d0f0ac199@mail.gmail.com>

Thanks Lincoln! Your code worked in ver 1.4 as well.
I think the problem I had was due to me just adapting from the BLAST output
tutorial, so I had something like

my $feature =
Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end,
-source=>$source);

and maybe also because I didn't have the + sign for the numbers.

On a side note, I think that the ability to offset the ruler might prove
useful for some applications. I will spend more time on understanding the
$relative_coords_offset option in arrow.pm when I can afford to, and
perhaps help contribute an offset option to arrow.pm.

cheers
kevin

>
> Hi Kevin,
>
> Since you are modifying the Panel.pm source code, why don't you just go
> ahead
> and use the current Bio::Graphics development tree? Since 1.5.1 it
> supports
> negative coordinates. Here's an illustration:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
>
> my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
> my $feature =
> Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
> my $panel   = Bio::Graphics::Panel->new(-start=> -200,
>                                          -end  => +200,
>                                          -width=>800,
>                                          -pad_left=>10,
>                                          -pad_right=>10);
> $panel->add_track($whole,
>                    -glyph=>'arrow',
>                    -double=>1,
>                    -tick=>2);
> $panel->add_track($feature,
>                   -glyph=>'box',
>                    -stranded=>1);
> print $panel->png;
>
> exit 0;
>
> The resulting image is attached.
>
> Lincoln
>
> On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> > I am so sorry for the truncated email accidentally hit reply.
> > if anyone is interested i have opted to change
> >
> > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> > in linux its
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
> >
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
> >
> > to
> >
> >       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
> >
> > just  for this one-off use.
> >
> >
> >
> > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> > option for coords offset?
> >     my $relative_coords_offset =
> $self->option('relative_coords_offset');
> >     $relative_coords_offset    = 1 unless defined
> $relative_coords_offset;
> > but entering the option -relative_coords_offset=>1000 in the arrow
> glyphs
> > didn't do anything...
> >
> >
> >
> > Hi!
> >
> > > oh it was in a slightly different header asking about the create image
> > > map feature.
> > > I am using the stable version 1.4 of bioperl now. In any case I have
> not
> > > added the sequence as a feature annotated seq. as I already have the
> bp
> > > where the TF binds (in 1-1050 numberings) so what I did was to just
> add
> > > graded segments based on the position.
> > > I saw that there is a scale function for the arrow glyp however, it is
> a
> > > multiply function, can it be hacked to take in a offset value (ie
> minus
> > > the
> > > scale by 1000?)
> > >
> > > cheers
> > > kevin
> > >
> > >
> > > Hi,
> > >
> > > > For some reason I didn't see the first posting on this. In current
> > >
> > > bioperl
> > >
> > > > live, the ruler can have negative numberings - I use this routinely.
> > > > You need
> > > > to create a feature that starts in negative coordinates. What is
> > >
> > > happening
> > >
> > > > to
> > > > you when you try this?
> > > >
> > > > Lincoln
> > > >
> > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > > Hi
> > > > > thanks for the help offered thus far!
> > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer
> seq
> > > >
> > > > using
> > > >
> > > > > bioperl. therefore i was asked to make the numberings as such
> (-1000)
> > >
> > > is
> > >
> > > > > there any way at all to do this in bioperl without changing the
> .pm
> > > >
> > > > file?
> > > >
> > > > > thanks guys..
> > > > > kevin
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > Lincoln D. Stein
> > > > Cold Spring Harbor Laboratory
> > > > 1 Bungtown Road
> > > > Cold Spring Harbor, NY 11724
> > > > (516) 367-8380 (voice)
> > > > (516) 367-8389 (fax)
> > > > FOR URGENT MESSAGES & SCHEDULING,
> > > > PLEASE CONTACT MY ASSISTANT,
> > > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>
>


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 07:59:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 08:59:21 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <447E9E59.6090709@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
>> Just looking for all return undef;s isn't enough. It's entirely possible
>> to do something like:
>>
>> my $return_value;
>> {
>>    # do something that assigns to return_value on success
>>    # on failure, just do nothing
>> }
>> return $return_value;
> 
> Agreed, though looking for these is obviously much harder.  
> 
> The way to get around those is:
> 
> return $return_value if $return_value;
> return;
> 
> which I've seen used in a number of get/set methods. 

Though if anyone is using that cookie-cutter/macro style, that's much 
worse because now you can't return 0.

return $return_value if defined($return_value);
return;

In any case, it burns the eyes. I share Lincoln's POV. I also fully 
understand your point about not being able to trust the docs 
(Bio::Map::Marker...). But the solution is to change the code so that it 
matches the docs when the docs make sense, not change the code so that it 
no longer matches the docs[*]. In a massive OO project like bioperl the 
users need to be able to rely on the docs. You can't turn around and say 
"you've used this method for years, but now I'm changing how it works 
because you might have used the method incorrectly". Ideally any code 
changes add functionality or improve its working without affecting code 
that uses the method correctly according to its old docs.


* though if there isn't time/interest in changing the code, and the 
method never worked as per the docs, then by all means change the docs 
to avoid confusion - just don't change the docs on a method that worked 
according to the docs, because then you can assume people use the method 
and will be affected by the change
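
For reference, the difference between the return idioms discussed above can
be shown with a short sketch. The three subroutine names are hypothetical
and exist only to put the variants side by side; none of this is taken from
an actual Bioperl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three hypothetical getters that differ only in how they signal "no value".
sub explicit_undef { my ($v) = @_; return $v if defined $v; return undef }
sub truthy_only    { my ($v) = @_; return $v if $v;         return }
sub defined_only   { my ($v) = @_; return $v if defined $v; return }

# In list context "return undef" yields a one-element list containing
# undef, so an array assigned from it is non-empty and therefore true.
my @a = explicit_undef(undef);
my @b = defined_only(undef);
printf "return undef -> %d element(s); bare return -> %d element(s)\n",
    scalar(@a), scalar(@b);

# The truthiness test silently drops a legitimate 0; the defined test
# keeps it.
my $lost = truthy_only(0);
my $kept = defined_only(0);
printf "truthy test: %s; defined test: %s\n",
    defined $lost ? $lost : 'undef',
    defined $kept ? $kept : 'undef';
```

Run as written, the first line reports 1 element versus 0, and the second
reports undef for the truthiness test versus 0 for the defined test.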


From lstein at cshl.edu  Thu Jun  1 15:40:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Jun 2006 11:40:38 -0400
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
Message-ID: <200606011140.38726.lstein@cshl.edu>

Hi,

The border is coming from the HTML <img> tag. To get rid of it, set -border=>0 in 
the img() call.

Lincoln


On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> Hello everybody,
>
> does anybody know how to remove the background color of the Panel.
> Currently, I am not adding anything to it, so I can troubleshot the
> problem, and I have tried setting up
> all color attributes I could find to the panel, but no luck. Whatever I do,
> I get the BLUE border of the panel.
>
> Has anybody faced the same problem?
>
> Thanks in advance,
>
> Jelena
>
> And here is the code I am currently using:
>
> ---------------------------------------------------------------------------
> my $panel =
>     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
>                               -width => 800,
>                               -pad_left => 10,
>                               -pad_right => 10,
>                               -key_color => 'white',
>                               -bgcolor => 'white',
>                               -gridcolor=>'black',
>                               -fgcolor => 'black',
>                               -grid => 0,
>                               );
>    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
>      -url  => '/tmpimages');
>    #make clickable image
>    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
>    print $map;
>
> ---------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From arareko at campus.iztacala.unam.mx  Thu Jun  1 16:13:05 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:13:05 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A3BD0B.8A2C%osborne1@optonline.net>
References: <C0A3BD0B.8A2C%osborne1@optonline.net>
Message-ID: <447F1211.2010705@campus.iztacala.unam.mx>

You're right Brian. I also think that the text/POD part is more 
important than the script. Since we're more into moving everything to 
the Wiki, I believe this would be the right approach.

Moving the script part of the tutorial into the examples/ directory is 
also a nice idea.

Mauricio.

Brian Osborne wrote:
> Mauricio,
> 
> Bernd didn't say he want the _script_ in the package, he said he wanted
> bptutorial.pl in the package, not indicating whether it was the
> documentation or the script that was important. It's my suspicion that the
> documentation is more important than the script, and this is what my last
> letter was asking, in part: is the script important? Or can we focus on the
> text/POD part?
> 
> Brian O.
> 
> 
> On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> <arareko at campus.iztacala.unam.mx> wrote:
> 
>> I agree with what Bernd Web said in another reply. For some people will
>> be nice to still be able to run the script from the codebase and
>> interact with it.
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 16:20:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:20:34 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1211.2010705@campus.iztacala.unam.mx>
Message-ID: <000b01c68597$5026bdf0$15327e82@pyrimidine>

Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
/scripts or /examples then?

Also, what do we do about the similar situation with other docs moved to the
wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
distribution pointing to the wiki docs instead?

Chris

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Thursday, June 01, 2006 11:13 AM
> To: Brian Osborne
> Cc: Chris Fields; bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> You're right Brian. I also think that the text/POD part is more
> important than the script. Since we're more into moving everything to
> the Wiki, I believe this would be the right approach.
> 
> Moving the script part of the tutorial into the examples/ directory is
> also a nice idea.
> 
> Mauricio.
> 
> Brian Osborne wrote:
> > Mauricio,
> >
> > Bernd didn't say he want the _script_ in the package, he said he wanted
> > bptutorial.pl in the package, not indicating whether it was the
> > documentation or the script that was important. It's my suspicion that
> the
> > documentation is more important than the script, and this is what my
> last
> > letter was asking, in part: is the script important? Or can we focus on
> the
> > text/POD part?
> >
> > Brian O.
> >
> >
> > On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> > <arareko at campus.iztacala.unam.mx> wrote:
> >
> >> I agree with what Bernd Web said in another reply. For some people will
> >> be nice to still be able to run the script from the codebase and
> >> interact with it.
> >
> >
> >
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 16:28:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:28:38 -0500
Subject: [Bioperl-l] For CVS developers -
	potential pitfall with "return undef"
In-Reply-To: <447E9E59.6090709@mrc-dunn.cam.ac.uk>
Message-ID: <000c01c68598$704b15d0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 2:59 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers -
> potential pitfall with "return undef"
> 
> Chris Fields wrote:
> >
> > Sendu Bala wrote:
> >> Just looking for all return undef;s isn't enough. It's entirely
> possible
> >> to do something like:
> >>
> >> my $return_value;
> >> {
> >>    # do something that assigns to return_value on success
> >>    # on failure, just do nothing
> >> }
> >> return $return_value;
> >
> > Agreed, though looking for these is obviously much harder.
> >
> > The way to get around those is:
> >
> > return $return_value if $return_value;
> > return;
> >
> > which I've seen used in a number of get/set methods.
> 
> Though if anyone is using that cookie-cutter/macro style, that's much
> worse because now you can't return 0.
> 
> return $return_value if defined($return_value);
> return;

Makes sense.  Really, this all comes down to semantics and the context of
how the method is called and what is expected as a return value.  I suppose
it also depends on what one considers 'best practice,' which can be
subjective.  I don't want us getting into a situation in which we come
across as critiquing someone else's code w/o some valid points, i.e.
Lincoln's point about complaining.  I think that's why this thread is pretty
important, in that we're getting a broad range of opinions on the issue.
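
A minimal sketch of the two pitfalls discussed above, assuming nothing
beyond core Perl. The sub names are made up for illustration and do not
come from any BioPerl module. It shows that "return undef;" yields a
one-element list in list context (so the result tests true), and that the
truth-based guard drops a legitimate 0 while the defined() guard keeps it.

```perl
use strict;
use warnings;

# Explicit undef: in list context this is a one-element list (undef).
sub explicit_undef { return undef; }

# Bare return: undef in scalar context, the empty list in list context.
sub bare_return { return; }

my @from_undef = explicit_undef();
my @from_bare  = bare_return();

print "return undef; gives ", scalar(@from_undef), " element(s)\n";
print "return;       gives ", scalar(@from_bare),  " element(s)\n";

# An array holding a single undef is true, so a caller testing the
# array would wrongly treat the failure as a success.
print "explicit undef looks like success in list context\n" if @from_undef;

# The truth test loses 0; the defined() test keeps it.
sub get_if_true    { my $v = shift; return $v if $v;          return; }
sub get_if_defined { my $v = shift; return $v if defined($v); return; }

my $lost = get_if_true(0);
my $kept = get_if_defined(0);
print "truth test:   ", (defined $lost ? $lost : 'undef'), "\n";
print "defined test: ", (defined $kept ? $kept : 'undef'), "\n";
```

Which form is right still depends on how callers use the method: code
that assigns the result to a scalar sees undef either way.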

> In any case, it burns the eyes. 

Yep, I agree. 

> I share Lincoln's POV. I also fully
> understand your point about not being able to trust the docs
> (Bio::Map::Marker...). But the solution is to change the code so they
> match the docs when the docs make sense, not change the code so that it
> no longer matches the docs[*]. In a massive OO project like bioperl the

So you know, Lincoln and I both support the idea of an audit.  He also notes
(and I agree) that people will likely complain.  

Anyway, changing the code to match the docs makes sense theoretically, but in
practice that doesn't always work.  Any situation where code does not behave
as expected (i.e. as described in the docs) is a bug and can be reported as
such.  The problem arises when the docs are completely wrong, as
Bio::Restriction::IO was before I made changes to it.  In many cases simple
small code changes won't work, such as when methods inherit from an
interface but don't implement all methods (so essentially are incomplete).

Hilmar made the point that we should change the docs to reflect
inconsistencies in particular plugin modules for IO classes (AlignIO has a
few modules with unimplemented write methods, and so on).  When the code
radically varies, such as in the Restriction::IO case (where none of the
write methods worked), the docs should be changed in the IO class to reflect
this.  Of course, you should also add a bit to the TO DO section of POD and
add a bit to the Project Priority List on the wiki to point this out, both
of which I did.  It comes down to 'truth in advertising': does it do what's
expected?

> users need to be able to rely on the docs. You can't turn around and say
> "you've used this method for years, but now I'm changing how it works
> because you might have used the method incorrectly". Ideally any code

Not what I did, BTW.  The API is intact; you can still use the write methods
if you want (they throw errors just fine).  In fact, I didn't change any
methods except in one module (Restriction::IO::bairoch), where I added a
warning to the read method b/c it didn't work as expected, and I filed a bug
report.  Essentially, the only thing I changed was the docs to reflect what
the code currently can accomplish (at least until you read the TO DO).  We
already had one person email the group asking why code in the synopsis
didn't work.

Adding read and write methods to most of these modules (making the code do
what the docs reflect, in your words) is a lot of work, esp. for someone
like me unfamiliar with the class architecture and methods for those
modules.  IMHO, contributions to bioperl should accomplish what is reflected
in their docs once added to the core; if a write method hasn't been written,
then add it to the docs in a TO DO section or add a warning to the synopsis.
Don't put in the docs what you intend the code to accomplish down the road
but what it does currently.  Is that unreasonable?
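
A rough sketch of what that could look like, assuming a hypothetical
plugin module (My::IO::example is invented here, as are the POD wording
and the error text). It uses Carp only to stay self-contained; BioPerl
modules would normally raise the error through their own root class
instead. The point is that the method stays in the API, the docs say
plainly that it is unimplemented, and calling it fails loudly.

```perl
use strict;
use warnings;

package My::IO::example;    # hypothetical plugin module, not part of BioPerl
use Carp qw(croak);

sub new { my $class = shift; return bless {}, $class; }

=head2 write

 Title   : write
 Status  : NOT IMPLEMENTED (see TO DO); calling it throws an exception.

=cut

sub write {
    my $self = shift;
    # Fail loudly rather than silently producing no output.
    croak ref($self) . "::write is not implemented yet";
}

package main;

my $io = My::IO::example->new;
my $ok = eval { $io->write('anything'); 1 };
my $error = $@;
print "caught: $error" unless $ok;
```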

Anyway, when something doesn't perform as expected (produces invalid output
or contains errors), it's considered a bug.  That includes misrepresenting
what a module does in the docs.  When we try to fix bugs we have to decipher
what the intent of the original author was from the docs and code, then try
to get it to work by modifying the code.  In extreme cases (such as
unimplemented methods) that may mean writing up entire methods from scratch.
The read and write methods for IO modules are normally the longest methods
in a class.  That's a heck of a lot of effort for something that a large
majority of us aren't interested in taking up, esp. when the submitting
author should have had everything up to spec (i.e. what's in the docs) when
adding it to the core.

> changes add functionality or improve it's working without affecting code
>   that uses the method correctly according to its old docs.
> 
> 
> * though if there isn't time/interest in changing the code, and the
> method never worked as per the docs, then by all means change the docs
> to avoid confusion - just don't change the docs on a method that worked
> according to the docs, because then you can assume people use the method
> and will be affected by the change

Again, didn't do that.  The methods in the docs either didn't exist (not
implemented) or didn't work (contained bugs).  The docs were changed b/c
they were misleading.

-chris


From arareko at campus.iztacala.unam.mx  Thu Jun  1 16:36:07 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:36:07 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
References: <C0A31929.89F9%osborne1@optonline.net> <447E48B9.4080503@jays.net>
Message-ID: <447F1777.3070906@campus.iztacala.unam.mx>

Jay Hannah wrote:
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should 
>> point to the Tutorials page instead of the Bptutorial.pl page please let 
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Nice idea, I'll check with Jason if it's possible (in mediawiki) to 
create a new Documentation sidebar to hold these 4 sections.

> (What's the conceptual difference between a HOWTO and a tutorial?)

My concept is that Tutorials cover a wider aspect of BioPerl, contrary 
to the HOWTO's which focus on a certain topic.

> Why isn't the short "Current events" just listed on the top of the "News" page?

I don't know, maybe because it was important when Jason started the Wiki 
a couple of months ago. Do you think it should be erased from the sidebar?

> Sick of my endless questions yet? -grin-
> 
> j
> 

Of course not! :)

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 16:46:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:46:03 +0100
Subject: [Bioperl-l] For CVS developers -
	potential pitfall with "return undef"
In-Reply-To: <000c01c68598$704b15d0$15327e82@pyrimidine>
References: <000c01c68598$704b15d0$15327e82@pyrimidine>
Message-ID: <447F19CB.4090607@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Sendu Bala wrote:
[snip]
>> users need to be able to rely on the docs. You can't turn around and say
>> "you've used this method for years, but now I'm changing how it works
>> because you might have used the method incorrectly". Ideally any code
> 
> Not what I did, BTW.
[snip]
>> * though if there isn't time/interest in changing the code, and the
>> method never worked as per the docs, then by all means change the docs
>> to avoid confusion - just don't change the docs on a method that worked
>> according to the docs, because then you can assume people use the method
>> and will be affected by the change
> 
> Again, didn't do that.

I'm very sorry that I allowed the ambiguity, but my comments were 
certainly not directed at your recent changes to Bio::Restriction::IO. 
In fact, I put in the above * comment to exclude your changes from my 
discussion; you changed the docs because the code never did what they 
said they did (the docs were bad). That's fine (good!). My comments were 
a general point, slightly directed at the idea of changing all the 
return undef;s - changing the code so that it no longer matches the docs 
of a previously working method. That's what I think is bad. Though in 
this particular case it shouldn't make any difference at all.


From osborne1 at optonline.net  Thu Jun  1 16:46:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 12:46:02 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <C0A4920A.8A5B%osborne1@optonline.net>

Chris,

I think the INSTALL* files should be in the package, this is the de facto
convention for 99% of the packages I've ever seen. Then any Wiki page just
links to the file in CVS.

Personally I don't like the idea of maintaining a Wiki page and a file that
both say essentially the same thing (this is what has happened with the
INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
text and removing files that contained these redundancies so it's
unfortunate to see them appear anew, sooner or later they'll get out of sync
despite best intentions. The most likely cause will be someone other than
the person who created the initial duplication (and promised to maintain
both) making a change in one of the two files.

Brian O.


On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?


From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 16:57:27 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:57:27 +0100
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
References: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <447F1C77.5040403@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sounds good to me.  I guess the tutorial (post-stripping) would be moved to
> /scripts or /examples then?
> 
> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?

Imho, something like an installation document should be there in full so 
once you've downloaded you can install without reference to anything 
else. Also, an installation document could be considered specific to the 
release version. Which is to say, it never goes out of date even if new 
versions of bioperl are released with new installation instructions - it 
applies to the installation directory it is found in.

The wiki can have the latest installation instructions, and you don't 
have to worry about keeping things synced.


From cjfields at uiuc.edu  Thu Jun  1 17:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:13:30 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1C77.5040403@mrc-dunn.cam.ac.uk>
Message-ID: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>

So basically have a minimal set of installation instructions in CVS and a
more detailed set of installation instructions on the wiki.  Sounds reasonable
enough but bioperl is a pretty complex distribution (lots of additional
modules required, platform-specific issues, so on).  Maybe we can come up
with a pared-down INSTALL file which combines the basic elements for
installing on UNIX/Windows/Mac/FreeBSD and points out dependencies.  

I still like the idea of just having a simple conversion from wiki->txt
direct from the web page (i.e. best of both worlds).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 11:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris Fields wrote:
> > Sounds good to me.  I guess the tutorial (post-stripping) would be moved
> to
> > /scripts or /examples then?
> >
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in
> the
> > distribution pointing out the wiki docs instead?
> 
> Imho, something like an installation document should be there in full so
> once you've downloaded you can install without reference to anything
> else. Also, an installation document could be considered specific to the
> release version. Which is to say, it never goes out of date even if new
> versions of bioperl are released with new installation instructions - it
> applies to the installation directory it is found in.
> 
> The wiki can have the latest installation instructions, and you don't
> have to worry about keeping things synced.


From s-merchant at northwestern.edu  Thu Jun  1 17:17:32 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Thu, 1 Jun 2006 12:17:32 -0500
Subject: [Bioperl-l] Bio::OntologyIO
Message-ID: <000001c6859f$446f7fd0$c2987ca5@pc13>

Hi Everyone,

    I would like to announce the availability of an obo format parser
which can parse GO, PO, PATO and other ontology files in obo format. The
parser can be used through the Bio::OntologyIO module. Thanks to Hilmar
Lapp and Chris Mungall for their invaluable contributions.

 
Thanks,

Sohel Merchant.

 
Sohel Merchant

dictyBase

Bioinformatics Software Engineer

Center for Genetic Medicine

Northwestern University

676 St. Clair Street, Suite 1206

Chicago IL 60611

 
From cjfields at uiuc.edu  Thu Jun  1 17:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:35 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4920A.8A5B%osborne1@optonline.net>
Message-ID: <001101c685a3$53f4bf70$15327e82@pyrimidine>

I understand your point, though I think the wiki gives us an opportunity to add
helpful links and use markup to help clarify things a bit more.  I have seen
several distributions which don't have INSTALL files, just simple README
with very basic instructions (Bio::ASN1::EntrezGene is one).  

I've been reluctant to mess around with the wiki Install pages too much more
b/c of syncing problems, just as you mentioned.  I will look into things a
bit more to see if there's an easier way to go about converting wiki->text.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 11:46 AM
> To: Chris Fields; 'Mauricio Herrera Cuadra'
> Cc: bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> I think the INSTALL* files should be in the package, this is the de facto
> convention for 99% of the packages I've ever seen. Then any Wiki page just
> links to the file in CVS.
> 
> Personally I don't like the idea of maintaining a Wiki page and a file
> that
> both say essentially the same thing (this is what has happened with the
> INSTALL and INSTALL.WIN files). I've spent plenty of time merging
> redundant
> text and removing files that contained these redundancies so it's
> unfortunate to see them appear anew, sooner or later they'll get out of
> sync
> despite best intentions. The most likely cause will be someone other than
> the person who created the initial duplication (and promised to maintain
> both) making a change in one of the two files.
> 
> Brian O.
> 
> 
> On 6/1/06 12:20 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?  Should we have a placeholder file in
> the
> > distribution pointing out the wiki docs instead?


From cjfields at uiuc.edu  Thu Jun  1 17:46:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:45 -0500
Subject: [Bioperl-l] For CVS developers
	- potential pitfall with "return undef"
In-Reply-To: <447F19CB.4090607@mrc-dunn.cam.ac.uk>
Message-ID: <001201c685a3$59d78da0$15327e82@pyrimidine>


....

> > Again, didn't do that.
> 
> I'm very sorry that I allowed the ambiguity, but my comments were
> certainly not directed at your recent changes to Bio::Restriction::IO.
> In fact, I put in the above * comment to exclude your changes from my
> discussion; you changed the docs because the code never did what they
> said they did (the docs were bad). That's fine (good!). My comments were
> a general point, slightly directed at the idea of changing all the
> return undef;s - changing the code so that it no longer matches the docs
> of a previously working method. That's what I think is bad. Though in
> this particular case it shouldn't make any difference at all.

Agreed.  In any case, if tests have been properly set up then they should
catch problems.  This is, of course, if they are properly set up.  

Chris




From gad14 at cornell.edu  Thu Jun  1 19:10:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Thu, 01 Jun 2006 15:10:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447D5668.7070500@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu>
	<447BFB20.40501@mrc-dunn.cam.ac.uk>	<447C7985.9000404@cornell.edu>
	<447D5668.7070500@mrc-dunn.cam.ac.uk>
Message-ID: <447F3BA7.9030500@cornell.edu>

Problem solved, albeit, in a slightly hacky way.

I tried to make seek() work for a good long while with the SearchIO 
blast results object, but I just couldn't get it to work. (Probably b/c 
seek wants to see a genuine file handle-- not a SearchIO filehandle.) I 
used SearchIO's fh() to get the handle and could while(<$fh>) through 
the data but when I used seek($fh,0,0) to reset the cursor position in 
the handle in prep for another loop, I got an error complaining about my 
use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
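
A small sketch of what seems to be going on, under the assumption that
the handle returned by fh() is a tied handle (the "SEEK" in the error
message points that way). OnePassHandle below is a made-up stand-in, not
BioPerl code: it implements READLINE but no SEEK method, so seek() works
on the ordinary handle and dies on the tied one.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# 1. An ordinary (here: in-memory) filehandle can be rewound with seek().
my $text = "line 1\nline 2\n";
open(my $plain_fh, '<', \$text) or die "cannot open in-memory file: $!";
my @first_pass = <$plain_fh>;
seek($plain_fh, 0, 0) or die "seek failed: $!";
my @second_pass = <$plain_fh>;
print "plain handle: ", scalar(@first_pass), " then ",
      scalar(@second_pass), " lines\n";

# 2. A tied handle only supports the operations its class implements.
#    OnePassHandle is a hypothetical stand-in with READLINE but no SEEK.
package OnePassHandle;
sub TIEHANDLE { my ($class, @items) = @_; return bless { items => [@items] }, $class; }
sub READLINE  { my $self = shift; return shift @{ $self->{items} }; }

package main;

my $tied_fh = gensym;
tie *$tied_fh, 'OnePassHandle', 'result 1', 'result 2';

my @seen;
while (defined(my $item = <$tied_fh>)) {
    push @seen, $item;
}

# seek() on the tied handle looks for a SEEK method and dies without one.
my $rewound = eval { seek($tied_fh, 0, 0); 1 };
my $error   = $@;
print "tied handle: seek failed: $error" unless $rewound;
```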

I concluded that it was not going to be possible and instead made an 
array of SeqFeature objects which contain all the relevant blast output 
data (i.e. the m8/hit table stuff).

It still seems unfortunate that one can't reuse the SearchIO object for 
cases when the SearchIO blast report needs to be accessed multiple times.
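
For the record, the caching workaround boils down to something like the
sketch below. OnePassReport is a hypothetical stand-in for the real
report object (it only mimics the next_result() call and hands back
plain strings); the idea is to drain the iterator once into an array and
then loop over the array as often as needed.

```perl
use strict;
use warnings;

# Stand-in for a one-pass parser: each call to next_result() hands back
# the next item, and undef once the report is exhausted.
package OnePassReport;
sub new { my ($class, @items) = @_; return bless { items => [@items] }, $class; }
sub next_result { my $self = shift; return shift @{ $self->{items} }; }

package main;

my $report = OnePassReport->new('result 1', 'result 2', 'result 3');

# First (and only) pass over the iterator: keep every returned object.
my @results;
while (defined(my $result = $report->next_result)) {
    push @results, $result;
}

# The cached array can be walked as many times as needed.
for my $pass (1, 2) {
    print "pass $pass: ", join(', ', @results), "\n";
}
```

The cost is memory: every result stays alive until the array goes out of
scope, which may matter for very large reports.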

Thanks for your help,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>>Thanks for your comment Sendu, it was very helpful. I think this must be 
>>what's going on.. I am using $blast_report->next_result in both 
>>subroutines. It appears that analyzing the blast results first w/ my 
>>sort subroutine empties (?) the $blast_result object so that when I try 
>>to print, there is nothing left to print. (and vice versa when I print 
>>first then try to sort).
>>So, from the looks of things, using next_result has the effect of 
>>popping the Bio::Search::Result::ResultI objects off of the SearchIO 
>>blast report object??
> 
> 
> Not quite. It's more or less exactly like opening a file and then trying 
> to read it all twice like this:
> open(FILE, "file");
> while (<FILE>) {
>      print # prints each line in the file
> }
> while (<FILE>) {
>      print # never happens, we never enter this while loop
> }
> 
> To get the second while loop to print anything we need to say seek(FILE, 
> 0, 0) before it. Or in the first while loop store each line in an array, 
> and then make the second loop a foreach through that array.
> 
> 
> 
>>It seems I could get around this by making a copy of the blast report by 
>>setting it to another new variable...(not the most elegant solution) but 
>>I'm having trouble with this...
>>
>>If I do:
>>
>>    my $blast_report_copy = $blast_report;
>>
>>I'm just copying the reference to the SearchIO blast result, so it 
>>doesn't help me. How can I make another physical copy of this blast 
>>result object? Seems like a simple thing but how to do it is escaping me.
> 
> 
> Not really a good idea, and it may not work anyway if the object 
> contains a filehandle. But for a simple object you might recursively 
> loop through the data structure and copy each element out into a similar 
> data structure.
> 
> 
> 
>>But better yet, the way to go is to 'reset the counter,' or to find a 
>>way to look at/print/sort the results without removing data from the 
>>blast result object. How is this done though??
> 
> 
> It would be rather nice if this worked:
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);
> 
> Finally, the most sensible way (assuming bioperl provides no methods of 
> its own for this) of solving the problem is, the first time you go 
> through each next_result, next_hit and next_hsp, just store the returned 
> objects in an array of arrays of arrays. Then the second time get the 
> objects from your array structure instead of with the method calls.
> 


From jelenaob at gmail.com  Thu Jun  1 15:45:49 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Thu, 1 Jun 2006 08:45:49 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <200606011140.38726.lstein@cshl.edu>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
	<200606011140.38726.lstein@cshl.edu>
Message-ID: <5042a62b0606010845u79a5d5b3h131c4ed54f90fee3@mail.gmail.com>

Thanks Lincoln.

I figured out the solution just after I posted the question, Murphy's law ... but
my post was left hanging in my email ... :(

The problem is in CGI->img method.

Instead of  print $cgi->img({-src=>$url,-usemap=>"#$mapname"});

I should have used: print $cgi->img({-src=>$url,-usemap=>"#$mapname",
-border=>undef});

Thanks anyways for your help.

Cheers,

Jelena

On 6/1/06, Lincoln Stein <lstein at cshl.edu> wrote:
>
> Hi,
>
> The border is coming from the HTML <img> tag. To get rid of it, set -border=>0
> in
> the img() call.
>
> Lincoln
>
>
>
> On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> > Hello everybody,
> >
> > does anybody know how to remove the background color of the Panel.
> > Currently, I am not adding anything to it, so I can troubleshot the
> > problem, and I have tried setting up
> > all color attributes I could find to the panel, but no luck. Whatever I
> do,
> > I get the BLUE border of the panel.
> >
> > Has anybody faced the same problem?
> >
> > Thanks in advance,
> >
> > Jelena
> >
> > And here is the code I am currently using:
> >
> >
> ---------------------------------------------------------------------------
> >-------------------------------- my $panel =
> >     Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
> >                               -width => 800,
> >                               -pad_left => 10,
> >                               -pad_right => 10,
> >                               -key_color => 'white',
> >                               -bgcolor => 'white',
> >                               -gridcolor=>'black',
> >                               -fgcolor => 'black',
> >                               -grid => 0,
> >                               );
> >    my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url'
> ,
> >      -url  => '/tmpimages');
> >    #make clickable image
> >    print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
> >    print $map;
> >
> >
> ---------------------------------------------------------------------------
> >--------------------------------
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>


From osborne1 at optonline.net  Thu Jun  1 19:36:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:36:27 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>
Message-ID: <C0A4B9FB.8A71%osborne1@optonline.net>

Chris,

Right - how would this be done?

Brian O.


On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> I still like the idea of just having a simple conversion from wiki->txt
> direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 19:44:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:44:13 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <C0A4BBCD.8A74%osborne1@optonline.net>

Jay,

You asked about the doc/ directory. The only directory I see in my
bioperl-live/doc directory is examples/, the reason this remains is that it
contains scripts and images related to the Graphics HOWTO, in theory these
could be moved to the Wiki and the examples/ directory deleted. One
explanation for why you see doc/html and all those other dirs is that you
aren't using the 'cvs -d' option (there are other explanations) when you
update.

If examples/ is removed then presumably the README can be removed and
makedoc.pl moved elsewhere.

Brian O.


On 5/31/06 9:54 PM, "Jay Hannah" <jay at jays.net> wrote:

> Brian Osborne wrote:
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
> 
> We certainly wouldn't want to try to maintain two copies, one POD one in wiki.
> That would be the worst of all options. One option that hasn't been mentioned
> yet is to keep maintenance of that in POD in the distro (leaving the cool
> runability alone), and then flag that document as unchangeable in the wiki
> with a note on top "Maintenance of this document is done in POD in the distro.
> Submit POD patches to bioperl-l and we'll re-post an updated copy to this
> wiki."
> 
> Just a thought.
> 
>> - What do we do with the script part of bptutorial.pl? It certainly could be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?
> 
> doc/faq        (empty?)
> doc/howto      
> doc/howto/examples
> doc/howto/figs (empty?)
> doc/howto/html (empty?)
> doc/howto/pdf  (empty?)
> doc/howto/sgml (empty?)
> doc/howto/txt  (empty?)
> doc/howto/xml  (empty?)
> 
> Does all that stuff officially live in and is being changed in the wiki, never
> to return to the distro?
> 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
>> Jay, looks like there are still some weird formatting issues with the
>> bptutorial wiki page, something which I ran into before when getting the
>> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
>> spaces preceding a line denotes code for some reason).  Not much you can do
>> in these cases except remove the extra spaces in those spots.  Looking good
>> though!  
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know nothing
> about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran
> off to work. :)
> 
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should
>> point to the Tutorials page instead of the Bptutorial.pl page please let
>> me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials
> 
> (What's the conceptual difference between a HOWTO and a tutorial?)
> 
> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on the
> wiki but static?
> 
> Why isn't the short "Current events" just listed on the top of the "News"
> page?
> 
> Sick of my endless questions yet? -grin-
> 
> j
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Jun  1 19:47:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 14:47:40 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A4B9FB.8A71%osborne1@optonline.net>
Message-ID: <001301c685b4$3dbfb820$15327e82@pyrimidine>

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 2:36 PM
> To: Chris Fields; 'Sendu Bala'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Chris,
> 
> Right - how would this be done?

I'll look into a few of the wiki converters; there are a few things that
claim to convert wiki to other formats (and vice versa).  It may not be
direct, though.  I'll post if I figure something out.

Chris
 
> Brian O.
> 
> 
> On 6/1/06 1:13 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > I still like the idea of just having a simple conversion from wiki->txt
> > direct from the web page (i.e. best of both worlds).


From osborne1 at optonline.net  Thu Jun  1 19:45:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 15:45:39 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E73F5.40403@jays.net>
Message-ID: <C0A4BC23.8A75%osborne1@optonline.net>

Jay,

Yes, good idea, thank you for volunteering.

Brian O.


On 6/1/06 12:58 AM, "Jay Hannah" <jay at jays.net> wrote:

> I hereby volunteer to strip the code out of bptutorial.pl and put it wherever.
> Where should I put it when I'm done? (examples/tutorial.pl?)


From hubert.prielinger at gmx.at  Thu Jun  1 20:33:45 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 01 Jun 2006 14:33:45 -0600
Subject: [Bioperl-l] remoteblast xml problem
Message-ID: <447F4F29.9070600@gmx.at>

hi,
I have the following program, which worked quite well for retrieving
remoteblast results as a text file.
Now I have altered it to request xml, and it doesn't work anymore.
It takes all the parameters at the command line and submits the query, but I
don't get any results file anymore.

It seems to hang in an endless loop.
The only output I get is:  $rc is not a ref! over and over. It
never enters the else branch anymore.

Any help is appreciated, thanks in advance


#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use IO::String;
use Bio::SearchIO;


#use lib qw(/usr/local/bioperl/bioperl-1.5.1);

print "Please insert database:\t";
my $db_STD = <STDIN>;
chomp $db_STD;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;

print "Please insert gapcosts:\t";
my $gapcosts_STD = <STDIN>;
chomp $gapcosts_STD;

my $prog   = 'blastp';
my $db     = $db_STD;           
my $e_val  = '20000';
my $matrix = $matrix_STD;               
my $wordSize = '2';


my @data;
my $line_dataArray;
my $rid;
my $count = $count_STD;           
my @params = (
  '-prog'   => $prog,
  '-data'   => $db,
  '-expect' => $e_val,
  '-MATRIX_NAME' => $matrix,
  '-readmethod' => 'xml',
  '-WORD_SIZE' => $wordSize,
);

my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

print "entering blast....";

my $xmlFactory = Bio::Tools::Run::RemoteBlast->new(@params);


$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = $gapcosts_STD;
$Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'ALIGNMENTS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';
   

print "Blast entered successfully \n";

while ( my $query = $seqio_obj->next_seq ) {
  print "submit Sequence...just do it....\n";
 
  my $r = $xmlFactory->submit_blast($query);
  print $query->seq;
  print "\n";
 
 
#    sleep 30;

  # Wait for the reply and save the output file
  print "entering while loop for saving Output.... \n";
 
  while ( my @rids = $xmlFactory->each_rid ) {
      foreach my $rid (@rids) {
           
          my $rc = $xmlFactory->retrieve_blast($rid);
          if ( !ref($rc) ) {
              print '$rc is not a ref!', "\n";
              if ( $rc < 0 ) {
                  print "Remove rid ...\n";
                  $xmlFactory->remove_rid($rid);
              }
              # sleep 5;
          }
          else {

              print "retrieved Results successfully \n";
              print $rid;
              print "\n";
              my $filename = "comp80swiss$count.xml";
              $xmlFactory->save_output($filename);
              print "File saved successfully \n";
              my $checkinput = $xmlFactory->file;
              open(my $fh, '<', $checkinput) or die $!;
              while(<$fh>){
                print;
              }
              close $fh;
              $count++;
              $xmlFactory->remove_rid($rid);
          }
      }
      print "\n";
      print "\n";

  }
}


From emmanuel.quevillon at versailles.inra.fr  Thu Jun  1 21:15:42 2006
From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon)
Date: Thu, 01 Jun 2006 23:15:42 +0200
Subject: [Bioperl-l] How to submit new module?
Message-ID: <447F58FE.7020603@versailles.inra.fr>

Hi,

I just created some new parsers for TargetP, TandemRepeatFinder and
RepeatMasker. These parsers inherit from Bio::Tools. I'd just
like to know the steps for submitting them to BioPerl
so that they can be integrated in the next release (I hope).
Is there any documentation about it?

Thanks

-- 
Emmanuel

---------------------------------------------------------------------
Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>

INRA-URGI / Bayer CropScience
523 Place des Terrasses             http://www.infobiogen.fr
91000 EVRY                          http://urgi.infobiogen.fr
Tel : 01 60 87 37 42                http://www.bayercropscience.com

PGP public key server : http://pgp.mit.edu/
Key ID : 0x0B84357F
---------------------------------------------------------------------


From cjfields at uiuc.edu  Thu Jun  1 21:36:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:36:05 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447F3BA7.9030500@cornell.edu>
Message-ID: <001b01c685c3$63840070$15327e82@pyrimidine>

Genevieve, 

seek() won't work here; all the file IO is handled through Bio::Root::IO
methods.  The SearchIO system is set up like an XML SAX parser so if you
want to save objects as they come you'll have to store the object refs in an
array, like so:

my @hsps;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         push @hsps, $hsp;
      }
   }
}

Or similarly with hits: 

my @hits;

while (my $result = $parser->next_result) {
   while (my $hit = $result->next_hit) {
      push @hits, $hit;
   }
}

Or you could use more complex data structures (array of arrays) as Sendu
suggested.  You should be able to sort like anything else by calling methods
within the sort:

# total number of hsps
my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;

# if you really like your accessions in alphabetical order
my @by_accession = sort {$a->accession cmp $b->accession} @hits;

Then if you wanted to print later you could sort based on something else,
like the score:

my @sort_score = sort {$a->score <=> $b->score} @hits;

So you would end up with something like the following subroutines:

my @hits;   # file-scoped, shared by both subs

sub sort_results{
   my $report = shift;
   # read the report once and keep every hit for later use
   while(my $result = $report->next_result()){
      while(my $hit = $result->next_hit()){
         push @hits, $hit;
      }
   }
   my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;
   print $_->accession,"\t",$_->num_hsps,"\n" for @sorted;
}

sub print_blast_results{
   my $report = shift;   # unused here: the hits come from the cached @hits
   my @sort_score = sort {$a->score <=> $b->score} @hits;
   for my $h (@sort_score) {
      while (my $hsp = $h->next_hsp) {
         # might use something else here like hit->name or accession,
         # not sure what you want
         my $q_name = $hsp->seq_id;
         print join(", ",$q_name,$h->name,$hsp->bits)."\n";
      }
   }
}


Just so you know, I couldn't get display_id or display_name to work when
using the Bio::Search::HSP::GenericHSP object.  Your results may vary.
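
To make the "read once, keep the objects" idea concrete, here is a minimal
sketch that runs without BioPerl. MockReport is a made-up stand-in (not a
BioPerl class) that only imitates the one-shot behaviour of next_result; the
item names are placeholders as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a one-shot parser: each call to next_result
# hands back the next item and yields undef once the stream is exhausted.
package MockReport;

sub new {
    my ($class, @items) = @_;
    return bless { queue => [@items] }, $class;
}

sub next_result {
    my $self = shift;
    return shift @{ $self->{queue} };
}

package main;

my $report = MockReport->new(qw(result1 result2 result3));

# First pass: consume the stream once and keep every object.
my @results;
while ( defined( my $r = $report->next_result ) ) {
    push @results, $r;
}

# A second pass over the report itself finds nothing left.
my $second_pass = 0;
$second_pass++ while defined $report->next_result;

# The cached array can be walked (or sorted) as many times as needed.
my @reversed = reverse @results;

print "cached: @results\n";
print "second pass over report saw $second_pass results\n";
print "reversed copy: @reversed\n";
```

The same pattern should carry over to real result, hit and HSP objects: fill
the array in the first loop, then sort or print from the array afterwards.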

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Genevieve DeClerck
> Sent: Thursday, June 01, 2006 2:11 PM
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Problem solved, albeit, in a slightly hacky way.
> 
> I tried to make seek() work for a good long while with the SearchIO
> blast results object, but I just couldn't get it to work. (Probably b/c
> seek wants to see a genuine file handle-- not a SearchIO filehandle.) I
> used SearchIO's fh() to get the handle and could while(<$fh>) through
> the data but when I used seek($fh,0,0) to reset the cursor position in
> the handle in prep for another loop, i got an error complaining about my
> use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
> 
> I concluded that it was not going to be possible and instead made an
> array of SeqFeature objects which contain all the relevant blast output
> data (i.e. the m8/hit table stuff).
> 
> It still seems unfortunate that one can't reuse the SearchIO object for
> cases when the SearchIO blast report needs to be accessed multiple times.
> 
> Thanks for your help,
> Genevieve
> 
> 
> 
> Sendu Bala wrote:
> 
> > Genevieve DeClerck wrote:
> >
> >>Thanks for your comment Sendu, it was very helpful. I think this must be
> >>what's going on.. I am using $blast_report->next_result in both
> >>subroutines. It appears that analyzing the blast results first w/ my
> >>sort subroutine empties (?) the $blast_result object so that when I try
> >>to print, there is nothing left to print. (and vice versa when I print
> >>first then try to sort).
> >>So, from the looks of things, using next_result has the effect of
> >>popping the Bio::Search::Result::ResultI objects off of the SearchIO
> >>blast report object??
> >
> >
> > Not quite. It's more or less exactly like opening a file and then trying
> > to read it all twice like this:
> > open(FILE, "file");
> > while (<FILE>) {
> >      print # prints each line in the file
> > }
> > while (<FILE>) {
> >      print # never happens, we never enter this while loop
> > }
> >
> > To get the second while loop to print anything we need to say seek(FILE,
> > 0, 0) before it. Or in the first while loop store each line in an array,
> > and then make the second loop a foreach through that array.
> >
> >
> >
> >>It seems I could get around this by making a copy of the blast report by
> >>setting it to another new variable...(not the most elegant solution) but
> >>I'm having trouble with this...
> >>
> >>If I do:
> >>
> >>    my $blast_report_copy = $blast_report;
> >>
> >>I'm just copying the reference to the SearchIO blast result, so it
> >>doesn't help me. How can I make another physical copy of this blast
> >>result object? Seems like a simple thing but how to do it is escaping
> >>me.
> >
> >
> > Not really a good idea, and it may not work anyway if the object
> > contains a filehandle. But for a simple object you might recursively
> > loop through the data structure and copy each element out into a similar
> > data structure.
> >
> >
> >
> >>But better yet, the way to go is to 'reset the counter,' or to find a
> >>way to look at/print/sort the results without removing data from the
> >>blast result object. How is this done though??
> >
> >
> > It would be rather nice if this worked:
> > my $blast_report = $factory->blastall($ref_seq_objs);
> > my $blast_fh = $blast_report->fh();
> > while (<$blast_fh>) {
> >      # $_ is a ResultI object, use as normal
> > }
> > seek($blast_fh, 0, 0); # this would be great, but does it work?
> > while (<$blast_fh>) {
> >      # go through the results again in your second subroutine
> > }
> >
> > An alternative hacky way of doing it, which may also not work, would be
> > to go through your $blast_report as normal, but then before going
> > through it a second time, say
> > my $fh = $blast_report->_fh;
> > seek($fh, 0, 0);
> >
> > Finally, the most sensible way (assuming bioperl provides no methods of
> > its own for this) of solving the problem is, the first time you go
> > through each next_result, next_hit and next_hsp, just store the returned
> > objects in an array of arrays of arrays. Then the second time get the
> > objects from your array structure instead of with the method calls.


From arareko at campus.iztacala.unam.mx  Thu Jun  1 21:49:30 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 16:49:30 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
References: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <447F60EA.1050608@campus.iztacala.unam.mx>

Hi Emmanuel,

Take a look into the BioPerl FAQ:

http://bioperl.org/wiki/FAQ

It contains some info that will guide you through the appropriate steps 
depending on your situation.

Regards,
Mauricio.

Emmanuel Quevillon wrote:
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
> like to know the differents steps procedure to submit them to BioPerl
> and to be integrated in the next release (I hope)?
> Is there any documentation about it?
> 
> Thanks
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Jun  1 21:47:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:47:11 -0500
Subject: [Bioperl-l] How to submit new module?
In-Reply-To: <447F58FE.7020603@versailles.inra.fr>
Message-ID: <001c01c685c4$f01e7550$15327e82@pyrimidine>

The Bioperl FAQ on the wiki answers this:

http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F

Basically, you've already done the first step, but you might want to
resubmit the email in a different form, with something about "New parsers
for TargetP, TandemRepeatFinder and RepeatMasker" in the Subject line to get
more input about those from the users-at-large.  

BTW, there is already a Bio::Tools::RepeatMasker, so you should check it out
to make sure there isn't any redundancy between your version and the
bioperl-live version.  The developers may be reluctant to replace the
bioperl-live version with yours to prevent API problems with end users,
unless you provide some serious justification (like the current one is
broken, not complete, etc).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Emmanuel Quevillon
> Sent: Thursday, June 01, 2006 4:16 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] How to submit new module?
> 
> Hi,
> 
> I just created some new parsers for TargetP, TandemRepeatFinder and
> RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just
> like to know the differents steps procedure to submit them to BioPerl
> and to be integrated in the next release (I hope)?
> Is there any documentation about it?
> 
> Thanks
> 
> --
> Emmanuel
> 
> ---------------------------------------------------------------------
> Emmanuel Quevillon      <emmanuel.quevillon _at versailles _inra _fr>
> 
> INRA-URGI / Bayer CropScience
> 523 Place des Terrasses             http://www.infobiogen.fr
> 91000 EVRY                          http://urgi.infobiogen.fr
> Tel : 01 60 87 37 42                http://www.bayercropscience.com
> 
> PGP public key server : http://pgp.mit.edu/
> Key ID : 0x0B84357F
> ---------------------------------------------------------------------


From heikki at sanbi.ac.za  Fri Jun  2 07:52:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 2 Jun 2006 09:52:07 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001201c685a3$59d78da0$15327e82@pyrimidine>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
Message-ID: <200606020952.08034.heikki@sanbi.ac.za>

I've started going through the files that have 'return undef' lines.
I'll report back later.

Initial impression is that there are a few cases where the context indicates 
list to be returned but failure returns an explicit undef. I'll fix those.

Most of the cases are much more ambiguous. Even when documentation says the 
failure returns undef, it is clearly meant to mean false. In most cases 
documentation does not comment on return value at all. Luckily the context is 
almost always scalar and therefore it does not matter too much.

I seem to be changing 'return undef' to plain 'return' a bit overzealously, so 
do not take it personally.
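
For anyone wondering why the list-context cases matter, here is a small
sketch of the difference. The two subs are invented for illustration and are
not taken from any BioPerl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Explicit undef: in list context this is a one-element list, (undef).
sub find_explicit { return undef }

# Bare return: an empty list in list context, undef in scalar context.
sub find_bare { return }

my @explicit = find_explicit();
my @bare     = find_bare();

print "explicit undef: ", scalar(@explicit), " element(s), ",
      ( @explicit ? "true" : "false" ), " as a condition\n";
print "bare return:    ", scalar(@bare), " element(s), ",
      ( @bare ? "true" : "false" ), " as a condition\n";

# In scalar context both behave the same.
my $s1 = find_explicit();
my $s2 = find_bare();
print "scalar context: both undefined\n" if !defined $s1 && !defined $s2;
```

So a caller writing "if (my @features = $obj->method)" sees a failure as
success when the method ends in "return undef", which is the case worth
fixing; in scalar context nothing changes.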

	-Heikki

On Thursday 01 June 2006 19:46, Chris Fields wrote:
> ....
>
> > > Again, didn't do that.
> >
> > I'm very sorry that I allowed the ambiguity, but my comments were
> > certainly not directed at your recent changes to Bio::Restriction::IO.
> > In fact, I put in the above * comment to exclude your changes from my
> > discussion; you changed the docs because the code never did what they
> > said they did (the docs were bad). That's fine (good!). My comments were
> > a general point, slightly directed at the idea of changing all the
> > return undef;s - changing the code so that it no longer matches the docs
> > of a previously working method. That's what I think is bad. Though in
> > this particular case it shouldn't make any difference at all.
>
> Agreed.  In any case, if tests have been properly set up then they should
> catch problems.  This is, of course, if they are properly set up.
>
> Chris
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Fri Jun  2 09:04:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 02 Jun 2006 10:04:18 +0100
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447F4F29.9070600@gmx.at>
References: <447F4F29.9070600@gmx.at>
Message-ID: <447FFF12.506@mrc-dunn.cam.ac.uk>

Hubert Prielinger wrote:
> hi,
> I have the following program and it worked quite well, for retrieving 
> remoteblast results in a textfile,
> now I have altered it to to xml, and it didn't work anymore.....
> it takes all the parameter at the commandline, submits the query, but I 
> don't retrieve any results file anymore.....
> 
> it seems that it hangs in a endless loop......
> the only output I get is:  $rc is not a ref! over and over..... it 
> doesn't enter the else term anymore....

There is no problem with your code. The problem is with the NCBI server 
and should be reported to them. You can visit the site and do a blast, 
requesting xml format, and you will typically get one normal 'waiting' 
message and the promise that it will be updated in x seconds, but 
subsequent attempts to get progress information result in an xml error 
page because the NCBI server doesn't actually send any data.

Unfortunately the way that the bioperl code is written, it treats no 
data as 'waiting' instead of an error. I've offered a patch to fix this 
at this bug page:
http://bugzilla.bioperl.org/show_bug.cgi?id=2015
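
Independently of the patch, a script can protect itself from polling forever
by capping the number of attempts. The following is only a sketch of that
idea: MockFactory is a made-up stand-in that always answers "still waiting"
(0), the RID string is a placeholder, and the limit of 5 is an arbitrary
assumption. It does not call RemoteBlast or the NCBI server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the remote service: it always answers
# "still waiting" (0), mimicking a server that never delivers a result.
package MockFactory;

sub new {
    my $class = shift;
    return bless { polls => 0 }, $class;
}

sub retrieve_blast {
    my $self = shift;
    $self->{polls}++;
    return 0;
}

package main;

my $factory      = MockFactory->new;
my $max_attempts = 5;            # assumed limit, adjust as needed
my $attempt      = 0;
my $status       = 'pending';

while ( $status eq 'pending' ) {
    my $rc = $factory->retrieve_blast('RID-PLACEHOLDER');
    if ( ref $rc ) {
        $status = 'done';        # a result object came back
    }
    elsif ( $rc < 0 ) {
        $status = 'error';       # the service reported a failure
    }
    elsif ( ++$attempt >= $max_attempts ) {
        $status = 'gave up';     # still waiting after too many polls
    }
    # a real script would sleep between polls here
}

print "status: $status after $attempt attempts\n";
```

In a real script the "gave up" branch would be the place to remove the RID
and report the problem instead of looping on.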


From cjfields at uiuc.edu  Fri Jun  2 14:30:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 09:30:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447FFF12.506@mrc-dunn.cam.ac.uk>
Message-ID: <001a01c68651$12925250$15327e82@pyrimidine>

Sendu, Hubert,


Hubert, your code looks fine so Sendu's patch should fix the problem (break
out of that infinite loop).  I applied Sendu's patch to RemoteBlast in CVS;
it passed all tests in RemoteBlast.t.  Try updating from CVS to see if it
works.  

Chris



From cjfields at uiuc.edu  Fri Jun  2 19:13:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 14:13:31 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
Message-ID: <000301c68678$a3cdaa40$15327e82@pyrimidine>

Heikki,

I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
when running AlignIO.t (I was fixing bug 2000):

http://bugzilla.open-bio.org/show_bug.cgi?id=2016

Not sure what's going on there, but using read_aln and write_aln seems to work
normally.  It may have something to do with Bio::SimpleAlign but I'm not
absolutely sure.

Any ideas what may be going on here?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Fri Jun  2 21:11:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 15:11:41 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <001a01c68651$12925250$15327e82@pyrimidine>
References: <001a01c68651$12925250$15327e82@pyrimidine>
Message-ID: <4480A98D.6010501@gmx.at>

hi,
sorry, but I have updated the remoteblast module and I have run several 
attempts with the same results as before. It didn't work.
I didn't get any results.

regards
Hubert




From cjfields at uiuc.edu  Fri Jun  2 21:54:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 16:54:20 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480A98D.6010501@gmx.at>
Message-ID: <000001c6868f$1b68dbe0$15327e82@pyrimidine>

Hubert, 

Could you post this on bugzilla with your script and test data so I can try
to replicate your error?  I may not get to it until Monday.

Chris



From hubert.prielinger at gmx.at  Fri Jun  2 23:19:40 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 17:19:40 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <000001c68691$8c4eeb40$15327e82@pyrimidine>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
Message-ID: <4480C78C.1000701@gmx.at>

hi,
I have submitted the bug -> Bug 2017
with the script and input file, just start it from command line

thank you very much
greetings

Hubert

Chris Fields wrote:
> Hubert,
>
> I have a script that's using blastxml and XML output which seems to work.
> I'll try looking at it to get a better idea this weekend.
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Friday, June 02, 2006 4:12 PM
>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu Bala'
>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>
>> hi,
>> sorry, but I have updated the remoteblast module and I have run several
>> attempts with the same results as before. It didn't work.
>> I didn't get any results.
>>
>> regards
>> Hubert
>>
>>
>> Chris Fields wrote:
>>> Sendu, Hubert,
>>>
>>> Hubert, your code looks fine so Sendu's patch should fix the problem
>>> (break out of that infinite loop).  I applied Sendu's patch to
>>> RemoteBlast in CVS; it passed all tests in RemoteBlast.t.  Try updating
>>> from CVS to see if it works.
>>>
>>> Chris
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>
>>>> Hubert Prielinger wrote:
>>>>
>>>>> hi,
>>>>> I have the following program and it worked quite well, for retrieving
>>>>> remoteblast results in a textfile,
>>>>> now I have altered it to xml, and it didn't work anymore.....
>>>>> it takes all the parameters at the commandline, submits the query, but I
>>>>> don't retrieve any results file anymore.....
>>>>>
>>>>> it seems that it hangs in an endless loop......
>>>>> the only output I get is:  $rc is not a ref! over and over..... it
>>>>> doesn't enter the else term anymore....
>>>>>
>>>> There is no problem with your code. The problem is with the NCBI server
>>>> and should be reported to them. You can visit the site and do a blast,
>>>> requesting xml format, and you will typically get one normal 'waiting'
>>>> message and the promise that it will be updated in x seconds, but
>>>> subsequent attempts to get progress information result in an xml error
>>>> page because the NCBI server doesn't actually send any data.
>>>>
>>>> Unfortunately the way that the bioperl code is written, it treats no
>>>> data as 'waiting' instead of an error. I've offered a patch to fix this
>>>> at this bug page:
>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015


From cjfields at uiuc.edu  Sat Jun  3 00:33:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:33:48 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480C78C.1000701@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
Message-ID: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>

You need to add the input conditions as well (you have several  
<STDIN> lines which may play a role; I would like to know what you  
normally enter for those).

How long did you let the script run?  I ran a quick check on your  
sequences; you have almost 1600, so you have to expect that you'll  
run into some problems here!  Most here (including me) would suggest  
you try installing a local blast setup for something like this.

Chris

On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:

> hi,
> I have submitted the bug -> Bug 2017
> with the script and input file, just start it from command line
>
> thank you very much
> greetings
>
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Sat Jun  3 00:49:15 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 18:49:15 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
Message-ID: <4480DC8B.7070005@gmx.at>

hi,
input database: swissprot
         matrix: pam30
         count: 1
         gapcosts: 9 1

I know that there are a lot of sequences, but that doesn't matter, you
can delete all of them except one. The number of sequences is not
the problem: the script reads one line and submits it, then the
second line and so on. I have also tried it with only one sequence
and I got the same result. The script ran that time for more than
20 minutes, and that should be enough time to retrieve the
results for ONE sequence, I guess

regards
Hubert


Chris Fields wrote:
> You need to add the input conditions as well (you have several <STDIN> 
> lines which may play a role; I would like to know what you normally 
> enter for those).
>
> How long did you let the script run?  I ran a quick check on your 
> sequences; you have almost 1600, so you have to expect that you'll run 
> into some problems here!  Most here (including me) would suggest you 
> try installing a local blast setup for something like this.
>
> Chris


From cjfields at uiuc.edu  Sat Jun  3 00:57:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:57:37 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480DC8B.7070005@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
Message-ID: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>

Yes, I see the same error you do.  But I have a similar script  
(blastp, XML blast report, XML parsing, similar loop structure) that  
works fine.  I'm trying to dissect the problem but I think it may be  
something logically wrong here (something not so obvious) and not a  
bug...

What I'm trying to say is, when you send sequences using remoteblast
like this, you are essentially spamming the NCBI BLAST server with
~1600 requests.  This script wasn't set up with that intent in mind;  
you should really try to set up your own local blast database if  
possible.  If you can't, try running this script in off-hours  
(10pm-6am EST or something like that).
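
If the queries do have to go to the remote server, spacing them out is one way to be gentler on it. A minimal sketch in plain Perl, assuming a caller-supplied $submit code ref (hypothetical, not part of BioPerl) that sends one query and returns a job id:

```perl
use strict;
use warnings;

# Hypothetical helper: submit queries one at a time, sleeping between them.
# $submit is a caller-supplied code ref (for example a wrapper around a
# remote BLAST submission); it is an assumption, not a BioPerl call.
sub submit_throttled {
    my ($queries, $submit, $pause_seconds) = @_;
    my @ids;
    for my $i (0 .. $#{$queries}) {
        push @ids, $submit->($queries->[$i]);
        # Pause after every request except the last one.
        sleep $pause_seconds if $pause_seconds && $i < $#{$queries};
    }
    return @ids;
}

# Usage with a stand-in submitter that only records what it was given.
my @seen;
my @ids = submit_throttled(
    [ 'MKVLAA', 'GHTRRP' ],
    sub { my ($seq) = @_; push @seen, $seq; return 'job' . scalar @seen },
    0,    # no pause in this demo; use a few seconds against a real server
);
print join(',', @ids), "\n";
```

The pause length is a judgment call; the point is only that requests go out one after another rather than all at once.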


Chris

On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:

> hi,
> input database: swissprot
>         matrix: pam30
>         count: 1
>         gapcosts: 9 1
>
> I know that there are a lot of sequences, but that doesn't matter,
> you can delete all of them except one. The number of sequences
> is not the problem: the script reads one line and submits
> it, then the second line and so on. I have also tried it with only
> one sequence and I got the same result. The script ran
> that time for more than 20 minutes, and that should be
> enough time to retrieve the results for ONE sequence, I guess
>
> regards
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hubert.prielinger at gmx.at  Sat Jun  3 01:36:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 19:36:42 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
Message-ID: <4480E7AA.3020603@gmx.at>

hi chris,
thanks, but I never intended to run the remoteblast with so many
sequences, only a few of them. Actually my goal is to run phiblast
with a regular expression, so I won't need that
file anymore.

another question, about parsing the xml output: is there an xml parser
available for blast xml output, and how do I get started?
I have looked at the wikiperl and at Bio::SearchIO::blastxml on cpan, but
I'm not sure how to start....sorry, I guess I'm too stupid....
is there maybe another introduction or an example?

thanks
Hubert


Chris Fields wrote:
> Yes, I see the same error you do.  But I have a similar script  
> (blastp, XML blast report, XML parsing, similar loop structure) that  
> works fine.  I'm trying to dissect the problem but I think it may be  
> something logically wrong here (something not so obvious) and not a  
> bug...
>
> What I'm trying to say is, when you send sequences using remoteblast
> like this, you are essentially spamming the NCBI BLAST server with
> ~1600 requests.  This script wasn't set up with that intent in mind;  
> you should really try to set up your own local blast database if  
> possible.  If you can't, try running this script in off-hours  
> (10pm-6am EST or something like that).
>
>
> Chris


From cjfields at uiuc.edu  Sat Jun  3 04:35:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 23:35:21 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>


On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:

> hi chris,
> thanks, but I never intended to run the remoteblast with so many
> sequences, only a few of them. Actually my goal is to run phiblast
> with a regular expression, so I won't need that
> file anymore

Not a problem.  Just to let you know, I did manage to get the script  
working, so I'm marking the bug INVALID.  I think the problem isn't  
that there is an infinite loop so much as setting composition-based  
statistics causes the search to take much much longer; try removing  
that line to see what I mean.
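
Since a slow search and a stuck one look the same from the outside, it can help to put an upper bound on the polling loop. A minimal sketch in plain Perl, assuming a $fetch code ref (hypothetical) that returns a reference once the report is ready and a non-reference while it is not, mirroring the "$rc is not a ref!" check mentioned earlier in this thread:

```perl
use strict;
use warnings;

# Poll until a result arrives, but give up after $max_tries instead of
# looping forever. $fetch is a hypothetical code ref standing in for
# whatever call asks the server for a finished report.
sub poll_for_result {
    my ($fetch, $max_tries, $wait_seconds) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return $rc if ref $rc;    # got a result object
        sleep $wait_seconds if $wait_seconds && $try < $max_tries;
    }
    return;    # nothing arrived; the caller decides how to report it
}

# Demo: the stand-in fetcher reports "not ready" twice, then succeeds.
my $calls  = 0;
my $result = poll_for_result(
    sub { return ++$calls < 3 ? 0 : +{ status => 'done' } },
    5,    # at most five polls
    0,    # no wait in this demo; use several seconds against a real server
);
print $result ? "finished after $calls polls\n" : "gave up\n";
```

With a cap like this, a search that is merely slow can be given a larger budget, while one that never returns ends with a clear "gave up" instead of an endless stream of dots.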

Just so you know, using $result->query_name doesn't get you what you  
would expect (it gives you a part of the RID, which you don't want;  
this is something in the XML output that is beyond our control).  You  
might want to change it to something else or you'll get filenames  
with numerical names.
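
One way around the naming problem is to build the filename yourself instead of taking it from the report. A minimal sketch in plain Perl; the label and the numbering scheme are illustrative assumptions, not anything the script or BioPerl prescribes:

```perl
use strict;
use warnings;

# Build an output filename from a caller-chosen label and a running
# counter rather than from the parsed query name.
sub report_filename {
    my ($label, $index) = @_;
    (my $safe = $label) =~ s/[^A-Za-z0-9_.-]+/_/g;    # keep it filesystem-friendly
    return sprintf('%s_%04d.xml', $safe, $index);
}

print report_filename('phiblast run', 1), "\n";
```

The counter can simply be the position of the sequence in the input file, which also makes it easy to match a report back to its query.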

> another question, about parsing the xml output: is there an xml
> parser available for blast xml output, and how do I get started?
> I have looked at the wikiperl and at Bio::SearchIO::blastxml on cpan,
> but I'm not sure how to start....sorry, I guess I'm too stupid....
> is there maybe another introduction or an example?

Bio::SearchIO objects are used to parse BLAST XML output if you have  
it saved to a file.  For instance:

use Bio::SearchIO;

my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');

while (my $result = $factory->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         # do stuff here
      }
   }
}

The only thing that changes between parsing a text BLAST report and an
XML BLAST report is the -format line (similar to the -readmethod
parameter in RemoteBlast).  You shouldn't need to look up any more
documentation other than these on the wiki:

http://www.bioperl.org/wiki/HOWTO:SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml

Note that you'll need to install XML::SAX (CPAN), and that
XML::SAX::ExpatXS (and Expat) are highly recommended for speeding
up parsing.
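
A quick way to see whether those modules are already installed is to try loading them. A minimal sketch in plain Perl; have_module is a hypothetical helper name, and it only reports whether the module loads, not which version is present:

```perl
use strict;
use warnings;

# Report whether a module can be loaded, without dying if it is absent.
sub have_module {
    my ($name) = @_;
    (my $path = "$name.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $path; 1 } ? 1 : 0;
}

for my $module (qw(XML::SAX XML::SAX::ExpatXS)) {
    printf "%-20s %s\n", $module, have_module($module) ? 'found' : 'missing';
}
```

Anything reported as missing can then be installed from CPAN before trying the blastxml parser.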

Chris

> thanks
> Hubert
>>>>>>>
>>>>>>>
>>>>>>>> it passed all tests in RemoteBlast.t.  Try updating from  
>>>>>>>> CVS  to see if
>>>>>>>>
>>>>>>>>
>>>>>>> it
>>>>>>>
>>>>>>>
>>>>>>>> works.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>
>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> I have the following program and it worked quite well,  
>>>>>>>>>> for  retrieving
>>>>>>>>>> remoteblast results in a textfile,
>>>>>>>>>> now I have altered it to to xml, and it didn't work   
>>>>>>>>>> anymore.....
>>>>>>>>>> it takes all the parameter at the commandline, submits  
>>>>>>>>>> the  query, but
>>>>>>>>>>
>>>>>>>>>>
>>>>>>> I
>>>>>>>
>>>>>>>
>>>>>>>>>> don't retrieve any results file anymore.....
>>>>>>>>>>
>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>> the only output I get is:  $rc is not a ref! over and   
>>>>>>>>>> over..... it
>>>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> There is no problem with your code. The problem is with  
>>>>>>>>> the  NCBI server
>>>>>>>>> and should be reported to them. You can visit the site and  
>>>>>>>>> do  a blast,
>>>>>>>>> requesting xml format, and you will typically get one  
>>>>>>>>> normal  'waiting'
>>>>>>>>> message and the promise that it will be updated in x  
>>>>>>>>> seconds,  but
>>>>>>>>> subsequent attempts to get progress information result in  
>>>>>>>>> an  xml error
>>>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>>>
>>>>>>>>> Unfortunately the way that the bioperl code is written, it   
>>>>>>>>> treats no
>>>>>>>>> data as 'waiting' instead of an error. I've offered a  
>>>>>>>>> patch  to fix this
>>>>>>>>> at this bug page:
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 15:10:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:10:51 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149084373.447da2d5c5339@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
	<1149084373.447da2d5c5339@128.91.55.38>
Message-ID: <9206E0B2-15DC-4AB2-B71B-5EA9D1D11AEC@duke.edu>

The bootstrap is stored as the node ID because of a limitation of  
the newick format: there isn't a formal way to distinguish internal  
IDs from bootstraps.  Programs encode the internal ID and a bootstrap  
value in that one slot in several different ways; we try to parse it  
out if the bootstrap is stored in brackets, like  
INTERNALID[BOOTSTRAP].

Formats like nhx explicitly solve this problem, but most programs  
only use simple newick.  If you know your data, it is a simple  
procedure to move the internal ID data into the bootstrap slot.

As for ignoreoverwrite, you just need to pass a true value as the  
second parameter:
$node->add_Descendent($childnode, 1);
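
To make the label convention above concrete, here is a small pure-Perl sketch of splitting an internal-node label into an ID and a bootstrap. It is not the TreeIO parser's code; the sub name split_internal_label and its $bare_is_bootstrap switch are made up for this example, the switch standing in for "you know your data".

```perl
use strict;
use warnings;

# Split a Newick internal-node label into (id, bootstrap).
# $bare_is_bootstrap is a hypothetical switch for data where a plain
# number in the label slot is known to be a support value.
sub split_internal_label {
    my ( $label, $bare_is_bootstrap ) = @_;
    return ( undef, undef ) unless defined $label && length $label;

    # INTERNALID[BOOTSTRAP], e.g. "cladeA[95]" or just "[70]"
    if ( $label =~ /^([^\[\]]*)\[([^\[\]]+)\]$/ ) {
        my ( $id, $bootstrap ) = ( $1, $2 );
        return ( length($id) ? $id : undef, $bootstrap );
    }

    # A bare number: only the caller knows whether it is an ID or a bootstrap.
    if ( $bare_is_bootstrap && $label =~ /^\d+(?:\.\d+)?$/ ) {
        return ( undef, $label );
    }

    # Anything else is treated as a plain internal ID.
    return ( $label, undef );
}

for my $label ( 'cladeA[95]', '[70]', '85', 'cladeB' ) {
    my ( $id, $bootstrap ) = split_internal_label( $label, 1 );
    printf "%-12s id=%-8s bootstrap=%s\n", $label,
        defined $id        ? $id        : '-',
        defined $bootstrap ? $bootstrap : '-';
}
```

On an already-parsed tree, the equivalent step would be copying each internal node's id() into its bootstrap() slot before applying a support cutoff.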

-jason


On May 31, 2006, at 10:06 AM, Lucia Peixoto wrote:

> Hi
> Thanks
> a couple more questions
> why is the bootstrap value stored as the node id? Is that right?
>
> also, in the add_descendant method, how do you set the  
> $ignoreoverwrite
> parameter to true?
>
> Lucia
>
> Quoting Jason Stajich <jason.stajich at duke.edu>:
>
>> you need to special case the root - it won't have an ancestor.  just
>> protect the my $parent = $node->ancestor with an if statement as I
>> did below
>>
>> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>>
>>> Hi
>>> OK that was silly, but what I have in my code is what you just wrote
>>> But the problem is that if I write
>>>
>>> $parent->add_Descendent($child)
>>>
>>> it tells me that I am calling the method "add_Descendent" on an
>>> undefined value
>>> (but I did define $parent before??)
>>>
>>> So here it goes the code so far:
>>>
>>> use Bio::TreeIO;
>>>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>>>                           -format => 'newick');
>>>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>>>                            -format => 'newick');
>>>  while( my $tree = $in->next_tree ) {
>>>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>>>     my $bootstrap=$node->_creation_id;
>>>
>>>     if ($bootstrap < 70 ){
>>>>>> if(        my $parent = $node->ancestor ) {
>>>               my @children=$node->get_all_Descendents;
>>>               foreach my $child (@children){
>>>                  $parent->add_Descendent($child);
>>>               }
>>          }
>>>
>>> ........
>>>
>>> eventually I'll add (once I have assigned the children to the parent
>>> successfully):
>>> $tree->remove_Node($node);
>>>
>>>         }
>>>     }
>>>     $out->write_tree($tree);
>>> }
>>>
>>> Quoting aaron.j.mackey at gsk.com:
>>>
>>>>> foreach $child (@children){
>>>>>          $parent=add_Descendent->$child;
>>>>> }
>>>>
>>>> I think what you want is $parent->add_Descendent($child)
>>>>
>>>> -Aaron
>>>>
>>>
>>>
>>> Lucia Peixoto
>>> Department of Biology,SAS
>>> University of Pennsylvania
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sat Jun  3 15:29:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 11:29:31 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>

You can get all the hits or HSPs with the following methods:
my @hits = $result->hits;
my @hsps = $hit->hsps;


You can also reset the counter since these implementations are in- 
memory and already parsed (and not a stream processor per se).   
next_XX just iterates through the list stored in the parent object.

$result->rewind;

   and

$hit->rewind;


For example, the rewind needs to be called if you want to use a  
ResultWriter object and filter some of the values for the final  
writing after first inspecting them.
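
The difference between iterating and grabbing the whole list can be shown with a toy model. This is only a sketch of the idea (a stored list plus a cursor), not BioPerl's actual implementation; ToyResult and count_remaining are names invented for the example.

```perl
use strict;
use warnings;

# Toy stand-in for a result object: a stored list plus a cursor.
package ToyResult;

sub new {
    my ( $class, @hits ) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

# Return the next stored hit, or undef once the list is exhausted.
sub next_hit {
    my ($self) = @_;
    return undef if $self->{pos} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{pos}++ ];
}

# Reset the cursor so next_hit starts from the first hit again.
sub rewind { my ($self) = @_; $self->{pos} = 0; return }

# Return every hit without touching the cursor.
sub hits { my ($self) = @_; return @{ $self->{hits} } }

package main;

# Count how many hits next_hit still hands out.
sub count_remaining {
    my ($result) = @_;
    my $n = 0;
    $n++ while defined $result->next_hit;
    return $n;
}

my $result = ToyResult->new(qw(hitA hitB hitC));
my $first  = count_remaining($result);    # walks all three hits
my $second = count_remaining($result);    # cursor is already at the end
$result->rewind;
my $third  = count_remaining($result);    # full list again after rewind
my @all    = $result->hits;               # independent of the cursor

printf "first=%d second=%d after_rewind=%d hits=%d\n",
    $first, $second, $third, scalar @all;
```

The second pass finds nothing until rewind is called, while hits() returns the full list at any point.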

-jason


On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:

> Thanks for your comment Sendu, it was very helpful. I think this  
> must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I  
> try
> to print, there is nothing left to print. (and visa-versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??
>
> It seems I could get around this by making a copy of the blast  
> report by
> setting it to another new variable...(not the most elegant  
> solution) but
> I'm having trouble with this...
>
> If I do:
>
> 	my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is  
> escaping me.
>
> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??
>
> Sendu and Brian, I didn't post the sort_results subroutine because  
> it is
> sprawling, as is a lot of my code. The code I provided was more  
> like an
> aid for my explanation of the problem.. it doesn't actually run -  
> sorry
> for the confusion, I should have more clear on that.  The important
> thing to know perhaps is that both sort_results and  
> print_blast_results
> contain a foreach loop where I am using the 'next_results' method to
> view blast results. (And to clarify for Torsten, the blastall() is
> working just fine - the analysis/viewing of the results object is  
> where
> I am encountering the problem.)
>
>
> Any other ideas would be greatly appreciated...
>
> Thank you,
> Genevieve
>
>
>
>
> Sendu Bala wrote:
>
>> Genevieve DeClerck wrote:
>>
>>> Hi,
>>
>> [snip]
>>
>>> If I've sorted the results the sorted-results will print to screen,
>>> however when I try to print the Hit Table results nothing is  
>>> returned,
>>> as if the blast results have evaporated.... and visa versa, if i
>>> comment out the part where i point my sorting subroutine to the  
>>> blast
>>> results reference,  my hit table results suddenly prints to screen.
>>
>> [snip]
>>
>>> Here's an abbreviated version of my code:
>>
>> [snip]
>>
>>> #######
>>> ### the following 2 actions seem to be mutually exclusive.
>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>> # SeqFeature objs stored in arrays. arrays are then printed
>>> # to stdout
>>> &sort_results($blast_report);
>>>
>>> # 2) print blast results
>>> &print_blast_results($blast_report);
>>
>>
>>> sub print_blast_results{
>>>    my $report = shift;
>>>    while(my $result = $report->next_result()){
>>
>> [snip]
>>
>> You didn't give us your sort_results subroutine, but is it as  
>> simple as
>> they both use $report->next_result (and/or $result->next_hit), but  
>> you
>> don't reset the internal counter back to the start, so the second
>> subroutine tries to get the next_result and finds the first  
>> subroutine
>> has already looked at the last result and so next_result returns  
>> false?
>>
>>  From a quick look it wasn't obvious how to reset the counter.  
>> Hopefully
>> this can be done and someone else knows how.
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun  3 19:13:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 3 Jun 2006 14:13:22 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
Message-ID: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>

Nice!  Didn't know I could do that.  Maybe we should add some of this  
to the HOWTO (or is it already in there?).

Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sat Jun  3 19:31:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 3 Jun 2006 15:31:59 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
Message-ID: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>

hits() and hsps() were already in the HOWTO; I just added rewind to  
the table of methods.
If someone wants to write a short section in the HOWTO about  
resetting the iterator, that would be great.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Sat Jun  3 23:54:20 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 04 Jun 2006 09:54:20 +1000
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480E7AA.3020603@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
	<4480C78C.1000701@gmx.at>
	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>
	<4480DC8B.7070005@gmx.at>
	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
	<4480E7AA.3020603@gmx.at>
Message-ID: <4482212C.3000908@infotech.monash.edu.au>

Hubert,

> another question for parsing the xml output....is there a xml parser 
> available for blast xml output or how to start.....
> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml, but 
> I'm not sure how to start....sorry, I guess I'm too stupid....
> is there maybe another introduction or an example?

I think we already answered this question for you on 20 May 2006:

http://bioperl.org/pipermail/bioperl-l/2006-May/021574.html
http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#How_to_parse_BLAST_XML_output

http://www.bioperl.org/wiki/HOWTO:SearchIO (search for "blastxml")

--Torsten Seemann


From cjfields at uiuc.edu  Sun Jun  4 05:17:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 00:17:46 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
Message-ID: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>

There's an interesting addition to this I found while checking this  
out; looks like if you use:

my @hits =  $result->hits;

to get all the hits, you don't need to use '$result->rewind'.  The  
rewind method resets the iterator for the hit list back to the  
beginning, but using the hits method to grab all the hits doesn't use  
the iterator at all.  This works either before or after iterating  
through the Hit::BlastHit objects.

Another thing: Genevieve was passing the SearchIO report object (i.e.  
the parser object returned from StandAloneBlast, $blast_report) to  
the subroutines, not a Bio::Search::Result::BlastResult object.  It  
looks like there was some confusion between the two object types,  
since she refers to the report as the result object when it's  
actually the SearchIO parser object.  So, once the parser was passed  
into the first subroutine, each result object was generated, then  
destroyed.  By the time the second subroutine was entered, the parser  
had already read through the whole report and generated its objects,  
so next_result returned nothing and there was no output.
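
One way around this is to drain the parser once and hand the stored result objects to both subroutines. A rough sketch follows; MockReport is a made-up stand-in for the SearchIO parser so that the example runs on its own, and collect_results is a hypothetical helper name. The helper only relies on next_result(), so it should work the same way with a real parser object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO parser: hands out each result once.
package MockReport;

sub new {
    my ( $class, @results ) = @_;
    return bless { queue => [@results] }, $class;
}

# Return the next result, or undef when nothing is left.
sub next_result {
    my ($self) = @_;
    return shift @{ $self->{queue} };
}

package main;

# Drain the parser once and keep the result objects in an array.
sub collect_results {
    my ($report) = @_;
    my @results;
    while ( my $result = $report->next_result ) {
        push @results, $result;
    }
    return @results;
}

my $report  = MockReport->new(qw(result1 result2));
my @results = collect_results($report);

# Both passes now work on the stored list, not on the exhausted parser.
print "sort pass:  @results\n";
print "print pass: @results\n";
```

With real result objects, each subroutine would take the array (or a reference to it) and call hits() on each element, so no iterator state is shared between the two passes.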

Though passing the BlastResult object is better since one should only  
have to parse the report once and use the objects, for curiosity's  
sake, is there a method to rewind the parser itself (in other words,  
read through the report again)?

Chris


On Jun 3, 2006, at 2:31 PM, Jason Stajich wrote:

> In the HOWTO hits() and hsps() were there, I just added rewind in the
> table of methods.
> If someone wanted to write a little section in the HOWTO about
> resetting the iterator that would be great.
>
> -jason
> On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:
>
>> Nice!  Didn't know I could do that.  Maybe we should add some of this
>> to the HOWTO (or is it already in there?).
>>
>> Chris
>>
>> On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:
>>
>>> you can get all the Hits or hsps with the following method:
>>> my @hits = $result->hits;
>>> my @hsps = $hit->hsps;
>>>
>>>
>>> You can also reset the counter since these implementations are in-
>>> memory and already parsed (and not a stream processor per se).
>>> next_XX just iterates through the list stored in the parent object.
>>>
>>> $result->rewind;
>>>
>>>    and
>>>
>>> $hit->rewind;
>>>
>>>
>>> For example, the rewind needs to be called if you want to use a
>>> ResultWriter object and filter some of the values for the final
>>> writing after first inspecting them.
>>>
>>> -jason
>>>
>>>
>>> On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:
>>>
>>>> Thanks for your comment Sendu, it was very helpful. I think this
>>>> must be
>>>> what's going on.. I am using $blast_report->next_result in both
>>>> subroutines. It appears that analyzing the blast results first  
>>>> w/ my
>>>> sort subroutine empties (?) the $blast_result object so that when I
>>>> try
>>>> to print, there is nothing left to print. (and visa-versa when I
>>>> print
>>>> first then try to sort).
>>>> So, from the looks of things, using next_result has the effect of
>>>> popping the Bio::Search::Result::ResultI objects off of the  
>>>> SearchIO
>>>> blast report object??
>>>>
>>>> It seems I could get around this by making a copy of the blast
>>>> report by
>>>> setting it to another new variable...(not the most elegant
>>>> solution) but
>>>> I'm having trouble with this...
>>>>
>>>> If I do:
>>>>
>>>> 	my $blast_report_copy = $blast_report;
>>>>
>>>> I'm just copying the reference to the SearchIO blast result, so it
>>>> doesn't help me. How can I make another physical copy of this blast
>>>> result object? Seems like a simple thing but how to do it is
>>>> escaping me.
>>>>
>>>> But better yet, the way to go is to 'reset the counter,' or to
>>>> find a
>>>> way to look at/print/sort the results without removing data from  
>>>> the
>>>> blast result object. How is this done though??
>>>>
>>>> Sendu and Brian, I didn't post the sort_results subroutine because
>>>> it is
>>>> sprawling, as is a lot of my code. The code I provided was more
>>>> like an
>>>> aid for my explanation of the problem.. it doesn't actually run -
>>>> sorry
>>>> for the confusion, I should have more clear on that.  The important
>>>> thing to know perhaps is that both sort_results and
>>>> print_blast_results
>>>> contain a foreach loop where I am using the 'next_results'  
>>>> method to
>>>> view blast results. (And to clarify for Torsten, the blastall() is
>>>> working just fine - the analysis/viewing of the results object is
>>>> where
>>>> I am encountering the problem.)
>>>>
>>>>
>>>> Any other ideas would be greatly appreciated...
>>>>
>>>> Thank you,
>>>> Genevieve
>>>>
>>>>
>>>>
>>>>
>>>> Sendu Bala wrote:
>>>>
>>>>> Genevieve DeClerck wrote:
>>>>>
>>>>>> Hi,
>>>>>
>>>>> [snip]
>>>>>
>>>>>> If I've sorted the results the sorted-results will print to
>>>>>> screen,
>>>>>> however when I try to print the Hit Table results nothing is
>>>>>> returned,
>>>>>> as if the blast results have evaporated.... and visa versa, if i
>>>>>> comment out the part where i point my sorting subroutine to the
>>>>>> blast
>>>>>> results reference,  my hit table results suddenly prints to
>>>>>> screen.
>>>>>
>>>>> [snip]
>>>>>
>>>>>> Here's an abbreviated version of my code:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> #######
>>>>>> ### the following 2 actions seem to be mutually exclusive.
>>>>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>>>>> # SeqFeature objs stored in arrays. arrays are then printed
>>>>>> # to stdout
>>>>>> &sort_results($blast_report);
>>>>>>
>>>>>> # 2) print blast results
>>>>>> &print_blast_results($blast_report);
>>>>>
>>>>>
>>>>>> sub print_blast_results{
>>>>>>    my $report = shift;
>>>>>>    while(my $result = $report->next_result()){
>>>>>
>>>>> [snip]
>>>>>
>>>>> You didn't give us your sort_results subroutine, but is it as
>>>>> simple as
>>>>> they both use $report->next_result (and/or $result->next_hit), but
>>>>> you
>>>>> don't reset the internal counter back to the start, so the second
>>>>> subroutine tries to get the next_result and finds the first
>>>>> subroutine
>>>>> has already looked at the last result and so next_result returns
>>>>> false?
>>>>>
>>>>>  From a quick look it wasn't obvious how to reset the counter.
>>>>> Hopefully
>>>>> this can be done and someone else knows how.
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> http://www.duke.edu/~jes12
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason.stajich at duke.edu  Sun Jun  4 14:08:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 4 Jun 2006 10:08:29 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
Message-ID: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>

right - you don't need rewind if you aren't going to use the iterator  
(next_XXX) -- we provide two different ways to get access to the data.
you can do
for my $hit ( $result->hits ) {

}
or
while( my $hit = $result->next_hit ) {
}


If you want to rewind the parser then (assuming you are using a  
filestream and not a data stream from the web or zcat or something)  
just reset the filehandle
seek($searchio->_fh, 0, 0);

but then you'll have to re-parse everything and pay that cost twice -  
it makes more sense to me to just save the results and put them in a  
list if you are going to deliberately make two passes over all the  
results.    You either pay the cost of memory (keeping all the  
objects) or time (reparse the results).
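
To make the iterator-versus-list distinction concrete, here is a minimal
sketch that runs without BioPerl. ToyResult is a hypothetical stand-in, not
the real Bio::Search::Result class; it only mimics the hits / next_hit /
rewind behaviour described in this thread, under the assumption that the
hits are already held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed result object (NOT the BioPerl class).
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], cursor => 0 }, $class;
}

# List accessor: returns everything and ignores the cursor.
sub hits { my ($self) = @_; return @{ $self->{hits} }; }

# Iterator: returns the next hit, or undef once the list is exhausted.
sub next_hit {
    my ($self) = @_;
    return undef if $self->{cursor} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{cursor}++ ];
}

# Reset the cursor so next_hit starts from the beginning again.
sub rewind { my ($self) = @_; $self->{cursor} = 0; return; }

package main;

# Helper: walk the iterator until it runs out and collect what it yields.
sub walk {
    my ($result) = @_;
    my @seen;
    while ( defined( my $hit = $result->next_hit ) ) { push @seen, $hit; }
    return @seen;
}

my $result = ToyResult->new(qw(hitA hitB hitC));

my @pass1 = walk($result);      # first pass sees every hit
my @pass2 = walk($result);      # second pass sees nothing: cursor is at the end
my @all   = $result->hits;      # list access still works without rewinding
$result->rewind;
my @pass3 = walk($result);      # after rewind the iterator starts over

print scalar(@pass1), " ", scalar(@pass2), " ",
      scalar(@all),   " ", scalar(@pass3), "\n";
```

The empty second pass is the "evaporated results" symptom from the original
question; either the list accessor or a rewind avoids it for objects that
are already in memory.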


-jason
On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:

> There's an interesting addition to this I found while checking this  
> out; looks like if you use:
>
> my @hits =  $result->hits;
>
> to get all the hits, you don't need to use '$result->rewind'.  The  
> rewind method resets the iterator for the hit list back back to the  
> beginning, but using the hits method to grab all the hits doesn't  
> use the iterator at all.  This works either pre- or post-iteration  
> through the Hit::BlastHit objects.
>
> Another thing; Genevieve was passing the SearchIO report object  
> (i.e. the parser object which was returned from StandAloneBlast,  
> $blast_report) to the methods, not the  
> Bio::Search::Result::BlastResult object; looks like there was some  
> confusion between the two object types since she refers to the  
> report as the result object when it's actually the SearchIO parser  
> object.  So, once the parser was passed into the first method, a  
> result object was generated, then destroyed.  When entering the  
> second method, the parser had already read parsed the report and  
> generated the objects, so it ended with no output.
>
> Though passing the BlastResult object is better since one should  
> only have to parse the report once and use the objects, for  
> curiosity's sake, is there a method to rewind the parser itself (in  
> other words, read through the report again)?
>
> Chris
>
>
> On Jun 3, 2006, at 2:31 PM, Jason Stajich wrote:
>
>> In the HOWTO hits() and hsps() were there, I just added rewind in the
>> table of methods.
>> If someone wanted to write a little section in the HOWTO about
>> resetting the iterator that would be great.
>>
>> -jason
>> On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:
>>
>>> Nice!  Didn't know I could do that.  Maybe we should add some of  
>>> this
>>> to the HOWTO (or is it already in there?).
>>>
>>> Chris
>>>
>>> On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:
>>>
>>>> you can get all the Hits or hsps with the following method:
>>>> my @hits = $result->hits;
>>>> my @hsps = $hit->hsps;
>>>>
>>>>
>>>> You can also reset the counter since these implementations are in-
>>>> memory and already parsed (and not a stream processor per se).
>>>> next_XX just iterates through the list stored in the parent object.
>>>>
>>>> $result->rewind;
>>>>
>>>>    and
>>>>
>>>> $hit->rewind;
>>>>
>>>>
>>>> For example, the rewind needs to be called if you want to use a
>>>> ResultWriter object and filter some of the values for the final
>>>> writing after first inspecting them.
>>>>
>>>> -jason
>>>>
>>>>
>>>> On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote:
>>>>
>>>>> Thanks for your comment Sendu, it was very helpful. I think this
>>>>> must be
>>>>> what's going on.. I am using $blast_report->next_result in both
>>>>> subroutines. It appears that analyzing the blast results first  
>>>>> w/ my
>>>>> sort subroutine empties (?) the $blast_result object so that  
>>>>> when I
>>>>> try
>>>>> to print, there is nothing left to print. (and visa-versa when I
>>>>> print
>>>>> first then try to sort).
>>>>> So, from the looks of things, using next_result has the effect of
>>>>> popping the Bio::Search::Result::ResultI objects off of the  
>>>>> SearchIO
>>>>> blast report object??
>>>>>
>>>>> It seems I could get around this by making a copy of the blast
>>>>> report by
>>>>> setting it to another new variable...(not the most elegant
>>>>> solution) but
>>>>> I'm having trouble with this...
>>>>>
>>>>> If I do:
>>>>>
>>>>> 	my $blast_report_copy = $blast_report;
>>>>>
>>>>> I'm just copying the reference to the SearchIO blast result, so it
>>>>> doesn't help me. How can I make another physical copy of this  
>>>>> blast
>>>>> result object? Seems like a simple thing but how to do it is
>>>>> escaping me.
>>>>>
>>>>> But better yet, the way to go is to 'reset the counter,' or to
>>>>> find a
>>>>> way to look at/print/sort the results without removing data  
>>>>> from the
>>>>> blast result object. How is this done though??
>>>>>
>>>>> Sendu and Brian, I didn't post the sort_results subroutine because
>>>>> it is
>>>>> sprawling, as is a lot of my code. The code I provided was more
>>>>> like an
>>>>> aid for my explanation of the problem.. it doesn't actually run -
>>>>> sorry
>>>>> for the confusion, I should have more clear on that.  The  
>>>>> important
>>>>> thing to know perhaps is that both sort_results and
>>>>> print_blast_results
>>>>> contain a foreach loop where I am using the 'next_results'  
>>>>> method to
>>>>> view blast results. (And to clarify for Torsten, the blastall() is
>>>>> working just fine - the analysis/viewing of the results object is
>>>>> where
>>>>> I am encountering the problem.)
>>>>>
>>>>>
>>>>> Any other ideas would be greatly appreciated...
>>>>>
>>>>> Thank you,
>>>>> Genevieve
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Sendu Bala wrote:
>>>>>
>>>>>> Genevieve DeClerck wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> If I've sorted the results the sorted-results will print to
>>>>>>> screen,
>>>>>>> however when I try to print the Hit Table results nothing is
>>>>>>> returned,
>>>>>>> as if the blast results have evaporated.... and visa versa, if i
>>>>>>> comment out the part where i point my sorting subroutine to the
>>>>>>> blast
>>>>>>> results reference,  my hit table results suddenly prints to
>>>>>>> screen.
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> Here's an abbreviated version of my code:
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> #######
>>>>>>> ### the following 2 actions seem to be mutually exclusive.
>>>>>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>>>>>>> # SeqFeature objs stored in arrays. arrays are then printed
>>>>>>> # to stdout
>>>>>>> &sort_results($blast_report);
>>>>>>>
>>>>>>> # 2) print blast results
>>>>>>> &print_blast_results($blast_report);
>>>>>>
>>>>>>
>>>>>>> sub print_blast_results{
>>>>>>>    my $report = shift;
>>>>>>>    while(my $result = $report->next_result()){
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>> You didn't give us your sort_results subroutine, but is it as
>>>>>> simple as
>>>>>> they both use $report->next_result (and/or $result->next_hit),  
>>>>>> but
>>>>>> you
>>>>>> don't reset the internal counter back to the start, so the second
>>>>>> subroutine tries to get the next_result and finds the first
>>>>>> subroutine
>>>>>> has already looked at the last result and so next_result returns
>>>>>> false?
>>>>>>
>>>>>>  From a quick look it wasn't obvious how to reset the counter.
>>>>>> Hopefully
>>>>>> this can be done and someone else knows how.
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From cjfields at uiuc.edu  Sun Jun  4 15:51:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 10:51:53 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
Message-ID: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>


On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:

> right - you don't need rewind if you aren't going to use the  
> iterator (next_XXX) -- we provide two different ways to get access  
> to the data.
> you can do
> for my $hit ( $result->hits ) {
>
> }
> or
> while( my $hit = $result->next_hit ) {
> }
>
>
> If you want to rewind the parser then (assuming you are using a  
> filestream and not a data stream from the web or zcat or something)  
> just reset the filehandle
> seek($searchio->_fh, 0, 0);
>
> but then you'll have to re-parse everything and pay that cost twice  
> - it makes more sense to me to just save the results and put them  
> in list if you are going to deliberately make two passes over all  
> the results.    You either pay the cost of memory (keeping all the  
> objects) or time (reparse the results).

I agree there isn't any really good reason to rewind the parser; I  
was mainly just curious how this was accomplished.  Your point about a  
memory or time hit might be a point we want to make in the HOWTO.  I  
already added some example code about rewinding the iterator and  
hits, so I'll add a bit about this.

I think a good deal of confusion here comes from not knowing how  
SearchIO works (i.e. that parsing a report can return several  
results, which in turn can return hits, in turn returning HSPs).  Of  
course that doesn't include iterations in the case of PSI-BLAST.    
The HOWTO, I think, explains this all well so it may be a matter of  
just RTM (I left the 'F' out to be a bit more polite).

Chris

> -jason
> On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:
>
...


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Mon Jun  5 08:16:59 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 05 Jun 2006 16:16:59 +0800
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
Message-ID: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>


Dear Lincoln and experts

Currently I have a CGI application that does this:

1. read an uploaded file
2. check whether the content of the file is fasta or not
3. print out the content of the file.


Now the problem I'm facing is in step three.
The content read from the filehandle is altered,
namely the very first line does not get printed.

So for example if "test1.fasta" looks like this:

>Seq0
ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC
 
When it is printed it gives only:

ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC

Why is this happening? 

Below is the complete cgi script that 
does the task  I mentioned earlier.

Did I miss anything in my code?


__BEGIN__
#!/usr/bin/perl -w

use CGI qw/:standard :html3/;
use CGI::Carp qw( fatalsToBrowser );
use Data::Dumper;

BEGIN {
    if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {

        # Blindly untaint.  Taintchecking is to protect
        # from Web data;
        # the environment is under our control.
        eval "use lib '$_';" foreach (
            reverse
            split( /:/, $1 )
        );
    }
}


use Bio::Tools::GuessSeqFormat;

print header,
    start_html('file upload'),
    h1('file upload!');
print_form()    unless param;
print_results() if param;
print end_html;

sub print_form {
    print start_multipart_form(),
       filefield(-name=>'upload',-size=>60),br,
       submit(-label=>'Upload File'),
       end_form;
}

sub print_results {
    my $length;
    my $file = param('upload');
    my $fh_upload = upload('upload');

    my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
    my $format_upload  = $guesser_upload->guess;

    if ( !$file ) {
        print "No file uploaded.";
        return;
    }
    print h2('File name'),      $file;
    print h2('Format'), $format_upload;
    print h2('The content is'),br;

    while (<$fh_upload>) {

     # The very first line of the file does not get printed here
     # Why?

        print;
        print br;
        $length += length($_);
    }
    print h2('File length'), $length;
}


__END__

Hope to hear from you again.

Regards,
Edward WIJAYA
SINGAPORE


------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 09:02:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:02:48 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <4483F338.7090909@mrc-dunn.cam.ac.uk>

Wijaya Edward wrote:
> Dear Lincoln and experts
> 
> Curently I have a CGI application that does this:
> 
> 1.  read and uploaded file 
> 2. check the content of the file whether fasta or not
> 3. print out the content of the file.
> 
> 
> Now the problem I'm facing is that
> on step three. The content of the file handled is altered
> namely the very first line does not get printed. 

The problem is almost certainly that the guessing is done by reading the 
first line of the filehandle, so that your subsequent while loop on that 
same filehandle starts at the second line.
Just seek the filehandle back to the start before trying to print the 
contents out.

...
my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
my $format_upload  = $guesser_upload->guess;
seek($fh_upload, 0, 0);
...
while (<$fh_upload>) {
     ...
}

An alternative might be to pass GuessSeqFormat the filename in which 
case it would make its own filehandle and close it, leaving your own 
filehandle untouched.
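
If the handle turns out not to be seekable, another option is to read the
upload into memory once and work from that copy. The sketch below is only
an illustration under stated assumptions: an in-memory filehandle stands in
for the CGI upload handle, and a crude "starts with >" test stands in for
the real format guess (GuessSeqFormat also documents a -text option for
guessing from a string, which is left out here so the sketch runs without
BioPerl).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the upload: an in-memory filehandle over a small FASTA string.
my $data = ">Seq0\nATCGGGGG\n>Seq1\nGGGGGGG\n";
open( my $fh, '<', \$data ) or die "Cannot open in-memory handle: $!";

# Read everything exactly once; the handle is never read again after this.
my @lines = <$fh>;
close $fh;

# Placeholder for format detection (assumption: a FASTA file starts with '>').
my $looks_like_fasta = ( @lines && $lines[0] =~ /^>/ ) ? 1 : 0;

# Print the content from the buffered copy, so no line can go missing.
my $length = 0;
my @printed;
for my $line (@lines) {
    print $line;
    push @printed, $line;
    $length += length $line;
}
print "File length: $length\n";
```

The cost is holding the whole upload in memory, which is reasonable for
small sequence files but worth reconsidering for large ones.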


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 09:57:52 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:57:52 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
Message-ID: <44840020.4020604@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
> 
>> If you want to rewind the parser then (assuming you are using a 
>> filestream and not a data stream from the web or zcat or something) 
>> just reset the filehandle
>> seek($searchio->_fh, 0, 0);
>>
>> but then you'll have to re-parse everything and pay that cost twice - 
>> it makes more sense to me to just save the results and put them in 
>> list if you are going to deliberately make two passes over all the 
>> results.    You either pay the cost of memory (keeping all the 
>> objects) or time (reparse the results).
> 
> I agree there isn't any really good reason to rewind the parser; I was 
> mainly just curious how this was accomlished.

Didn't you already explain why seeking a SearchIO wouldn't work? And 
indeed, didn't Genevieve already try to do this after I suggested it and 
  found that it didn't work?

Confused...


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 13:19:12 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 14:19:12 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
Message-ID: <44842F50.7090408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> 
> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
> 
>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>> indeed, didn't Genevieve already try to do this after I suggested it and
>> found that it didn't work?
>>
>> Confused...
>>
> There is an internal _rewind if you are using the next_XX methods that 
> resets the internal iterator (all the data has already been parsed).
> 
> You >>can<< reseek the internal filehandle (accessible by calling 
> $object->_fh ), but you can't call seek on the searchio object itsself.

... poor choice of words on my part. Or maybe I'm not understanding 
you... I already suggested to Genevieve that she try:

# in the following, $blast_report is a SearchIO
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>      # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>      # go through the results again in your second subroutine
> }
> 
> An alternative hacky way of doing it, which may also not work, would be 
> to go through your $blast_report as normal, but then before going 
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);

She reported that neither way of doing it worked. You seem to be saying 
that at least the second way should have. Is that right?
rewind() would of course be preferable, I just wanted to know if my 
assumption about seek working was correct or not.


From jason at bioperl.org  Mon Jun  5 13:45:40 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:45:40 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44842F50.7090408@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
Message-ID: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>

It depends on how you have run StandAloneBlast -- if the stream you  
are dealing with is not a file, but a datastream as in the STDOUT  
from BLAST, then the seek won't work (as it wouldn't work for a zcat  
on gzipped file).  I think the default StandAloneBlast behavior is to  
operate on a STDOUT stream so seeking won't work no matter what.
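
The file-versus-stream difference can be checked in plain Perl. This is
only a sketch of the underlying behaviour on a Unix-like system: a pipe
stands in for program output read as a stream and a temp file stands in
for a saved report. It says nothing about which of the two StandAloneBlast
actually hands back.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Case 1: a pipe, standing in for output streamed from another program.
pipe( my $reader, my $writer ) or die "pipe failed: $!";
print {$writer} "line 1\nline 2\n";
close $writer;

my @pipe_first   = <$reader>;                       # consumes the stream
my $pipe_seek_ok = seek( $reader, 0, 0 ) ? 1 : 0;   # seek fails on a pipe
my @pipe_second  = <$reader>;                       # nothing left to read
close $reader;

# Case 2: a regular file, standing in for a report saved to disk.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line 1\nline 2\n";
close $tmp_fh;

open( my $file_fh, '<', $tmp_name ) or die "Cannot open $tmp_name: $!";
my @file_first   = <$file_fh>;
my $file_seek_ok = seek( $file_fh, 0, 0 ) ? 1 : 0;  # seek succeeds on a file
my @file_second  = <$file_fh>;                      # the same lines again
close $file_fh;

print "pipe seek ok: $pipe_seek_ok, file seek ok: $file_seek_ok\n";
```

Checking the return value of seek, as done here, is a cheap way to find out
which situation a given handle is in before relying on a second pass.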


On Jun 5, 2006, at 9:19 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
>>
>>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>>> indeed, didn't Genevieve already try to do this after I suggested  
>>> it and
>>> found that it didn't work?
>>>
>>> Confused...
>>>
>> There is an internal _rewind if you are using the next_XX methods  
>> that
>> resets the internal iterator (all the data has already been parsed).
>>
>> You >>can<< reseek the internal filehandle (accessible by calling
>> $object->_fh ), but you can't call seek on the searchio object  
>> itsself.
>
> ... poor choice of words on my part. Or maybe I'm not understanding
> you... I already suggested to Genevieve that she try:
>
> # in the following, $blast_report is a SearchIO
>> my $blast_report = $factory->blastall($ref_seq_objs);
>> my $blast_fh = $blast_report->fh();
>> while (<$blast_fh>) {
>>      # $_ is a ResultI object, use as normal
>> }
>> seek($blast_fh, 0, 0); # this would be great, but does it work?
>> while (<$blast_fh>) {
>>      # go through the results again in your second subroutine
>> }
>>
>> An alternative hacky way of doing it, which may also not work,  
>> would be
>> to go through your $blast_report as normal, but then before going
>> through it a second time, say
>> my $fh = $blast_report->_fh;
>> seek($fh, 0, 0);
>
> She reported that neither way of doing it worked. You seem to be  
> saying
> that at least the second way should have. Is that right?
> rewind() would of course be preferable, I just wanted to know if my
> assumption about seek working was correct or not.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 14:13:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 15:13:03 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
	<A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>
	<44842F50.7090408@mrc-dunn.cam.ac.uk>
	<E8302034-5A6F-43C0-A481-94DC42ACF088@bioperl.org>
Message-ID: <44843BEF.6080609@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> It depends on how you have run StandAloneBlast -- if the stream you are 
> dealing with is not a file, but a datastream as in the STDOUT from 
> BLAST, then the seek won't work (as it wouldn't work for a zcat on 
> gzipped file).  I think the default StandAloneBlast behavior is to 
> operate on a STDOUT stream so seeking won't work no matter what.

As far as I can see, when you say blastall() on a StandAloneBlast, it 
eventually does:

if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
     $blast_obj = Bio::SearchIO->new(-file=>$outfile,
			            -format => 'blast' );
}

So seeking should work? Tools like StandAloneBlast creating temp files 
for their results prior to parsing is actually one of things I don't 
like about the bioperl tool system.


From lstein at cshl.edu  Mon Jun  5 14:51:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 5 Jun 2006 10:51:52 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <200606051051.52648.lstein@cshl.edu>

Hi,

From the Synopsis for GuessSeqFormat:

           # To guess the format from an already open filehandle:
           my $guesser = new Bio::Tools::GuessSeqFormat( -fh => $filehandle );
           my $format  = $guesser->guess;
           # If the filehandle is seekable (STDIN isn't), it will be
           # returned to its original position.

The filehandle returned by CGI.pm is not seekable.

Lincoln

On Monday 05 June 2006 04:16, Wijaya Edward wrote:
> Dear Lincoln and experts
>
> Curently I have a CGI application that does this:
>
> 1.  read and uploaded file
> 2. check the content of the file whether fasta or not
> 3. print out the content of the file.
>
>
> Now the problem I'm facing is that
> on step three. The content of the file handled is altered
> namely the very first line does not get printed.
>
> So for example if "test1.fasta" looks like this:
> >Seq0
>
> ATCGGGGG
>
> >Seq1
>
> GGGGGGG
>
> >Seq2
>
> ATCCCCCC
>
> When it was printed it gives only:
>
> ATCGGGGG
>
> >Seq1
>
> GGGGGGG
>
> >Seq2
>
> ATCCCCCC
>
> Why is this happening?
>
> Below is the complete cgi script that
> does the task  I mentioned earlier.
>
> Did I missed out anything in my code?
>
>
>
> __BEGIN__
> #!/usr/bin/perl -w
>
> use CGI qw/:standard :html3/;
> use CGI::Carp qw( fatalsToBrowser );
> use Data::Dumper;
>
> BEGIN {
>     if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {
>
>         # Blindly untaint.  Taintchecking is to protect
>         # from Web data;
>         # the environment is under our control.
>         eval "use lib '$_';" foreach (
>             reverse
>             split( /:/, $1 )
>         );
>     }
> }
>
>
> use Bio::Tools::GuessSeqFormat;
>
> print header,
>     start_html('file upload'),
>     h1('file upload!');
> print_form()    unless param;
> print_results() if param;
> print end_html;
>
> sub print_form {
>     print start_multipart_form(),
>        filefield(-name=>'upload',-size=>60),br,
>        submit(-label=>'Upload File'),
>        end_form;
> }
>
> sub print_results {
>     my $length;
>     my $file = param('upload');
>     my $fh_upload = upload('upload');
>
>     my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload
> ); my $format_upload  = $guesser_upload->guess;
>
>     if ( !$file ) {
>         print "No file uploaded.";
>         return;
>     }
>     print h2('File name'),      $file;
>     print h2('Format'), $format_upload;
>     print h2('The content is'),br;
>
>     while (<$fh_upload>) {
>
>      # The very first line of the file is not get printed here
>      # Why?
>
>         print;
>         print br;
>         $length += length($_);
>     }
>     print h2('File length'), $length;
> }
>
>
> __END__
>
> Hope to hear from you again.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>
> ------------ Institute For Infocomm Research - Disclaimer -------------
> This email is confidential and may be privileged.  If you are not the
> intended recipient, please delete it and notify us immediately. Please do
> not copy or use it for any purpose, or disclose its contents to any other
> person. Thank you. --------------------------------------------------------

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From cjfields at uiuc.edu  Mon Jun  5 16:30:41 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 11:30:41 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44843BEF.6080609@mrc-dunn.cam.ac.uk>
Message-ID: <006001c688bd$62d48850$15327e82@pyrimidine>

If you want flexibility or added functionality then you can always
contribute a patch, such as adding an option for filehandles, IO::String,
pipes/forks, or whatever you wish.  Or you could suggest such to the module
maintainer, Torsten, and then it's his choice whether he wants to make it a
priority to implement it.  Simply stating this is 'one of things I don't
like about the bioperl tool system' isn't productive here.   It hasn't been
a top priority to implement something along those lines since the module
works for them as is, so if you want these options you'll have to add them,
and add the appropriate tests.

As for the seek issue, the file handle you get by using '$blast_report->fh()'
isn't the raw input file stream but is a tied filehandle of a stream of
ResultI objects:
==================================
Jason's version:
# seek called on the >>internal<< filehandle (from Bio::Root::IO)
# this is the raw data input stream from a file, so should work
seek($searchio->_fh, 0, 0);
==================================
Your version:
# seek called on SearchIO object filehandle
my $blast_report = $factory->blastall($ref_seq_objs);
# this is a tied filehandle for an output stream of objects from SearchIO,
# NOT the raw input stream
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
	# a stream of Bio::Search::Result::BlastResult objects 
} 
# can't use seek on a tied filehandle, won't work unless 
# SEEK class method is implemented (and it's not)
seek($blast_fh, 0, 0); 
==================================

There's a good deal in Programming Perl about tied filehandles.  You'll
notice that Bio::SearchIO implements TIEHANDLE, READLINE, DESTROY, and PRINT
methods, but not SEEK since we've never needed it.  You can always add one
if you want but I really don't see the point based on reasons Jason and I
outlined before.
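
A minimal sketch of that point in plain Perl, not BioPerl's own code: the
package name ResultStream and its contents are made up for illustration.  It
ties a handle to a class that has TIEHANDLE and READLINE but no SEEK, reads
from it, and then shows that seek dies because Perl cannot find a SEEK method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser that hands out a tied filehandle.
package ResultStream;

sub TIEHANDLE {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

# Called for <FH>; returns the next item, or undef when exhausted.
sub READLINE {
    my ($self) = @_;
    return shift @{ $self->{items} };
}

package main;

# Tie a glob to the class, read everything, then try to seek on it.
# Returns the error text from the failed seek plus the items that were read.
sub try_seek_on_tied_handle {
    local *FH;
    tie *FH, 'ResultStream', 'result 1', 'result 2';
    my @seen;
    while (defined(my $item = <FH>)) {
        push @seen, $item;
    }
    # No SEEK method exists in ResultStream, so this call dies.
    my $ok  = eval { seek(FH, 0, 0); 1 };
    my $err = $ok ? '' : $@;
    untie *FH;
    return ($err, @seen);
}

my ($err, @seen) = try_seek_on_tied_handle();
print "read: @seen\n";
print "seek failed: $err" if $err;
```

Adding a SEEK method to the tie class would make the call succeed, but the
class would then need some way to rewind whatever it reads from.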

Seems there is not much overall documentation on newFh or $blast_report->fh,
but I believe it's analogous to the SeqIO version which is covered a bit in
the bptutorial file, now on the wiki:

http://www.bioperl.org/wiki/Bptutorial.pl#III.2.1_Transforming_sequence_files_.28SeqIO.29

$in  = Bio::SeqIO->newFh(-file => "inputfilename" ,
                          -format => 'fasta');
$out = Bio::SeqIO->newFh(-format => 'embl');
print $out $_ while <$in>;

Wouldn't hurt if someone wants to add a bit more about these to the SearchIO
HOWTO.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, June 05, 2006 9:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Jason Stajich wrote:
> > It depends on how you have run StandAloneBlast -- if the stream you are
> > dealing with is not a file, but a datastream as in the STDOUT from
> > BLAST, then the seek won't work (as it wouldn't work for a zcat on
> > gzipped file).  I think the default StandAloneBlast behavior is to
> > operate on a STDOUT stream so seeking won't work no matter what.
> 
> As far as I can see, when you say blastall() on a StandAloneBlast, it
> eventually does:
> 
> if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i )  {
>      $blast_obj = Bio::SearchIO->new(-file=>$outfile,
> 			            -format => 'blast' );
> }
> 
> So seeking should work? Tools like StandAloneBlast creating temp files
> for their results prior to parsing is actually one of things I don't
> like about the bioperl tool system.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Jun  5 13:02:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:02:02 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44840020.4020604@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
	<84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
	<BCC0A170-7563-4C92-ABF2-5C0495BB309B@uiuc.edu>
	<D97BD03C-DB37-47CC-8C48-79FC6F8EE019@duke.edu>
	<8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
	<BA8C1B23-4D1C-4162-8181-8CD3D70B1897@duke.edu>
	<8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
	<44840020.4020604@mrc-dunn.cam.ac.uk>
Message-ID: <A3614BB8-BE61-448F-946D-FD172F703D00@bioperl.org>


On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
>>
>>> If you want to rewind the parser then (assuming you are using a
>>> filestream and not a data stream from the web or zcat or something)
>>> just reset the filehandle
>>> seek($searchio->_fh, 0, 0);
>>>
>>> but then you'll have to re-parse everything and pay that cost  
>>> twice -
>>> it makes more sense to me to just save the results and put them in
>>> list if you are going to deliberately make two passes over all the
>>> results.    You either pay the cost of memory (keeping all the
>>> objects) or time (reparse the results).
>>
>> I agree there isn't any really good reason to rewind the parser; I  
>> was
>> mainly just curious how this was accomlished.
>
> Didn't you already explain why seeking a SearchIO wouldn't work? And
> indeed, didn't Genevieve already try to do this after I suggested  
> it and
>   found that it didn't work?
>
> Confused...
>
There is an internal _rewind if you are using the next_XX methods  
that resets the internal iterator (all the data has already been  
parsed).

You >>can<< reseek the internal filehandle (accessible by calling
$object->_fh ), but you can't call seek on the searchio object itself.
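
The "save the results in a list" alternative to rewinding could be sketched
roughly as below.  FakeParser is a made-up stand-in so the snippet runs
without BioPerl; the assumption is that a real Bio::SearchIO object would be
passed in its place, since it also provides next_result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser object; a real Bio::SearchIO
# object would take its place and supply next_result itself.
package FakeParser;

sub new {
    my ($class, @results) = @_;
    return bless { queue => [@results] }, $class;
}

sub next_result {
    my ($self) = @_;
    return shift @{ $self->{queue} };
}

package main;

# Drain the parser once and keep every result in memory.
sub collect_results {
    my ($parser) = @_;
    my @results;
    while (defined(my $result = $parser->next_result)) {
        push @results, $result;
    }
    return @results;
}

my $parser  = FakeParser->new('result A', 'result B');
my @results = collect_results($parser);

# Two passes over the saved list; the report itself is parsed only once.
print "pass 1: $_\n" for @results;
print "pass 2: $_\n" for @results;
```

This trades memory for time: every result object stays alive until @results
goes out of scope.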

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 17:23:36 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 18:23:36 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006001c688bd$62d48850$15327e82@pyrimidine>
References: <006001c688bd$62d48850$15327e82@pyrimidine>
Message-ID: <44846898.8020001@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> If you want flexibility or added functionality then you can always
> contribute a patch, such as adding an option for filehandles, IO::String,
> pipes/forks, or whatever you wish.

Well, it wouldn't be a new feature per se, but just changing the way the 
modules work under the hood.


> Or you could suggest such to the module
> maintainer, Torsten, and then it's his choice whether he wants to make it a
> priority to implement it.  Simply stating this is 'one of things I don't
> like about the bioperl tool system' isn't productive here.

Yes, I apologise for that. I had thought too much would need to be 
changed and backward compatibility wouldn't be possible, but just 
changing StandAloneBlast should be possible.

I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... is there any point in working 
up a patch (would something of questionable reliability ever be 
committed into bioperl)?
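
For reference, a rough sketch of the kind of IPC::Open3 usage I mean, with
IO::Select used to drain stdout and stderr together so that neither pipe fills
up while the other is being read.  The sub name run_and_capture is made up, the
child here is just perl itself rather than blastall, and it assumes the input
is small enough to write in one go before reading.  select() on pipes is also
not portable to Windows, which is part of the reliability worry.

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Select;
use Symbol 'gensym';

# Run a command, feed it $input on stdin, and return (stdout, stderr).
sub run_and_capture {
    my ($input, @cmd) = @_;
    my $err = gensym;    # open3 needs a real glob to keep stderr separate
    my $pid = open3(my $in, my $out, $err, @cmd);

    print {$in} $input;
    close $in;           # the child sees EOF on its stdin

    my $sel = IO::Select->new($out, $err);
    my %buf = (out => '', err => '');
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $n = sysread($fh, my $chunk, 4096);
            if (!$n) {   # 0 means EOF, undef means a read error
                $sel->remove($fh);
                next;
            }
            $buf{ $fh == $out ? 'out' : 'err' } .= $chunk;
        }
    }
    waitpid($pid, 0);    # reap the child
    return ($buf{out}, $buf{err});
}

# Demo child: upper-case one line from stdin and write a note to stderr.
my $child_code = 'my $l = <STDIN>; print uc $l; print STDERR "warn\n";';
my ($stdout, $stderr) = run_and_capture("hello\n", $^X, '-e', $child_code);
print "stdout: $stdout";
print "stderr: $stderr";
```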


> As for the seek issue, the file handle you get by using '$blast_report->fh()'
> isn't the raw input file stream but is a tied filehandle of a stream of
> ResultI objects:
> ==================================
> Jason's version:
> # seek called on the >>internal<< filehandle (from Bio::Root::IO)
> # this is the raw data input stream from a file, so should work
> seek($searchio->_fh, 0, 0);
> ==================================
> Your version:
> # seek called on SearchIO object filehandle
> my $blast_report = $factory->blastall($ref_seq_objs);
> # this is a tied filehandle for an output stream of objects from SearchIO,
> # NOT the raw input stream
> my $blast_fh = $blast_report->fh();

For academic interest, how do I get the 'raw input stream'? Wasn't that 
what my second version did?

 > my $fh = $blast_report->_fh;
 > seek($fh, 0, 0);


From hubert.prielinger at gmx.at  Mon Jun  5 18:17:53 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 12:17:53 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>	<4480C78C.1000701@gmx.at>	<B2F5D2AE-DFD2-499F-AF47-73B440B923F1@uiuc.edu>	<4480DC8B.7070005@gmx.at>	<573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>	<4480E7AA.3020603@gmx.at>
	<720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
Message-ID: <44847551.7040705@gmx.at>

hi,
you were right, removing the composition-based statistics solved the
problem. Now I get the result printed to STDOUT, but it doesn't save the
output in the file.
I have tried reopening the file and writing it to another file, but
that doesn't work either.....
The strange thing is that if I retrieve text instead of XML output, it
works without any problem. I don't know why.

Hubert


Chris Fields wrote:
> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>
>   
>> hi chris,
>> thanks but I never intended to run the remoteblast with so much,  
>> only a few of them, acutally I goal is to run the phiblast with  
>> regular expression, so that i just don't need that
>> file anymore
>>     
>
> Not a problem.  Just to let you know, I did manage to get the script  
> working, so I'm marking the bug INVALID.  I think the problem isn't  
> that there is an infinite loop so much as setting composition-based  
> statistics causes the search to take much much longer; try removing  
> that line to see what I mean.
>
> Just so you know, using $result->query_name doesn't get you what you  
> would expect (it gives you a part of the RID, which you don't want;  
> this is something in the XML output that is beyond our control).  You  
> might want to change it to something else or you'll get filenames  
> with numerical names.
>
>   
>> another question for parsing the xml output....is there a xml  
>> parser available for blast xml output or how to start.....
>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,  
>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>> is their maybe another introduction or an example.
>>     
>
> Bio::SearchIO objects are used to parse BLAST XML output if you have  
> it saved to a file.  For instance:
>
> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>
> while (my $result = $factory->next_result) {
>    while (my $hit = $result->next_hit) {
>       while (my $hsp = $hit->next_hsp) {
>          #do stuff here
>        }
>     }
> }
>
> The only thing that changes in parsing a text BLAST report from an  
> XML BLAST report is the -format line (similar to the -readmethod  
> parameter in RemoteBlast).  You shouldn't need to look up any more  
> documentation other than these on the wiki:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>
> Pay attention to the fact you'll need to install XML::SAX (CPAN) and  
> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
> up parsing.
>
> Chris
>
>   
>> thanks
>> Hubert
>>
>>
>> Chris Fields wrote:
>>     
>>> Yes, I see the same error you do.  But I have a similar script   
>>> (blastp, XML blast report, XML parsing, similar loop structure)  
>>> that  works fine.  I'm trying to dissect the problem but I think  
>>> it may be  something logically wrong here (something not so  
>>> obvious) and not a  bug...
>>>
>>> What I'm trying to say is, when you send sequences using  
>>> remoteblast  like, this you are essentially spamming the NCBI  
>>> BLAST server with  ~1600 requests.  This script wasn't set up with  
>>> that intent in mind;  you should really try to set up your own  
>>> local blast database if  possible.  If you can't, try running this  
>>> script in off-hours  (10pm-6am EST or something like that).
>>>
>>>
>>> Chris
>>>
>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi,
>>>> input database: swissprot
>>>>         matrix: pam30
>>>>         count: 1
>>>>         gapcosts: 9 1
>>>>
>>>> I know that there are  a lot of sequences, but that doesn't  
>>>> matter,  you can delete all of them except one, the amount of the  
>>>> sequences  is not the problem, the script reads one line and  
>>>> submits  it.....then the second line and so on.....I have tried  
>>>> it with only  one sequence either and I got the same result....  
>>>> the script run at  that time for more than 20  
>>>> minutes!!!!!! .....and that should be  enough time to retrieve  
>>>> the results for ONE sequence, I guess
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> You need to add the input conditions as well (you have several   
>>>>> <STDIN> lines which may play a role; I would like to know what  
>>>>> you  normally enter for those).
>>>>>
>>>>> How long did you let the script run?  I ran a quick check on  
>>>>> your  sequences; you have almost 1600, so you have to expect  
>>>>> that you'll  run into some problems here!  Most here (including  
>>>>> me) would  suggest you try installing a local blast setup for  
>>>>> something like  this.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> I have submitted the bug -> Bug 2017
>>>>>> with the script and input file, just start it from command line
>>>>>>
>>>>>> thank you very much
>>>>>> greetings
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>             
>>>>>>> Hubert,
>>>>>>>
>>>>>>> I have a script that's using blastxml and XML output which  
>>>>>>> seems  to work.
>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;  
>>>>>>>> 'Sendu  Bala'
>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> sorry, but I have updated the remoteblast module and I have  
>>>>>>>> run  several
>>>>>>>> attempts with the same results as before. It didn't work.
>>>>>>>> I didn't get any results.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Sendu, Hubert,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix  
>>>>>>>>> the  problem
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> (break
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> out of that infinite loop).  I applied Sendu's patch to   
>>>>>>>>> RemoteBlast in
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> CVS;
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> it passed all tests in RemoteBlast.t.  Try updating from  
>>>>>>>>> CVS  to see if
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> it
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> works.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> hi,
>>>>>>>>>>> I have the following program and it worked quite well,  
>>>>>>>>>>> for  retrieving
>>>>>>>>>>> remoteblast results in a textfile,
>>>>>>>>>>> now I have altered it to to xml, and it didn't work   
>>>>>>>>>>> anymore.....
>>>>>>>>>>> it takes all the parameter at the commandline, submits  
>>>>>>>>>>> the  query, but
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>> I
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>>>> don't retrieve any results file anymore.....
>>>>>>>>>>>
>>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and   
>>>>>>>>>>> over..... it
>>>>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> There is no problem with your code. The problem is with  
>>>>>>>>>> the  NCBI server
>>>>>>>>>> and should be reported to them. You can visit the site and  
>>>>>>>>>> do  a blast,
>>>>>>>>>> requesting xml format, and you will typically get one  
>>>>>>>>>> normal  'waiting'
>>>>>>>>>> message and the promise that it will be updated in x  
>>>>>>>>>> seconds,  but
>>>>>>>>>> subsequent attempts to get progress information result in  
>>>>>>>>>> an  xml error
>>>>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>>>>
>>>>>>>>>> Unfortunately the way that the bioperl code is written, it   
>>>>>>>>>> treats no
>>>>>>>>>> data as 'waiting' instead of an error. I've offered a  
>>>>>>>>>> patch  to fix this
>>>>>>>>>> at this bug page:
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>               
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>             
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>       
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From cjfields at uiuc.edu  Mon Jun  5 18:32:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:32:47 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44847551.7040705@gmx.at>
Message-ID: <006101c688ce$7185c330$15327e82@pyrimidine>

Hubert, 

Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
option to save XML was committed relatively recently (last month or so).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Monday, June 05, 2006 1:18 PM
> To: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> you were right, removing the composition-based statistics solved the
> problem. Now I get the result viewed on STDIN, but it doesn't save the
> output in the file.
> I haved tried it by reopening the file and writing it to an other file
> again, but it doesn't work.....
> The strange thing is that if I retrieve text instead of xml output it
> works without any problem. Don't know why
> 
> Hubert
> 
> 
> 


From cjfields at uiuc.edu  Mon Jun  5 18:56:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:56:18 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44846898.8020001@mrc-dunn.cam.ac.uk>
Message-ID: <006201c688d1$bad2aff0$15327e82@pyrimidine>


> Chris Fields wrote:
> > If you want flexibility or added functionality then you can always
> > contribute a patch, such as adding an option for filehandles,
> IO::String,
> > pipes/forks, or whatever you wish.
> 
> Well, it wouldn't be a new feature per se, but just changing the way the
> modules work under the hood.

...

> I use IPC::Open3 for blasts and have never run into problems, but it
> pretty much falls into the 'apt to cause deadlock' camp. It may pass
> tests on one machine but fail on others... is there any point in working
> up a patch (would something of questionable reliability ever be
> committed into bioperl)?

The main thing you should avoid is major API changes or issues which break
this module on other OS's.  I'm not sure that StandAloneBlast is 'broken' by
using a tempfile as the location of the BLAST report.  

Any way you go about it, you'll have to capture the BLAST output as a stream
and get it to persist in a SearchIO object somehow.  It can be a pretty
decent memory hit to keep that report hanging around, esp. if it is large.

...

> For academic interest, how do I get the 'raw input stream'? Wasn't that
> what my second version did?
> 
>  > my $fh = $blast_report->_fh;
>  > seek($fh, 0, 0);

That should work, yes.  Didn't see that one in your previous response.  I can
get it to work w/o problems with SearchIO directly but I haven't tried it with
StandAloneBlast.  Below is my script.  If you comment out the seek line, the
file pointer stays at the end of the file, so the second round of parsing
won't happen.

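use strict;
use warnings;
use Bio::SearchIO;

my $parser = Bio::SearchIO->new(  -file => shift,
                                  -format => 'blast');

# first round: dump the raw report via the internal (private) filehandle
my $fh = $parser->_fh;

while (<$fh>) {
     print;
}

# rewind to the start of the report before parsing it
seek($fh, 0,0);

# fh() returns a tied handle; each <$fh> read gives the next result object
$fh = $parser->fh;

print "Second round:\n";
while (<$fh>) {
    while (my $hit = $_->next_hit) {
        print $hit->accession,"\n";
    }
}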
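
For anyone following along without a BLAST report handy, the same rewind
behaviour can be seen with plain Perl.  This is only a minimal sketch: the
in-memory filehandle stands in for the report file, and the $report text and
the read_all helper are made up for the illustration (nothing here comes from
BioPerl).

```perl
use strict;
use warnings;

# Stand-in for a BLAST report on disk (assumption: a regular file handle
# behaves the same way with respect to seek).
my $report = "line one\nline two\nline three\n";
open(my $fh, '<', \$report) or die "cannot open in-memory handle: $!";

# Hypothetical helper: read whatever is left on the handle, return the count.
sub read_all {
    my ($handle) = @_;
    my @lines = <$handle>;
    return scalar @lines;
}

my $first  = read_all($fh);   # consumes the whole "file"
my $second = read_all($fh);   # pointer is at EOF, nothing left to read

seek($fh, 0, 0) or die "seek failed: $!";
my $third  = read_all($fh);   # rewound, so everything is readable again

print "first pass:   $first lines\n";
print "without seek: $second lines\n";
print "after seek:   $third lines\n";
```

The second pass comes back empty for the same reason the second round of
parsing does nothing when the seek line is commented out.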


Chris


From hubert.prielinger at gmx.at  Mon Jun  5 19:12:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 13:12:37 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <006101c688ce$7185c330$15327e82@pyrimidine>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
Message-ID: <44848225.8080003@gmx.at>

hi chris,
sorry, I have tried it with the latest CVS version:

# $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $

but it still doesn't work.

Hubert

Chris Fields wrote:
> Hubert, 
>
> Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
> option to save XML was committed relatively recently (last month or so).
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Monday, June 05, 2006 1:18 PM
>> To: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>
>> hi,
>> you were right, removing the composition-based statistics solved the
>> problem. Now I get the result viewed on STDOUT, but it doesn't save the
>> output in the file.
>> I have tried reopening the file and writing it to another file
>> again, but it doesn't work.....
>> The strange thing is that if I retrieve text instead of xml output it
>> works without any problem. Don't know why
>>
>> Hubert
>>
>>
>>
>> Chris Fields wrote:
>>     
>>> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi chris,
>>>> thanks but I never intended to run the remoteblast with so many,
>>>> only a few of them, actually my goal is to run the phiblast with
>>>> a regular expression, so that I just don't need that
>>>> file anymore
>>>>
>>>>         
>>> Not a problem.  Just to let you know, I did manage to get the script
>>> working, so I'm marking the bug INVALID.  I think the problem isn't
>>> that there is an infinite loop so much as setting composition-based
>>> statistics causes the search to take much much longer; try removing
>>> that line to see what I mean.
>>>
>>> Just so you know, using $result->query_name doesn't get you what you
>>> would expect (it gives you a part of the RID, which you don't want;
>>> this is something in the XML output that is beyond our control).  You
>>> might want to change it to something else or you'll get filenames
>>> with numerical names.
>>>
>>>
>>>       
>>>> another question for parsing the xml output....is there an xml
>>>> parser available for blast xml output or how to start.....
>>>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,
>>>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>>>> is there maybe another introduction or an example.
>>>>
>>>>         
>>> Bio::SearchIO objects are used to parse BLAST XML output if you have
>>> it saved to a file.  For instance:
>>>
>>> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>>>
>>> while (my $result = $factory->next_result) {
>>>    while (my $hit = $result->next_hit) {
>>>       while (my $hsp = $hit->next_hsp) {
>>>          #do stuff here
>>>        }
>>>     }
>>> }
>>>
>>> The only thing that changes in parsing a text BLAST report from an
>>> XML BLAST report is the -format line (similar to the -readmethod
>>> parameter in RemoteBlast).  You shouldn't need to look up any more
>>> documentation other than these on the wiki:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>>>
>>> Pay attention to the fact you'll need to install XML::SAX (CPAN) and
>>> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
>>> up parsing.
>>>
>>> Chris
>>>
>>>
>>>       
>>>> thanks
>>>> Hubert
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> Yes, I see the same error you do.  But I have a similar script
>>>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>>>> that  works fine.  I'm trying to dissect the problem but I think
>>>>> it may be  something logically wrong here (something not so
>>>>> obvious) and not a  bug...
>>>>>
>>>>> What I'm trying to say is, when you send sequences using
>>>>> remoteblast  like, this you are essentially spamming the NCBI
>>>>> BLAST server with  ~1600 requests.  This script wasn't set up with
>>>>> that intent in mind;  you should really try to set up your own
>>>>> local blast database if  possible.  If you can't, try running this
>>>>> script in off-hours  (10pm-6am EST or something like that).
>>>>>
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> input database: swissprot
>>>>>>         matrix: pam30
>>>>>>         count: 1
>>>>>>         gapcosts: 9 1
>>>>>>
>>>>>> I know that there are  a lot of sequences, but that doesn't
>>>>>> matter,  you can delete all of them except one, the amount of the
>>>>>> sequences  is not the problem, the script reads one line and
>>>>>> submits  it.....then the second line and so on.....I have tried
>>>>>> it with only  one sequence either and I got the same result....
>>>>>> the script run at  that time for more than 20
>>>>>> minutes!!!!!! .....and that should be  enough time to retrieve
>>>>>> the results for ONE sequence, I guess
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> You need to add the input conditions as well (you have several
>>>>>>> <STDIN> lines which may play a role; I would like to know what
>>>>>>> you  normally enter for those).
>>>>>>>
>>>>>>> How long did you let the script run?  I ran a quick check on
>>>>>>> your  sequences; you have almost 1600, so you have to expect
>>>>>>> that you'll  run into some problems here!  Most here (including
>>>>>>> me) would  suggest you try installing a local blast setup for
>>>>>>> something like  this.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> hi,
>>>>>>>> I have submitted the bug -> Bug 2017
>>>>>>>> with the script and input file, just start it from command line
>>>>>>>>
>>>>>>>> thank you very much
>>>>>>>> greetings
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Hubert,
>>>>>>>>>
>>>>>>>>> I have a script that's using blastxml and XML output which
>>>>>>>>> seems  to work.
>>>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
>>>>>>>>>> 'Sendu  Bala'
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> sorry, but I have updated the remoteblast module and I have
>>>>>>>>>> run  several
>>>>>>>>>> attempts with the same results as before. It didn't work.
>>>>>>>>>> I didn't get any results.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Hubert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Chris Fields wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Sendu, Hubert,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix
>>>>>>>>>>> the  problem
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> (break
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> out of that infinite loop).  I applied Sendu's patch to
>>>>>>>>>>> RemoteBlast in
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> CVS;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> it passed all tests in RemoteBlast.t.  Try updating from
>>>>>>>>>>> CVS  to see if
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> works.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>>>
>>>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>>>>>>>> hi,
>>>>>>>>>>>>> I have the following program and it worked quite well,
>>>>>>>>>>>>> for  retrieving
>>>>>>>>>>>>> remoteblast results in a textfile,
>>>>>>>>>>>>> now I have altered it to to xml, and it didn't work
>>>>>>>>>>>>> anymore.....
>>>>>>>>>>>>> it takes all the parameter at the commandline, submits
>>>>>>>>>>>>> the  query, but
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>> I
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>>>> don't retrieve any results file anymore.....
>>>>>>>>>>>>>
>>>>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and
>>>>>>>>>>>>> over..... it
>>>>>>>>>>>>> doesn't enter the else term anymore....
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>> There is no problem with your code. The problem is with
>>>>>>>>>>>> the  NCBI server
>>>>>>>>>>>> and should be reported to them. You can visit the site and
>>>>>>>>>>>> do  a blast,
>>>>>>>>>>>> requesting xml format, and you will typically get one
>>>>>>>>>>>> normal  'waiting'
>>>>>>>>>>>> message and the promise that it will be updated in x
>>>>>>>>>>>> seconds,  but
>>>>>>>>>>>> subsequent attempts to get progress information result in
>>>>>>>>>>>> an  xml error
>>>>>>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the way that the bioperl code is written, it
>>>>>>>>>>>> treats no
>>>>>>>>>>>> data as 'waiting' instead of an error. I've offered a
>>>>>>>>>>>> patch  to fix this
>>>>>>>>>>>> at this bug page:
>>>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015


From sb at mrc-dunn.cam.ac.uk  Mon Jun  5 19:14:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 20:14:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006201c688d1$bad2aff0$15327e82@pyrimidine>
References: <006201c688d1$bad2aff0$15327e82@pyrimidine>
Message-ID: <44848280.1080703@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>>> If you want flexibility or added functionality then you can 
>>> always contribute a patch, such as adding an option for 
>>> filehandles, IO::String, pipes/forks, or whatever you wish.
>> 
>> Well, it wouldn't be a new feature per se, but just changing the 
>> way the modules work under the hood.
> 
> ...
> 
>> I use IPC::Open3 for blasts and have never run into problems, but 
>> it pretty much falls into the 'apt to cause deadlock' camp. It may
>> pass tests on one machine but fail on others... is there any point
>> in working up a patch (would something of questionable reliability
>> ever be committed into bioperl)?
> 
> The main thing you should avoid is major API changes or issues which
> break this module on other OS's.  I'm not sure that StandAloneBlast
> is 'broken' by using a tempfile as the location of the BLAST report.
> 
> 
> 
> Any way you go about it, you'll have to capture the BLAST output as a
> stream and get it to persist in a SearchIO object somehow.  It can
> be a pretty decent memory hit to keep that report hanging around, 
> esp. if it is large.

Well at the moment StandAloneBlast runs the blast program and stores its
output to a temp file, then gives the temp file name as an arg to
SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
of the blast program directly to my parser. The question is, why wasn't
this done in StandAloneBlast? I would get the blast program output
handle and pass it directly to SearchIO with the -fh option of new().
The only difference here is it's faster and more efficient with the
direct pipe, but you can't subsequently seek the SearchIO's internal
filehandle (as we are discussing in this thread). There are no (additional)
issues with memory.

If it isn't done using IPC::Open3 (or similar) because the original
author already knew it wouldn't be reliable enough, or for some other
reason(s), fine. Does anyone know the reasons?
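
To make the idea concrete, here is a rough sketch of the direct-pipe approach
using only core Perl. The child process is a stand-in (a Perl one-liner that
prints three lines), not the blast program, and the commented-out SearchIO
line only marks where the handle would be handed over with -fh; that part is
an assumption and has not been tried against StandAloneBlast.

```perl
use strict;
use warnings;
use IPC::Open3;

# Stand-in for the blast program: a child Perl that prints three lines.
my @cmd = ($^X, '-e', 'print "fake report line $_\n" for 1..3');

# Passing undef for the error handle sends the child's STDERR down the same
# pipe as its STDOUT, which sidesteps the classic deadlock of one pipe
# filling up while we are blocked reading the other.
my $pid = open3(my $to_child, my $from_child, undef, @cmd);
close($to_child);    # the child needs no input from us

# With BioPerl this handle would go to the parser instead, e.g.
#   my $searchio = Bio::SearchIO->new(-fh => $from_child, -format => 'blast');
my @lines = <$from_child>;

# Try to rewind the pipe; a temp file can be rewound, a pipe generally cannot.
my $rewound = seek($from_child, 0, 0) ? 'yes' : 'no';

waitpid($pid, 0);
my $exit_code = $? >> 8;

print "read ", scalar(@lines), " lines, seekable: $rewound, exit: $exit_code\n";
```

Nothing is written to disk here, which is the efficiency gain; the price is
that the output can only be read once.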


From cjfields at uiuc.edu  Mon Jun  5 19:43:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 14:43:50 -0500
Subject: [Bioperl-l] StandAloneBlast
In-Reply-To: <44848280.1080703@mrc-dunn.cam.ac.uk>
Message-ID: <006301c688d8$5e4ce910$15327e82@pyrimidine>

> Well at the moment StandAloneBlast runs the blast program and stores its
> output to a temp file, then gives the temp file name as an arg to
> SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
> of the blast program directly to my parser. The question is, why wasn't
> this done in StandAloneBlast? 

Probably for the reasons you outlined before:

'I use IPC::Open3 for blasts and have never run into problems, but it 
pretty much falls into the 'apt to cause deadlock' camp. It may pass 
tests on one machine but fail on others... '

Why would we take a chance on using something that works on one OS/machine
and fails to work on another?  

> I would get the blast program output handle and pass it directly to 
> SearchIO with the -fh option of new().
> The only difference here is it's faster and more efficient with the
> direct pipe, but you can't subsequently seek the SearchIO's internal
> filehandle (as we are discussing in this thread). There are no (additional)
> issues with memory.

Like I said before, you can make changes and submit a patch.  The code here
is over five years old, and many many things have changed since then, so you
might find something works now which wasn't available or didn't work then.
It hasn't really been a priority (it certainly hasn't been mine).  Most
people don't care b/c it just works and a vast majority don't worry/care
about the internals.  

The issue at hand is whether any code changes will work on all OS's, not
just yours.  BioPerl is used the world over on just about every OS, so ANY
code changes need to take that into consideration.  I can guarantee that if
you made changes that break or reduce performance on 50% of the OS's, it'll
likely get rolled back.  You need the best cross-platform compatibility
possible.

We've now veered WAY off topic here.  If we intend on continuing this, we
need to switch the thread topic.

Chris

> If it isn't done using IPC::Open3 (or similar) because the original
> author already knew it wouldn't be reliable enough, or for some other
> reason(s), fine. Does anyone know the reasons?


From cjfields at uiuc.edu  Mon Jun  5 20:30:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 15:30:01 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
Message-ID: <006401c688de$d38035b0$15327e82@pyrimidine>

I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
finished yet (BioSQL and Bioperl-guts aren't done yet) and there are probably
some mangld worsd in there so have mercy on me!  It's been a busy month.

http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006

Fling your mud and abuses by responding to this thread per usual

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Tue Jun  6 03:42:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 22:42:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44848225.8080003@gmx.at>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
	<44848225.8080003@gmx.at>
Message-ID: <D7A85F26-1ADD-446E-A5F3-8C3420746364@uiuc.edu>

Hubert,

I had no trouble getting this to work; the script scans through each  
sequence and saves the XML output to a file on both Windows and Mac OS  
X, both using bioperl-live.  The older RemoteBlast would only save  
text; otherwise it saved an empty file.  Using your script I get  
several XML BLAST output files (1.xml, 2.xml, etc) based on a  
counter, each about 1 MB.  All were parseable by SearchIO.

I did notice that if certain parameters weren't entered in correctly  
then you will get no data (such as setting the database to 'swiss'  
instead of 'swissprot').  A warning pops up stating that no data was  
returned when this occurs (it doesn't tell you what was wrong, just  
that no data came back from NCBI).  If you see this then that is  
likely the problem.  Besides that, I don't know what else it can be.

Chris

On Jun 5, 2006, at 2:12 PM, Hubert Prielinger wrote:

> hi chris,
> sorry, I have tried it with the latest CVS version:
>
> # $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $
>
> but it still doesn't work.
>
> Hubert
>
> Chris Fields wrote:
>> Hubert,
>>
>> Make sure you have the latest Bio::Tools::Run::RemoteBlast from  
>> CVS.  The
>> option to save XML was committed relatively recently (last month  
>> or so).
>>
>> Chris
>>

...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From heikki at sanbi.ac.za  Tue Jun  6 07:40:06 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 09:40:06 +0200
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <200606060940.07285.heikki@sanbi.ac.za>

Chris,

I am mystified. I'll try to get the massive 'return undef' change done first 
and then have another look.

	-Heikki

On Friday 02 June 2006 21:13, Chris Fields wrote:
> Heikki,
>
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
>
> Not sure what's going on there but using read_aln and write_aln seem to
> work normally.  It may have something to do with Bio::SimpleAlign but I'm
> not absolutely sure.
>
> Any ideas what may be going on here?
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Tue Jun  6 08:04:00 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 10:04:00 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606020952.08034.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
Message-ID: <200606061004.01193.heikki@sanbi.ac.za>


OK. I've gone through all cases where return and undef are on the same lines.
I've done changes in 185 files.

My aims have been the following:

1. Remove undef from return undef when not necessary.
	This will make it easier to spot cases where undef matters in the future
	Most of the changes fall into this category. The context is clearly scalar.

2. Returning undef when user expects an empty list is bad

./Bio/Tools/Est2Genome.pm fixed
./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
                               not fixed
./Bio/Matrix/PSM/SiteMatrix.pm  fixed
./Bio/Matrix/PSM/Psm  fixed
./Bio/DB/Taxonomy/entrez.pm fixed

3. If docs say method returns nothing, explicit undef is not the right thing 
to return

4. do not return an explicit undef if the method is supposed to return false 
on failure
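
A small sketch of why the list-context cases (aim 2) matter. The two subs are
made-up names for illustration only and do not correspond to any BioPerl
method:

```perl
use strict;
use warnings;

# Hypothetical routines that both signal "nothing found".
sub find_with_undef { return undef }   # explicit undef
sub find_with_bare  { return }         # bare return

my @a = find_with_undef();   # list context: (undef), i.e. one element
my @b = find_with_bare();    # list context: (), i.e. no elements

printf "explicit undef: %d element(s), list tests %s\n",
    scalar(@a), (@a ? 'true' : 'false');
printf "bare return:    %d element(s), list tests %s\n",
    scalar(@b), (@b ? 'true' : 'false');

# In scalar context both give undef, which is why most changes are harmless.
my $s = find_with_bare();
print "bare return in scalar context is ",
    (defined $s ? 'defined' : 'undef'), "\n";
```

A caller writing "if (my @hits = $obj->method)" takes the true branch after an
explicit undef, because the list holds one (undefined) element.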


Before I do the commit, I'd like to see a number of people run 'make test' on 
bioperl-live and report back if they see changes after the commit. There are 
quite a few tests that fail currently.

I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.

	-Heikki


On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> I've started going through the files that have 'return undef' lines.
> I'll report back later.
>
> Initial impression is that there are a few cases where the context
> indicates list to be returned but failure returns an explicit undef. I'll
> fix those.
>
> Most of the cases are much more ambiguous. Even when documentation says the
> failure returns undef, it is clearly meant to mean false. In most cases
> documentation does not comment on return value at all. Luckily the context
> is almost always scalar and therefore it does not matter too much.
>
> I seem to be changing 'return undef' to plain 'return' a bit overzealously,
> so do not take it personally.
>
> 	-Heikki
>
> On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > ....
> >
> > > > Again, didn't do that.
> > >
> > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > certainly not directed at your recent changes to Bio::Restriction::IO.
> > > In fact, I put in the above * comment to exclude your changes from my
> > > discussion; you changed the docs because the code never did what they
> > > said they did (the docs were bad). That's fine (good!). My comments
> > > were a general point, slightly directed at the idea of changing all the
> > > return undef;s - changing the code so that it no longer matches the
> > > docs of a previously working method. That's what I think is bad. Though
> > > in this particular case it shouldn't make any difference at all.
> >
> > Agreed.  In any case, if tests have been properly set up then they should
> > catch problems.  This is, of course, if they are properly set up.
> >
> > Chris
> >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 09:17:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 10:17:48 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine>
References: <000301c68678$a3cdaa40$15327e82@pyrimidine>
Message-ID: <4485483C.4080505@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Heikki,
> 
> I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
> when running AlignIO.t (I was fixing bug 2000):
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> 
> Not sure what's going on there but using read_aln and write_aln seem to work
> normally.  It may have something to do with Bio::SimpleAlign but I'm not
> absolutely sure.
> 
> Any ideas what may be going on here?

Yes, see my replies on the bug page. But so more people see the 
question, I'll ask here: can anyone offer examples of metafasta files, 
especially multiple alignments?


From cjfields at uiuc.edu  Tue Jun  6 14:30:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:30:17 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <4485483C.4080505@mrc-dunn.cam.ac.uk>
Message-ID: <000901c68975$bb9968d0$15327e82@pyrimidine>

Sendu,

This is Heikki's original submission for the specs for meta format:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta

So it's really a specialized FASTA format used to store meta information
about sequences.  Seems mainly useful for amino acid sequences, but is
extended to include properties of nucleotides like DNA content, RNA sec.
structure, and so on.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Tuesday, June 06, 2006 4:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests
> 
> Chris Fields wrote:
> > Heikki,
> >
> > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped
> up
> > when running AlignIO.t (I was fixing bug 2000):
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> >
> > Not sure what's going on there but using read_aln and write_aln seem to
> work
> > normally.  It may have something to do with Bio::SimpleAlign but I'm not
> > absolutely sure.
> >
> > Any ideas what may be going on here?
> 
> Yes, see my replies on the bug page. But so more people see the
> question, I'll ask here: can anyone offer examples of metafasta files,
> especially multiple alignments?


From cjfields at uiuc.edu  Tue Jun  6 14:36:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 09:36:16 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <200606060940.07285.heikki@sanbi.ac.za>
Message-ID: <000a01c68976$9479e300$15327e82@pyrimidine>

Heikki,

I agree it's all a bit weird.  Not too concerning though, since it works at
the moment, but it might take some tinkering with SimpleAlign to get it to
behave.

This alignment format has some of the same characteristics as Stockholm
alignment format but looks easier to work with.  I work with RNA,
specifically one with a conserved secondary structure so this format appeals
to me quite a bit.  If I get time (probably not for a while) I may tinker
with Bio::AlignIO::stockholm to get a write_aln() method up-and-running and
see if I can convert back-and-forth from the two.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Tuesday, June 06, 2006 2:40 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests
> 
> Chris,
> 
> I am mystified. I'll try to get the massive 'return undef' change done
> first and then have another look.
> 
> 	-Heikki
> 
> On Friday 02 June 2006 21:13, Chris Fields wrote:
> > Heikki,
> >
> > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped
> up
> > when running AlignIO.t (I was fixing bug 2000):
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2016
> >
> > Not sure what's going on there but using read_aln and write_aln seem to
> > work normally.  It may have something to do with Bio::SimpleAlign but
> I'm
> > not absolutely sure.
> >
> > Any ideas what may be going on here?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Tue Jun  6 15:40:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 06 Jun 2006 16:40:05 +0100
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
In-Reply-To: <000901c68975$bb9968d0$15327e82@pyrimidine>
References: <000901c68975$bb9968d0$15327e82@pyrimidine>
Message-ID: <4485A1D5.5090805@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu,
> 
> This is Heikki's original submission for the specs for meta format:
> 
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fasta
> 
> So it's really a specialized FASTA format used to store meta information
> about sequences.  Seems mainly useful for amino acid sequences, but is
> extended to include properties of nucleotides like DNA content, RNA sec.
> structure, and so on.  

Thanks. It's not really clear to me if the meta data needs to be 
considered in the context of an alignment. That is, if you have two meta 
sequences with the same primary sequence, will all their meta data 
necessarily be the same? Or could they be different?

If the same, then the test data and test need to be fixed so my patched 
version of Bio::AlignIO::metafasta passes the tests.

If different, how should the meta data be handled? Like the test implies 
with its expected value for the consensus (just treat the primary 
sequence and all meta data as one long string)?
Is it really the intent to include characters from the meta data names
when considering what symbols we've seen with the symbol_chars() method?
Do we include the meta data name symbols when numbering?

Thoughts anyone?


From cjfields at uiuc.edu  Tue Jun  6 21:07:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 16:07:39 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <006401c688de$d38035b0$15327e82@pyrimidine>
Message-ID: <000601c689ad$3e6aec20$15327e82@pyrimidine>

I hate talking to myself...

I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
(appropriately enough, on 6-6-06).  I am trying out a new script which helps
with all the developer list noise; hope everybody likes it.

Cheers,

Chris   

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, June 05, 2006 3:30 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] ListSummaries for May 10-31.
> 
> I have posted the ListSummaries for May 10-31 on the wiki.  I haven't
> finished yet (BioSQL and Bioperl-guts aren't done yet) and there are
> probably some mangld worsd in there so have mercy on me!  It's been a
> busy month.
> 
> http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006
> 
> Fling your mud and abuses by responding to this thread per usual
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 


From cjfields at uiuc.edu  Wed Jun  7 00:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Jun 2006 19:41:08 -0500
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <44861D47.7090205@infotech.monash.edu.au>
Message-ID: <000601c689cb$11f568a0$15327e82@pyrimidine>

I could do something like that.  Right now I have a script that just grabs
the text from the web page:

http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html

and uses regexes and hashes to sort everything and make some sense of the
noise.  The resolution for a bug isn't on that page but in the linked
message so I would need to grab the link from HTML, go to that page, then
get the resolution if there is one, so at the moment I just check each one
(thanks for the bug hunt Jason!).  I usually have to do a little touching up
afterwards, such as fix links and such, but the script really saves on time.
As you can tell, it's been a busy month!

I'm (very slowly) updating the script to go through the mail list threads
recursively but haven't really gotten anywhere with that yet.  Benchwork has
intervened yet again!

Chris

> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Tuesday, June 06, 2006 7:27 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
> 
> > I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> > (appropriately enough, on 6-6-06).  I am trying out a new script which
> helps
> > with all the developer list noise; hope everybody likes it.
> 
> I like the CVS summaries.
> 
> For the bug summaries, would it make sense to categorise/sort by
> category/status eg. RESOLVED, WORKSFORME etc?
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Wed Jun  7 00:26:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 07 Jun 2006 10:26:47 +1000
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689ad$3e6aec20$15327e82@pyrimidine>
References: <000601c689ad$3e6aec20$15327e82@pyrimidine>
Message-ID: <44861D47.7090205@infotech.monash.edu.au>

> I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l
> (appropriately enough, on 6-6-06).  I am trying out a new script which helps
> with all the developer list noise; hope everybody likes it.

I like the CVS summaries.

For the bug summaries, would it make sense to categorise/sort by 
category/status eg. RESOLVED, WORKSFORME etc?

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jason at bioperl.org  Wed Jun  7 04:04:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Jun 2006 00:04:02 -0400
Subject: [Bioperl-l] ListSummaries for May 10-31.
In-Reply-To: <000601c689cb$11f568a0$15327e82@pyrimidine>
References: <000601c689cb$11f568a0$15327e82@pyrimidine>
Message-ID: <8D9B514C-ADB4-409F-A55F-DC0C3DA9354A@bioperl.org>

It is possible some of this can be extracted from the bugzilla as a  
query (all the changes from X to Y) and generate RSS or text that can  
be processed.

-jason
On Jun 6, 2006, at 8:41 PM, Chris Fields wrote:

> I could do something like that.  Right now I have a script that  
> just grabs
> the text from the web page:
>
> http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html
>
> and uses regexes and hashes to sort everything and make some sense  
> of the
> noise.  The resolution for a bug isn't on that page but in the linked
> message so I would need to grab the link from HTML, go to that  
> page, then
> get the resolution if there is one, so at the moment I just check  
> each one
> (thanks for the bug hunt Jason!).  I usually have to do a little  
> touching up
> afterwards, such as fix links and such, but the script really saves  
> on time.
> As you can tell, it's been a busy month!
>
> I'm (very slowly) updating the script to go through the mail list  
> threads
> recursively but haven't really gotten anywhere with that yet.   
> Benchwork has
> intervened yet again!
>
> Chris
>
>> -----Original Message-----
>> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
>> Sent: Tuesday, June 06, 2006 7:27 PM
>> To: Chris Fields
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] ListSummaries for May 10-31.
>>
>>> I have updated the ListSummaries to include BioSQL-l and Bioperl- 
>>> guts-l
>>> (appropriately enough, on 6-6-06).  I am trying out a new script  
>>> which
>> helps
>>> with all the developer list noise; hope everybody likes it.
>>
>> I like the CVS summaries.
>>
>> For the bug summaries, would it make sense to categorise/sort by
>> category/status eg. RESOLVED, WORKSFORME etc?
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From heikki at sanbi.ac.za  Wed Jun  7 09:57:47 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Jun 2006 11:57:47 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606061004.01193.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine>
	<200606020952.08034.heikki@sanbi.ac.za>
	<200606061004.01193.heikki@sanbi.ac.za>
Message-ID: <200606071157.47736.heikki@sanbi.ac.za>

Committed.

Please report any surprising changes in functionality to the list.

	-Heikki

On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote:
> OK. I've gone through all cases where return and undef are on the same
> lines. I've done changes in 185 files.
>
> My aims have been the following:
>
> 1. Remove undef from return undef when not necessary.
> 	This will make it easier to spot cases where undef matters in the future
> 	Most of the changes fall into this category. The context is clearly
> scalar.
>
> 2. Returning undef when the user expects an empty list is bad
>
> ./Bio/Tools/Est2Genome.pm fixed
> ./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
>                                not fixed
> ./Bio/Matrix/PSM/SiteMatrix.pm  fixed
> ./Bio/Matrix/PSM/Psm  fixed
> ./Bio/DB/Taxonomy/entrez.pm fixed
>
> 3. If docs say method returns nothing, explicit undef is not the right
> thing to return
>
> 4. do not return an explicit undef if the method is supposed to return
> false on failure
>
>
> Before I do the commit, I'd like to see a number of people do 'make test'
> on bioperl-live and report back after the commit if they see changes.
> There are quite a few tests that fail currently.
>
> I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.
>
> 	-Heikki
>
> On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> > I've started going through the files that have 'return undef' lines.
> > I'll report back later.
> >
> > Initial impression is that there are a few cases where the context
> > indicates list to be returned but failure returns an explicit undef. I'll
> > fix those.
> >
> > Most of the cases are much more ambiguous. Even when documentation says
> > the failure returns undef, it is clearly meant to mean false. In most
> > cases documentation does not comment on return value at all. Luckily the
> > context is almost always scalar and therefore it does not matter too
> > much.
> >
> > I seem to be changing 'return undef' to plain 'return' a bit
> > overzealously, so do not take it personally.
> >
> > 	-Heikki
> >
> > On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > > ....
> > >
> > > > > Again, didn't do that.
> > > >
> > > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > > certainly not directed at your recent changes to
> > > > Bio::Restriction::IO. In fact, I put in the above * comment to
> > > > exclude your changes from my discussion; you changed the docs because
> > > > the code never did what they said they did (the docs were bad).
> > > > That's fine (good!). My comments were a general point, slightly
> > > > directed at the idea of changing all the return undef;s - changing
> > > > the code so that it no longer matches the docs of a previously
> > > > working method. That's what I think is bad. Though in this particular
> > > > case it shouldn't make any difference at all.
> > >
> > > Agreed.  In any case, if tests have been properly set up then they
> > > should catch problems.  This is, of course, if they are properly set
> > > up.
> > >
> > > Chris
> > >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
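
For readers following the thread, the list-context pitfall behind this
change can be shown in a few lines of plain Perl. The two subroutines
below are hypothetical stand-ins, not code from bioperl-live.

```perl
use strict;
use warnings;

# Both routines signal "nothing found", but in different ways.
sub hits_explicit_undef { return undef }  # list context: one-element list (undef)
sub hits_bare_return    { return }        # list context: empty list; scalar: undef

my @explicit = hits_explicit_undef();
my @bare     = hits_bare_return();

print "explicit undef: ", scalar(@explicit), " element(s)\n";
print "bare return:    ", scalar(@bare),     " element(s)\n";

# An array holding a single undef is still true in boolean context,
# so a failed call can look like a success to the caller.
print "explicit undef looks like success\n" if @explicit;
print "bare return looks like failure\n"    unless @bare;

# In scalar context both give undef, which is why most callers see no change.
my $scalar_explicit = hits_explicit_undef();
my $scalar_bare     = hits_bare_return();
print "both undef in scalar context\n"
    if !defined $scalar_explicit && !defined $scalar_bare;
```

The reverse case is worth keeping in mind when reviewing the commit: a
bare return used inside a larger list, such as a hash constructor,
contributes no element at all, whereas an explicit undef keeps its slot.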


From Michael.Muratet at operon.com  Tue Jun  6 18:34:38 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Tue, 6 Jun 2006 13:34:38 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>

Greetings

I am trying to install bioperl-db in preparation for installing a BioSQL
database. I'm running on a Dell PowerEdge with quad dual-core Xeons, RedHat
Enterprise v4, perl 5.8.5 and bioperl 1.5.1. I have installed MySQL v5.0.21
from source with --with-innodb set for the configuration, and I installed
bioperl-db from CVS. I have the latest DBI and DBD::mysql, installed a few
weeks ago from CPAN.

The installation has been working well with perl otherwise; for example, the
Ensembl core API works OK. SHOW ENGINES indicates that InnoDB is enabled. I
have included a snippet from the top of the 'make test' output below. I
searched the web and the bioperl-db list and haven't found anything that
appears to be relevant. I've done several of these installs and they've
pretty much completed without a single glitch. Does anyone have any ideas
how to isolate the problem?

Thanks

Mike

[mmuratet at HSV-PROBE bioperl-db]$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01dbadaptor.....ok 14/19
------------- EXCEPTION  -------------
MSG: failed to open connection: Transactions not supported by database
STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm:518
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK toplevel t/01dbadaptor.t:62


From hlapp at gmx.net  Wed Jun  7 12:52:22 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 08:52:22 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
Message-ID: <4F23D2EA-2218-4023-A3F6-3284912952BE@gmx.net>

Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI
parameter hash. The test you're stumbling over is actually there to
test that the database does support transactions, but apparently in
5.x versions MySQL no longer silently ignores the AutoCommit
parameter if it doesn't support transactions (effectively preempting
the test ...).

Now you say that InnoDB shows as enabled. Can you also confirm that
you changed the MySQL configuration parameter that designates the
directory for InnoDB to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one row and the last query should
return zero rows if transactions are supported, and there shouldn't
be any error.

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD:mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ 
> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ 
> BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ 
> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ 
> DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
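
The same insert-then-rollback check can also be run through DBI, which
exercises the DBD::mysql driver as well as the server. This is only a
sketch under assumptions: the environment variable names BIOSQL_DSN,
BIOSQL_USER and BIOSQL_PASS are invented for this example, and the
biodatabase table is taken from the SQL session above. It is not part of
bioperl-db.

```perl
use strict;
use warnings;

# Returns 1 if an INSERT is undone by ROLLBACK, 0 if the row survives.
# Dies (via RaiseError) on connection or SQL errors, including a driver
# that refuses AutoCommit => 0.
sub transactions_work {
    my ($dsn, $user, $pass) = @_;
    require DBI;    # loaded lazily so this file compiles without DBI

    my $dbh = DBI->connect($dsn, $user, $pass,
        { AutoCommit => 0, RaiseError => 1, PrintError => 0 });

    $dbh->do(q{INSERT INTO biodatabase (name) VALUES (?)},
             undef, '__dummy__');
    $dbh->rollback;

    my ($count) = $dbh->selectrow_array(
        q{SELECT COUNT(*) FROM biodatabase WHERE name = ?},
        undef, '__dummy__');

    if ($count) {
        # The rollback had no effect, so remove the dummy row by hand.
        $dbh->do(q{DELETE FROM biodatabase WHERE name = ?},
                 undef, '__dummy__');
        $dbh->commit;
    }
    $dbh->disconnect;
    return $count ? 0 : 1;
}

# Hypothetical environment variables; adjust to the local setup.
if (my $dsn = $ENV{BIOSQL_DSN}) {
    my $ok = transactions_work($dsn, $ENV{BIOSQL_USER}, $ENV{BIOSQL_PASS});
    print $ok ? "rollback worked\n" : "rollback had no effect\n";
}
else {
    print "Set BIOSQL_DSN (e.g. dbi:mysql:database=...) to run the check\n";
}
```

If the connect call itself dies with the "Transactions not supported"
message while the mysql shell test succeeds, that would point at the
driver rather than the server.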


From nlhepler at umd.edu  Wed Jun  7 13:46:32 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 09:46:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
Message-ID: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>

Hello,

I am having some difficulty here.  I have a list of accessions, which  
are the parameters for a get_Stream_by_acc() function on a  
Bio::DB::GenBank object.  None of the returned GenBank information  
for any of my accessions seems to contain variation data, no matter  
how I try to coax it out with unflattener and typemapper.  This data  
is, however, available via the web interface of NCBI Nucleotide, as  
an optional feature (SNP).  I was wondering if there was some option  
I'm missing in the initialization of the Bio::DB::GenBank object (no  
options currently) that will coax the database into giving me this  
data?  Or something else that I'm missing altogether.  The organism  
of interest is human, taxon:9606.

Nicolaus Lance Hepler
nlhepler at mail dot umd dot edu


From cjfields at uiuc.edu  Wed Jun  7 13:56:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 08:56:16 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606071157.47736.heikki@sanbi.ac.za>
Message-ID: <000601c68a3a$265552a0$15327e82@pyrimidine>

Yikes!  I'll download a tarball from anon CVS and run a comparison (vs my
pre-updated bioperl-live) on WinXP and Mac OS X 10.4 (Intel) and report back
success/fail; may be a bit.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfall with
> "return undef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> 	-Heikki
> 
> On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote:
> > OK. I've gone through all cases where return and undef are on the same
> > lines. I've done changes in 185 files.
> >
> > My aims have been the following:
> >
> > 1. Remove undef from return undef when not necessary.
> > 	This will make it easier to spot cases where undef matters in the
> > 	future. Most of the changes fall into this category. The context is
> > 	clearly scalar.
> >
> > 2. Returning undef when the user expects an empty list is bad
> >
> > ./Bio/Tools/Est2Genome.pm fixed
> > ./Bio/SearchIO/blast.pm:2330:  should return (undef, undef) for clarity?
> >                                not fixed
> > ./Bio/Matrix/PSM/SiteMatrix.pm  fixed
> > ./Bio/Matrix/PSM/Psm  fixed
> > ./Bio/DB/Taxonomy/entrez.pm fixed
> >
> > 3. If docs say method returns nothing, explicit undef is not the right
> > thing to return
> >
> > 4. do not return an explicit undef if the method is supposed to return
> > false on failure
> >
> >
> > Before I do the commit, I'd like to see a number of people do 'make test'
> > on bioperl-live and report back after the commit if they see changes.
> > There are quite a few tests that fail currently.
> >
> > I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.
> >
> > 	-Heikki
> >
> > On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> > > I've started going through the files that have 'return undef' lines.
> > > I'll report back later.
> > >
> > > Initial impression is that there are a few cases where the context
> > > indicates list to be returned but failure returns an explicit undef.
> I'll
> > > fix those.
> > >
> > > Most of the cases are much more ambiguous. Even when documentation
> says
> > > the failure returns undef, it is clearly meant to mean false. In most
> > > cases documentation does not comment on return value at all. Luckily
> the
> > > context is almost always scalar and therefore it does not matter too
> > > much.
> > >
> > > I seem to be changing 'return undef' to plain 'return' a bit
> > > overzealously, so do not take it personally.
> > >
> > > 	-Heikki
> > >
> > > On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > > > ....
> > > >
> > > > > > Again, didn't do that.
> > > > >
> > > > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > > > certainly not directed at your recent changes to
> > > > > Bio::Restriction::IO. In fact, I put in the above * comment to
> > > > > exclude your changes from my discussion; you changed the docs
> because
> > > > > the code never did what they said they did (the docs were bad).
> > > > > That's fine (good!). My comments were a general point, slightly
> > > > > directed at the idea of changing all the return undef;s - changing
> > > > > the code so that it no longer matches the docs of a previously
> > > > > working method. That's what I think is bad. Though in this
> particular
> > > > > case it shouldn't make any difference at all.
> > > >
> > > > Agreed.  In any case, if tests have been properly set up then they
> > > > should catch problems.  This is, of course, if they are properly set
> > > > up.
> > > >
> > > > Chris
> > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From osborne1 at optonline.net  Wed Jun  7 15:42:32 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 07 Jun 2006 11:42:32 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu>
Message-ID: <C0AC6C28.8C12%osborne1@optonline.net>

Nicolaus,

The short answer is no, there's no option that will omit or add a particular
feature or annotation to the Sequence object returned by Bio::DB::GenBank.
Can you give some example accessions?

Brian O.


On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:

> Hello,
> 
> I am having some difficulty here.  I have a list of accessions, which
> are the parameters for a get_Stream_by_acc() function on a
> Bio::DB::GenBank object.  None of the returned GenBank information
> for any of my accessions seems to contain variation data, no matter
> how I try to coax it out with unflattener and typemapper.  This data
> is, however, available via the web interface of NCBI Nucleotide, as
> an optional feature (SNP).  I was wondering if there was some option
> I'm missing in the initialization of the Bio::DB::GenBank object (no
> options currently) that will coax the database into giving me this
> data?  Or something else that I'm missing altogether.  The organism
> of interest is human, taxon:9606.
> 
> Nicolaus Lance Hepler
> nlhepler at mail dot umd dot edu


From nlhepler at umd.edu  Wed Jun  7 16:26:06 2006
From: nlhepler at umd.edu (Nicolaus Hepler)
Date: Wed, 7 Jun 2006 12:26:06 -0400
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <C0AC6C28.8C12%osborne1@optonline.net>
References: <C0AC6C28.8C12%osborne1@optonline.net>
Message-ID: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>

Brian,

A sample accession is BC000007.  I figured a way around it though.   
Rather than automate the whole process, I just downloaded from Batch  
Entrez a flat .gb file of all my accessions.  It's not flexible, and  
will be inconvenient when we expand the dataset, but it will provide  
me with data to work with for now.

Nicolaus

> Nicolaus,
>
> The short answer is no, there's no option that will omit or add a  
> particular
> feature or annotation to the Sequence object returned by  
> Bio::DB::GenBank.
> Can you give some example accessions?
>
> Brian O.
>
>
> On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
>
>> Hello,
>>
>> I am having some difficulty here.  I have a list of accessions, which
>> are the parameters for a get_Stream_by_acc() function on a
>> Bio::DB::GenBank object.  None of the returned GenBank information
>> for any of my accessions seems to contain variation data, no matter
>> how I try to coax it out with unflattener and typemapper.  This data
>> is, however, available via the web interface of NCBI Nucleotide, as
>> an optional feature (SNP).  I was wondering if there was some option
>> I'm missing in the initialization of the Bio::DB::GenBank object (no
>> options currently) that will coax the database into giving me this
>> data?  Or something else that I'm missing altogether.  The organism
>> of interest is human, taxon:9606.
>>
>> Nicolaus Lance Hepler
>> nlhepler at mail dot umd dot edu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From lstein at cshl.edu  Wed Jun  7 16:50:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Jun 2006 12:50:24 -0400
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <4483F338.7090909@mrc-dunn.cam.ac.uk>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
Message-ID: <200606071250.25026.lstein@cshl.edu>

I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
because the CGI upload filehandle is not seekable (for good reasons that I 
won't inflict on you)! You'll have to write to a temporary file, or else read 
the whole sequence into memory. Sorry about this.

Lincoln

On Monday 05 June 2006 05:02, Sendu Bala wrote:
> Wijaya Edward wrote:
> > Dear Lincoln and experts
> >
> > Currently I have a CGI application that does this:
> >
> > 1. read an uploaded file
> > 2. check whether the content of the file is FASTA or not
> > 3. print out the content of the file.
> >
> >
> > The problem I'm facing is in step three: the content of the file
> > handle is altered, namely the very first line does not get printed.
>
> The problem is almost certainly that the guessing is done by reading the
> first line of the filehandle, so that your subsequent while loop on that
> same filehandle starts at the second line.
> Just seek the filehandle back to the start before trying to print the
> contents out.
>
> ..
> my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
> my $format_upload  = $guesser_upload->guess;
> seek($fh_upload, 0, 0);
> ..
> while (<$fh_upload>) {
>      ...
> }
>
> An alternative might be to pass GuessSeqFormat the filename in which
> case it would make its own filehandle and close it, leaving your own
> filehandle untouched.

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From paul.boutros at utoronto.ca  Wed Jun  7 17:03:01 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 13:03:01 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
Message-ID: <1149699781.448706c5e803d@webmail.utoronto.ca>

Hi,

Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7 and I had a few 
failures:

Failed Test         Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                    89    2  79 88
t/Biblio.t                        24    1  2
t/LocusLink.t                     23    1  23
t/PhysicalMap.t                   14    2  11-12
t/RepeatMasker.t                   6    3  1-2 6
t/StandAloneBlast.t               18    4  19-22
t/TaxonTree.t                     17   30  11 18-42
t/alignUtilities.t                 9    1  9
t/psm.t              255 65280    48   35  29 32-48
t/tutorial.t                      21   15  7-21

Not sure if any of these are related to the "return undef" changes, or are known.  I also 
had some warnings running BioGraphics.t

t/BioGraphics................Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureFile.pm line 547, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, <GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 61.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, <GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, <GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, <GEN0> line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, <GEN0> line 62.
t/BioGraphics................ok
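
For anyone who hasn't followed the "return undef" thread, here is a minimal sketch of the general Perl behaviour behind the subject line. It is not taken from any BioPerl module and the sub names are invented. In list context "return undef" yields a one-element list, which counts as true, whereas a bare "return" yields an empty list. Whether that is what causes any of the failures above is not established here.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub lookup_explicit_undef { return undef }   # list context: (undef), one element
sub lookup_bare_return    { return }         # list context: (), no elements

my @explicit = lookup_explicit_undef();
my @bare     = lookup_bare_return();

# An array in boolean context is true when it holds any element at all,
# so a caller testing "if (@result)" sees the two subs differently.
my $explicit_looks_true = @explicit ? 1 : 0;
my $bare_looks_true     = @bare     ? 1 : 0;

# In scalar context both forms give undef.
my $explicit_scalar = lookup_explicit_undef();
my $bare_scalar     = lookup_bare_return();

printf "return undef -> %d element(s) in list context\n", scalar @explicit;
printf "bare return  -> %d element(s) in list context\n", scalar @bare;
```

The flip side is that code relying on a scalar placeholder in a list (say, a hash value) can change behaviour when "return undef" becomes a bare "return", so either direction of the change may surface in tests.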

I also ran the tests manually and below I've attached what came out (it doesn't always agree 
with the results of make test, and in a few cases, e.g. tutorial.t or StandAloneBlast.t, 
there were no errors running the tests manually).
Paul

Annotation.t
============
not ok 8
# Test 8 got: '' (t/Annotation.t at line 59)
#   Expected: '0'

not ok 71
# Test 71 got: 'dumpster|test case|Ann:00001' (t/Annotation.t at line 187)
#    Expected: 'dumpster|test case|'

not ok 79
# Failed test 79 in t/Annotation.t at line 217

ok 85
Use of uninitialized value in concatenation (.) or string at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm line 236.

------------- EXCEPTION  -------------
MSG: Bio::AnnotationI implementation Bio::Annotation:: failed to load:
------------- EXCEPTION  -------------
MSG: Failed to load module Bio::Annotation::. Can't locate Bio/Annotation/.pm in @INC (@INC contains: t /db2blast/Paul/perl5.8.7/lib/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl .) at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm line 396.

STACK Bio::Root::Root::_load_module /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm:398
STACK (eval) /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:149
STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:148
STACK toplevel t/Annotation.t:237
--------------------------------------

STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:152
STACK toplevel t/Annotation.t:237
--------------------------------------


PhysicalMap.t
=============
not ok 11
# Test 11 got: <UNDEF> (t/PhysicalMap.t at line 55)
#    Expected: '0' (code holds and returns a string, definition requires a boolean)
not ok 12
# Test 12 got: '3' (t/PhysicalMap.t at line 56)
#    Expected: '1' (code holds and returns a string, definition requires a boolean)

TaxonTree.t
===========
ok 10
Use of uninitialized value in string eq at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 559.
not ok 11
# Test 11 got: <UNDEF> (t/TaxonTree.t at line 35)
#    Expected: 'species'
ok 12 # foo is not a rank, class variable @RANK not initialised
ok 13
ok 14
ok 15
ok 16
ok 17
ok 18
Can't use string ("this could be anything") as a HASH ref while "strict refs" in use at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 452.

alignUtilities.t
================
ok 6

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
ok 7
ok 8
not ok 9
# Test 9 got: '1' (t/alignUtilities.t at line 53)
#   Expected: '3'

RepeatMasker.t
==============
t/RepeatMasker...............FAILED tests 1-2, 6
        Failed 3/6 tests, 50.00% okay

StandAloneBlast.t
=================
t/StandAloneBlast............FAILED tests 19-22
        Failed 4/18 tests, 77.78% okay

psm.t
=====
t/Pseudowise.................ok
t/psm........................NOK 29Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
        Failed 18/48 tests, 62.50% okay
t/QRNA.......................ok

tutorial.t
==========
t/tutorial...................ok 5/21
The following numeric arguments can be passed to run the corresponding demo-script.
1  => sequence_manipulations
2  => seqstats_and_seqwords
3  => restriction_and_sigcleave
4  => other_seq_utilities
5  => run_perl
6  => searchio_parsing
8  => hmmer_parsing
9  => simplealign
10 => gene_prediction_parsing
11 => access_remote_db
12 => index_local_db
13 => fetch_local_db    (NOTE: needs to be run with demo 12)
14 => sequence_annotation
15 => largeseqs
16 => liveseqs
17 => run_struct
18 => demo_variations
19 => demo_xml
20 => run_tree
21 => run_map
22 => run_remoteblast
23 => run_standaloneblast
24 => run_clustalw_tcoffee
25 => run_psw_bl2seq

In addition the argument "100" followed by the name of a single
bioperl object will display a list of all the public methods
available from that object and from what object they are inherited.

Using the parameter "0" will run all the tests that do not require
external programs (i.e. tests 1 to 22).
Using any other argument (or no argument) will run this display.

So typical command lines might be:
To run all core demo scripts:
 > perl -w  bptutorial.pl 0
or to just run the local indexing demos:
 > perl -w  bptutorial.pl 12 13
or to list all the methods available for object Bio::Tools::SeqStats -
 > perl -w  bptutorial.pl 100 Bio::Tools::SeqStats

t/tutorial...................FAILED tests 7-21
        Failed 15/21 tests, 28.57% okay

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Wednesday, June 07, 2006 4:58 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> Committed.
> 
> Please report any surprising changes in functionality to the list.
> 
> -Heikki
> 


From sb at mrc-dunn.cam.ac.uk  Wed Jun  7 16:54:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 07 Jun 2006 17:54:31 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <200606071250.25026.lstein@cshl.edu>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
	<4483F338.7090909@mrc-dunn.cam.ac.uk>
	<200606071250.25026.lstein@cshl.edu>
Message-ID: <448704C7.6080201@mrc-dunn.cam.ac.uk>

Lincoln Stein wrote:
> I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, 
> because the CGI upload filehandle is not seekable (for good reasons that I 
> won't inflict on you)! You'll have to write to a temporary file, or else read 
> the whole sequence into memory. Sorry about this.

The OP already had success with my alternative solution.


>> An alternative might be to pass GuessSeqFormat the filename in which
>> case it would make its own filehandle and close it, leaving your own
>> filehandle untouched.


From hlapp at gmx.net  Wed Jun  7 17:25:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 13:25:25 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
Message-ID: <76434774-51A4-46E7-97AA-1E9227CB7771@gmx.net>

Hi Michael,

yes it looks like a problem in DBD if DBD::mysql fails to recognize  
that the mysql instance to which it is connected does support  
transactions. You can verify this by writing a simple script that  
tries to open a connection with
{ AutoCommit => 0 } as the parameter hash:

	use DBI;
	my $dbh = DBI->connect("dbi:mysql:database=<yourdb>;host=<yourhost>",
	                       "username","password",
	                       { AutoCommit => 0, RaiseError => 0 });
	# connect() returns undef on failure; the reason is in $DBI::errstr
	die $DBI::errstr unless $dbh;
	$dbh->disconnect;

If this succeeds fine then something in Biosql may be related to the  
problem, but otherwise not.

	-hilmar


On Jun 7, 2006, at 12:01 PM, Michael Muratet US-Huntsville wrote:

> Hilmar
>
> Pardon the top post.
>
> I tried the test below and it failed. So, I went back and redid the  
> Innodb configuration (deleted all the index files, which were empty  
> anyway), reinstalled biosql (which was empty, too) and restarted the  
> server. Now, the test below works. I went into the DBD-3.0003 and  
> did a distclean and reinstalled the package, but it fails the one  
> transaction test, too. So, it looks like the problem is in DBD, yes?
>
> We had a RAID 5 drive glitch the day before yesterday and rebuilt  
> it. That's the only thing that's changed that I know of that could  
> have caused the problem with ibxxx files.
>
> I have received a reply on the DBD list. Can you think of anything  
> else I should try from the biosql end?
>
> Thanks a million.
>
> Mike
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, June 07, 2006 7:52 AM
> To: Michael Muratet US-Huntsville
> Cc: Bioperl; BioSQL
> Subject: Re: [Bioperl-l] bioperl-db failing tests
>
>
> Hi Michael,
>
> Bioperl-db will open all connections with AutoCommit => 0 in the DBI
> parameter hash. The test you're stumbling over is actually there to
> test that the database  does support transactions, but apparently in
> 5.x versions MySQL no longer silently ignores the AutoCommit
> parameter if it doesn't support transactions (effectively preempting
> the test ...).
>
> Now you say that innodb shows as enabled - i.e., you can confirm that
> you changed the Mysql configuration parameter that designates the
> directory for innodb to store its files?
>
> You can confirm that transactions are supported by simple tests on
> the sql level. Open a mysql shell and do the following:
>
> 	-- BTW 'start transaction;' will (should) work too
> 	mysql> set autocommit = 0;
> 	mysql> insert into biodatabase (name) values ('__dummy__');
> 	mysql> select name from biodatabase where name = '__dummy__';
> 	mysql> rollback;
> 	mysql> select name from biodatabase where name = '__dummy__';
>
> The first SELECT query should return one and the last query should
> return zero rows if transactions are supported, and there shouldn't
> be any error.
>
> If the above succeeds (which I don't expect it to) then it looks like
> the DBD::mysql driver thinks the database doesn't support
> transactions when in reality it does. Let me know the result.
>
> 	-hilmar
>
> On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:
>
>> Greetings
>>
>> I am trying to install bioperl-db in preparation for installing a
>> biosql database. I'm running on a Dell PowerEdge with quad dual-
>> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl
>> 1.5.1.  I have installed mysql v5.0.21 from source with --with-
>> innodb set for the configuration. I installed bioperl-db from cvs.
>> I have the latest DBI and DBD:mysql installed a few weeks ago from
>> CPAN. The installation has been working well with perl otherwise,
>> for example, the Ensembl core API works OK. SHOW ENGINES indicates
>> that innodb is enabled.  I have attached a snippet from the top of
>> the output below. I searched the web and the bioperl-db list and
>> haven't found anything that appears to be relevant. I've done
>> several of these installs and they've pretty much completed without
>> a single glitch. Does anyone have any ideas how to isolate the
>> problem?
>>
>> Thanks
>>
>> Mike
>>
>> [mmuratet at HSV-PROBE bioperl-db]$ make test
>> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>> t/01dbadaptor.....ok 14/19
>> ------------- EXCEPTION  -------------
>> MSG: failed to open connection: Transactions not supported by  
>> database
>> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 255
>> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/
>> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: 
>> 215
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/
>> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/
>> BioSQL/BasePersistenceAdaptor.pm:1477
>> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/
>> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/
>> DB/BioSQL/BaseDriver.pm:518
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /
>> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/
>> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
>> STACK toplevel t/01dbadaptor.t:62
>>
>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 18:08:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 13:08:19 -0500
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>
Message-ID: <001501c68a5d$5db655a0$15327e82@pyrimidine>

Nicolaus,

Bio::DB::GenBank uses NCBI's efetch mainly; I implemented epost but it's a
hack at best and only works in certain circumstances.  So you could get the
sequence data directly but the links aren't included and are only given
through NCBI's elink.  There is no way I know of to get this information via
bioperl as there isn't an interface to NCBI's elink AFAIK (Brian?).  I'm
working on a rewrite for a general NCBI eutils interface for each tool
(efetch, epost, elink, etc), but it isn't working yet and probably won't be
ready to go until the end of summer-beginning of fall.

Just so you know how complex the situation is when using accessions, you
can't use a sequence accession directly when querying elink (and most
eutils), it has to be the GI number; I believe efetch is the only one that
accepts accessions.  So you would have to run esearch first using the
accessions as a query, grab the GI from the XML, run elink with the GI, grab
the SNP cluster ID, efetch the SNP data, and parse the data to get into
Bio::ClusterIO.  Fun, huh?  You would think NCBI would try making this a
little easier...
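
To make the chain of calls concrete, here is a rough sketch that only builds the three request URLs and does not touch the network. The base URL and the parameter names (db, dbfrom, term, id, retmode) are my understanding of the eutils CGI interface and should be checked against NCBI's docs. The helper name eutils_url is invented, and the GI and SNP ID are simply the ones shown in the elink output below.

```perl
use strict;
use warnings;

# Assumed eutils base URL; verify against NCBI's current documentation.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: build "<tool>.fcgi?key=value&..." with sorted keys.
# Values are assumed to be URL-safe already (plain accessions and numbers).
sub eutils_url {
    my ($tool, %param) = @_;
    my $query = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$EUTILS/$tool.fcgi?$query";
}

# Step 1: accession -> GI (the GI has to be parsed out of the esearch XML).
my $search_url = eutils_url('esearch', db => 'nucleotide', term => 'BC000007');

# Step 2: GI -> linked SNP IDs.
my $link_url = eutils_url('elink', dbfrom => 'nucleotide', db => 'snp',
                          id => 33875090);

# Step 3: SNP ID -> SNP record, which would still need a working parser.
my $fetch_url = eutils_url('efetch', db => 'snp', id => 4631, retmode => 'xml');

print "$_\n" for $search_url, $link_url, $fetch_url;
```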

There used to be a way to parse dbSNP data using Bio::ClusterIO but the XML
schema changed so the parser is likely broken (the tests work but the file
is from the old schema).  I think Allen Day was in charge of it.

I used the eutils test interface () to grab the SNP cluster accessions for
your sequence using elink (note that the format is XML, which one would
have to parse to grab the cluster IDs):

<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp_genegenotype</LinkName>
		<Link>
			<Id>28362589</Id>
		</Link>
		<Link>
			<Id>4635949</Id>
		</Link>

		<Link>
			<Id>28362591</Id>
		</Link>
		<Link>
			<Id>11545838</Id>
		</Link>
		<Link>
			<Id>4246814</Id>

		</Link>
		<Link>
			<Id>28670911</Id>
		</Link>
		<Link>
			<Id>4073746</Id>
		</Link>
		<Link>

			<Id>9313754</Id>
		</Link>
		<Link>
			<Id>11545840</Id>
		</Link>
		<Link>
			<Id>17077806</Id>

		</Link>
		<Link>
			<Id>28362590</Id>
		</Link>
		<Link>
			<Id>4076327</Id>
		</Link>
		<Link>

			<Id>9834</Id>
		</Link>
		<Link>
			<Id>4073745</Id>
		</Link>
		<Link>
			<Id>6879874</Id>

		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
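
Pulling the IDs out of output like the above could look roughly like this. It is a quick regex-based sketch using core Perl only and assumes the XML is shaped exactly as shown; a real XML parser would be the safer choice for anything beyond a one-off. The sub name elink_ids_by_linkname is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return { LinkName => [ linked IDs ] } from elink XML.
sub elink_ids_by_linkname {
    my ($xml) = @_;
    my %ids;
    # Walk each <LinkSetDb> block so IDs stay grouped under their LinkName.
    while ($xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}gs) {
        my $block = $1;
        my ($name) = $block =~ m{<LinkName>\s*(.*?)\s*</LinkName>}s
            or next;                         # skip blocks without a LinkName
        # Only IDs wrapped in <Link> count; the query GI sits in <IdList>.
        push @{ $ids{$name} },
            $block =~ m{<Link>\s*<Id>\s*(\d+)\s*</Id>\s*</Link>}gs;
    }
    return \%ids;
}

# Usage with a cut-down copy of the output above.
my $sample = <<'XML';
<eLinkResult><LinkSet>
  <DbFrom>nucleotide</DbFrom>
  <IdList><Id>33875090</Id></IdList>
  <LinkSetDb><DbTo>snp</DbTo><LinkName>nucleotide_snp</LinkName>
    <Link><Id>4631</Id></Link></LinkSetDb>
  <LinkSetDb><DbTo>snp</DbTo><LinkName>nucleotide_snp_genegenotype</LinkName>
    <Link><Id>28362589</Id></Link>
    <Link>
      <Id>4635949</Id>
    </Link></LinkSetDb>
</LinkSet></eLinkResult>
XML

my $links = elink_ids_by_linkname($sample);
print "$_: @{ $links->{$_} }\n" for sort keys %$links;
```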


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nicolaus Hepler
> Sent: Wednesday, June 07, 2006 11:26 AM
> To: Brian Osborne; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] GenBank Feature: variation
> 
> Brian,
> 
> A sample accession is BC000007.  I figured a way around it though.
> Rather than automate the whole process, I just downloaded from Batch
> Entrez a flat .gb file of all my accessions.  It's not flexible, and
> will be inconvenient when we expand the dataset, but it will provide
> me with data to work with for now.
> 
> Nicolaus
> 
> > Nicolaus,
> >
> > The short answer is no, there's no option that will omit or add a
> > particular
> > feature or annotation to the Sequence object returned by
> > Bio::DB::GenBank.
> > Can you give some example accessions?
> >
> > Brian O.
> >
> >
> > On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
> >
> >> Hello,
> >>
> >> I am having some difficulty here.  I have a list of accessions, which
> >> are the parameters for a get_Stream_by_acc() function on a
> >> Bio::DB::GenBank object.  None of the returned GenBank information
> >> for any of my accessions seems to contain variation data, no matter
> >> how I try to coax it out with unflattener and typemapper.  This data
> >> is, however, available via the web interface of NCBI Nucleotide, as
> >> an optional feature (SNP).  I was wondering if there was some option
> >> I'm missing in the initialization of the Bio::DB::GenBank object (no
> >> options currently) that will coax the database into giving me this
> >> data?  Or something else that I'm missing altogether.  The organism
> >> of interest is human, taxon:9606.
> >>
> >> Nicolaus Lance Hepler
> >> nlhepler at mail dot umd dot edu
> >
> >
> 


From Michael.Muratet at operon.com  Wed Jun  7 16:01:29 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Wed, 7 Jun 2006 11:01:29 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>

Hilmar

Pardon the top post.

I tried the test below and it failed. So, I went back and redid the Innodb configuration (deleted all the index files, which were empty anyway), reinstalled biosql (which was empty, too) and restarted the server. Now, the test below works. I went into the DBD-3.0003 and did a distclean and reinstalled the package, but it fails the one transaction test, too. So, it looks like the problem is in DBD, yes?

We had a RAID 5 drive glitch the day before yesterday and rebuilt it. That's the only thing that's changed that I know of that could have caused the problem with ibxxx files. 

I have received a reply on the DBD list. Can you think of anything else I should try from the biosql end?

Thanks a million.

Mike

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: Wednesday, June 07, 2006 7:52 AM
To: Michael Muratet US-Huntsville
Cc: Bioperl; BioSQL
Subject: Re: [Bioperl-l] bioperl-db failing tests


Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI  
parameter hash. The test you're stumbling over is actually there to  
test that the database  does support transactions, but apparently in  
5.x versions MySQL no longer silently ignores the AutoCommit  
parameter if it doesn't support transactions (effectively preempting  
the test ...).

Now you say that innodb shows as enabled - i.e., you can confirm that  
you changed the Mysql configuration parameter that designates the  
directory for innodb to store its files?

You can confirm that transactions are supported by simple tests on  
the sql level. Open a mysql shell and do the following:

	-- BTW 'start transaction;' will (should) work too
	mysql> set autocommit = 0;
	mysql> insert into biodatabase (name) values ('__dummy__');
	mysql> select name from biodatabase where name = '__dummy__';
	mysql> rollback;
	mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one and the last query should  
return zero rows if transactions are supported, and there shouldn't  
be any error.
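
The same check can be run through DBI instead of the mysql shell. The following is a sketch only: the sub name rollback_works is invented, and $dbh is assumed to be a handle already opened with AutoCommit => 0 against a schema that has the biodatabase table. It uses just the standard DBI methods do, selectrow_array and rollback.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if an INSERT is visible before rollback
# and gone afterwards, 0 otherwise. $dbh must have AutoCommit switched off.
sub rollback_works {
    my ($dbh, $marker) = @_;
    my $count_sql = 'SELECT COUNT(*) FROM biodatabase WHERE name = ?';

    $dbh->do('INSERT INTO biodatabase (name) VALUES (?)', undef, $marker);
    my ($before) = $dbh->selectrow_array($count_sql, undef, $marker);

    $dbh->rollback;
    my ($after) = $dbh->selectrow_array($count_sql, undef, $marker);

    # Without transaction support the dummy row survives; clean it up.
    $dbh->do('DELETE FROM biodatabase WHERE name = ?', undef, $marker)
        if $after;

    return ($before == 1 && $after == 0) ? 1 : 0;
}
```

Calling rollback_works($dbh, '__dummy__') right after connecting gives a yes/no answer equivalent to comparing the two SELECT results by eye.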

If the above succeeds (which I don't expect it to) then it looks like  
the DBD::mysql driver thinks the database doesn't support  
transactions when in reality it does. Let me know the result.

	-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote:

> Greetings
>
> I am trying to install bioperl-db in preparation for installing a  
> biosql database. I'm running on a Dell PowerEdge with quad dual- 
> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl  
> 1.5.1.  I have installed mysql v5.0.21 from source with --with- 
> innodb set for the configuration. I installed bioperl-db from cvs.  
> I have the latest DBI and DBD:mysql installed a few weeks ago from  
> CPAN. The installation has been working well with perl otherwise,  
> for example, the Ensembl core API works OK. SHOW ENGINES indicates  
> that innodb is enabled.  I have attached a snippet from the top of  
> the output below. I searched the web and the bioperl-db list and  
> haven't found anything that appears to be relevant. I've done  
> several of these installs and they've pretty much completed without  
> a single glitch. Does anyone have any ideas how to isolate the  
> problem?
>
> Thanks
>
> Mike
>
> [mmuratet at HSV-PROBE bioperl-db]$ make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"  
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/01dbadaptor.....ok 14/19
> ------------- EXCEPTION  -------------
> MSG: failed to open connection: Transactions not supported by database
> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ 
> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ 
> BioSQL/BasePersistenceAdaptor.pm:1477
> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ 
> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ 
> DB/BioSQL/BaseDriver.pm:518
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / 
> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ 
> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK toplevel t/01dbadaptor.t:62
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun  7 19:38:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 14:38:08 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
Message-ID: <001901c68a69$e7ece8e0$15327e82@pyrimidine>

All,

Don't know how many people use the Bio::ClusterIO module, but it looks like
Bio::ClusterIO::dbsnp is broken unless you are using older XML versions of
the dbSNP database; the schema for ASN.1 and XML format for SNP has changed:

http://www.ncbi.nlm.nih.gov/projects/SNP/

under 'Announcements'.

I actually tried parsing the dbsnp test file and a newer schema XML file to
confirm this; the new version doesn't work (returned object from
next_cluster is undef).  I'm filing a bug as a reminder.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From paul.boutros at utoronto.ca  Wed Jun  7 22:35:46 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed,  7 Jun 2006 18:35:46 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return
	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149719746.448754c2ef4e0@webmail.utoronto.ca>

> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
Yup yup, you're right. I should have mentioned in my original message that I didn't run 
any remote-server tests, and unfortunately can't do so on this box.
Paul

Quoting David Messina <dmessina at wustl.edu>:

> To look for problems related to Heikki's "return undef" sweep, I ran  
> 'make test' on both today's version of bioperl-live and on an older  
> version I had checked out on May 12. This was done on OS X 10.4.6 and  
> perl 5.8.6.
> 
> 
> Here are the results:
> 
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/ 
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may  
> need to be examined.
> 
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been  
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
> 
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm  
> between 5/12 and today, so this appears to be not 'return undef'- 
> related.
> 
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load  
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
> 5/12 and today, so this is not 'return undef'-related.
> 
> 
> 
> All the other test failures appear in both versions of bioperl-live,  
> so presumably they are not affected by the 'return undef' changes.
> 
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG  
> that he didn't. Those only pop up when I run the optional remote- 
> server tests, however. Perhaps Paul didn't run those and that  
> accounts for the discrepancy?
> 
> Also, he saw errors in Biblio.t, Repeatmasker.t, and  
> StandAloneBlast.t that I did not.
> 
> Dave
> 
> 
> Today's bioperl-live test results:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
> 99.84% okay.
> 
> Note that this is including tests requiring a remote server.
> 
> And here's the output from a May 12 checkout of bioperl-live:
> 
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------ 
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
> 99.89% okay.
> 
> 
> 
> 
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
> 
> > Hi,
> >
> > Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> > and I had a few
> > failures:
> >
> > Failed Test         Stat Wstat Total Fail  List of Failed
> > ---------------------------------------------------------------------- 
> > ---------
> > t/Annotation.t                    89    2  79 88
> > t/Biblio.t                        24    1  2
> > t/LocusLink.t                     23    1  23
> > t/PhysicalMap.t                   14    2  11-12
> > t/RepeatMasker.t                   6    3  1-2 6
> > t/StandAloneBlast.t               18    4  19-22
> > t/TaxonTree.t                     17   30  11 18-42
> > t/alignUtilities.t                 9    1  9
> > t/psm.t              255 65280    48   35  29 32-48
> > t/tutorial.t                      21   15  7-21
> 
> 


From dmessina at wustl.edu  Wed Jun  7 22:26:25 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 17:26:25 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <1149699781.448706c5e803d@webmail.utoronto.ca>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
Message-ID: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>

To look for problems related to Heikki's "return undef" sweep, I ran  
'make test' on both today's version of bioperl-live and on an older  
version I had checked out on May 12. This was done on OS X 10.4.6 and  
perl 5.8.6.


Here are the results:

Failures in today's version of bioperl-live but NOT in 5/12 version
===================================================================
- psm.t -
The psm.t error appears to be new, so the changes made to
Bio/Matrix/PSM/SiteMatrix.pm, particularly the one in
_uncompress_string, may need to be examined.

Here's the error message:
Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
t/psm........................dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
         Failed 18/48 tests, 62.50% okay


Failures in 5/12 version of bioperl-live but NOT in today's version
===================================================================
- OntologyStore.t -
Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been
touched between 5/12 and today.

The error looks like a transient network problem to me, but I'm not  
sure:
-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch
http://cvs.sourceforge.net/viewcvs.py/*checkout*/song/ontology/so.definition?rev=HEAD,
but server threw 500.  retrying...
---------------------------------------------------
[REPEATED 5 times -Dave]

t/OntologyStore..............FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay


- RepeatMasker.t -
Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm
between 5/12 and today, so this does not appear to be
'return undef'-related.

- SeqVersion.t -
The SeqVersion error was due to a failure to find and load  
Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between  
5/12 and today, so this is not 'return undef'-related.


All the other test failures appear in both versions of bioperl-live,  
so presumably they are not affected by the 'return undef' changes.

Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
that he didn't. Those only pop up when I run the optional
remote-server tests, however. Perhaps Paul didn't run those and that
accounts for the discrepancy?

Also, he saw errors in Biblio.t, RepeatMasker.t, and
StandAloneBlast.t that I did not.

Dave


Today's bioperl-live test results:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/PhysicalMap.t                  14    2  14.29%  11-12
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/psm.t             255 65280    48   35  72.92%  29 32-48
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,  
99.84% okay.

Note that this is including tests requiring a remote server.

And here's the output from a May 12 checkout of bioperl-live:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                   89    2   2.25%  79 88
t/DBCUTG.t                       29    5  17.24%  26 30-32
t/LocusLink.t                    23    1   4.35%  23
t/OntologyStore.t                 6    4  66.67%  3-6
t/PhysicalMap.t                  14    2  14.29%  11-12
t/RepeatMasker.t                  6    3  50.00%  1-2 6
t/SeqVersion.t      255 65280     6   10 166.67%  2-6
t/TaxonTree.t                    17   30 176.47%  11 18-42
t/alignUtilities.t                9    1  11.11%  9
t/tutorial.t                     21   15  71.43%  7-21
114 subtests skipped.
Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,  
99.89% okay.


On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:

> Hi,
>
> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7  
> and I had a few
> failures:
>
> Failed Test         Stat Wstat Total Fail  List of Failed
> ---------------------------------------------------------------------- 
> ---------
> t/Annotation.t                    89    2  79 88
> t/Biblio.t                        24    1  2
> t/LocusLink.t                     23    1  23
> t/PhysicalMap.t                   14    2  11-12
> t/RepeatMasker.t                   6    3  1-2 6
> t/StandAloneBlast.t               18    4  19-22
> t/TaxonTree.t                     17   30  11 18-42
> t/alignUtilities.t                 9    1  9
> t/psm.t              255 65280    48   35  29 32-48
> t/tutorial.t                      21   15  7-21


From cjfields at uiuc.edu  Wed Jun  7 23:38:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 18:38:10 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>

I saw a ton of activity from Jason on bioperl-guts for test files and
modules; you may want to check your tests vs. his changes in case
they were fixed.  I'll be running similar tests on WinXP and Mac OS X;
it would be nice to see how my results compare to Dave's.

Chris

On Jun 7, 2006, at 5:26 PM, David Messina wrote:

> To look for problems related to Heikki's "return undef" sweep, I ran
> 'make test' on both today's version of bioperl-live and on an older
> version I had checked out on May 12. This was done on OS X 10.4.6 and
> perl 5.8.6.
>
>
> Here are the results:
>
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may
> need to be examined.
>
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
> t/psm........................dubious
>          Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 29, 32-48
>          Failed 18/48 tests, 62.50% okay
>
>
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been
> touched between 5/12 and today.
>
> The error looks like a transient network problem to me, but I'm not
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
>
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
>
>
> - RepeatMasker.t -
> Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm
> between 5/12 and today, so this appears to be not 'return undef'-
> related.
>
> - SeqVersion.t -
> The SeqVersion error was due to a failure to find and load
> Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between
> 5/12 and today, so this is not 'return undef'-related.
>
>
>
> All the other test failures appear in both versions of bioperl-live,
> so presumably they are not affected by the 'return undef' changes.
>
> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
> that he didn't. Those only pop up when I run the optional remote-
> server tests, however. Perhaps Paul didn't run those and that
> accounts for the discrepancy?
>
> Also, he saw errors in Biblio.t, Repeatmasker.t, and
> StandAloneBlast.t that I did not.
>
> Dave
>
>
> Today's bioperl-live test results:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.
>
> Note that this is including tests requiring a remote server.
>
> And here's the output from a May 12 checkout of bioperl-live:
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/RepeatMasker.t                  6    3  50.00%  1-2 6
> t/SeqVersion.t      255 65280     6   10 166.67%  2-6
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed,
> 99.89% okay.
>
>
>
>
> On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:
>
>> Hi,
>>
>> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7
>> and I had a few
>> failures:
>>
>> Failed Test         Stat Wstat Total Fail  List of Failed
>> --------------------------------------------------------------------- 
>> -
>> ---------
>> t/Annotation.t                    89    2  79 88
>> t/Biblio.t                        24    1  2
>> t/LocusLink.t                     23    1  23
>> t/PhysicalMap.t                   14    2  11-12
>> t/RepeatMasker.t                   6    3  1-2 6
>> t/StandAloneBlast.t               18    4  19-22
>> t/TaxonTree.t                     17   30  11 18-42
>> t/alignUtilities.t                 9    1  9
>> t/psm.t              255 65280    48   35  29 32-48
>> t/tutorial.t                      21   15  7-21
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Thu Jun  8 00:50:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Wed, 7 Jun 2006 19:50:48 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
Message-ID: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>

Thanks for letting me know, Chris.

Here's a new round of results on bioperl-live checked out moments ago:
[OS X 10.4.6, perl 5.8.6]

Failed Test   Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/DBCUTG.t                  29    5  17.24%  26 30-32
t/LocusLink.t               23    1   4.35%  23
t/PopGen.t                  89    1   1.12%  85
t/psm.t        255 65280    48   35  72.92%  29 32-48
t/tutorial.t                21   15  71.43%  7-21
121 subtests skipped.
Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,  
99.69% okay.

Fixed since earlier today
=========================
Annotation.t
PhysicalMap.t
TaxonTree.t
alignUtilities.t

New since earlier today
=======================
PopGen.t

t/PopGen.....................FAILED test 85
         Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86  
okay, 96.63%)

Unchanged
=========
DBCUTG.t
LocusLink.t
psm.t
tutorial.t

Remote-server tests were run like before. I forgot to mention last  
time that I skipped the local DB tests and I don't have bioperl-ext  
installed, so several staden-related tests were also skipped.
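
The fixed/new/unchanged grouping above can be reproduced mechanically. What follows is only a minimal sketch, assuming the failed script names have already been copied out of the two 'make test' summaries; compare_failures is a hypothetical helper, not something that ships with BioPerl.

```perl
use strict;
use warnings;

# Failed test scripts from the two runs reported in this thread.
my @earlier = qw(Annotation.t DBCUTG.t LocusLink.t PhysicalMap.t
                 TaxonTree.t alignUtilities.t psm.t tutorial.t);
my @latest  = qw(DBCUTG.t LocusLink.t PopGen.t psm.t tutorial.t);

# compare_failures is a hypothetical helper (an assumption for this sketch).
# It takes two array refs of failed script names (old run, new run) and
# returns a hash ref listing which names were fixed, are new, or are unchanged.
sub compare_failures {
    my ($old, $new) = @_;
    my %in_old = map { $_ => 1 } @$old;
    my %in_new = map { $_ => 1 } @$new;
    my %result = (
        fixed     => [ grep { !$in_new{$_} } @$old ],
        new       => [ grep { !$in_old{$_} } @$new ],
        unchanged => [ grep {  $in_new{$_} } @$old ],
    );
    return \%result;
}

my $diff = compare_failures(\@earlier, \@latest);
for my $group (qw(fixed new unchanged)) {
    print "$group: @{ $diff->{$group} }\n";
}
```

The comparison works on script names only, so a script that still fails but with a different set of subtests is reported as unchanged.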

Dave


My results from earlier today for reference:
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------------- 
> --
> -------
> t/Annotation.t                   89    2   2.25%  79 88
> t/DBCUTG.t                       29    5  17.24%  26 30-32
> t/LocusLink.t                    23    1   4.35%  23
> t/PhysicalMap.t                  14    2  14.29%  11-12
> t/TaxonTree.t                    17   30 176.47%  11 18-42
> t/alignUtilities.t                9    1  11.11%  9
> t/psm.t             255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                     21   15  71.43%  7-21
> 114 subtests skipped.
> Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> 99.84% okay.


From heikki at sanbi.ac.za  Thu Jun  8 08:49:38 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:49:38 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>
	<4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu>
Message-ID: <200606081049.40232.heikki@sanbi.ac.za>

Looks like we survived the sweeping change - and fixed a number of existing
bugs in the process. Thanks to everyone who helped!

	-Heikki 

On Thursday 08 June 2006 02:50, David Messina wrote:
> Thanks for letting me know, Chris.
>
> Here's a new round of results on bioperl-live checked out moments ago:
> [OS X 10.4.6, perl 5.8.6]
>
> Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------
> -------
> t/DBCUTG.t                  29    5  17.24%  26 30-32
> t/LocusLink.t               23    1   4.35%  23
> t/PopGen.t                  89    1   1.12%  85
> t/psm.t        255 65280    48   35  72.92%  29 32-48
> t/tutorial.t                21   15  71.43%  7-21
> 121 subtests skipped.
> Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> 99.69% okay.
>
> Fixed since earlier today
> =========================
> Annotation.t
> PhysicalMap.t
> TaxonTree.t
> alignUtilities.t
>
> New since earlier today
> =======================
> PopGen.t
>
> t/PopGen.....................FAILED test 85
>          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> okay, 96.63%)
>
> Unchanged
> =========
> DBCUTG.t
> LocusLink.t
> psm.t
> tutorial.t
>
> Remote-server tests were run like before. I forgot to mention last
> time that I skipped the local DB tests and I don't have bioperl-ext
> installed, so several staden-related tests were also skipped.
>
> Dave
>
> My results from earlier today for reference:
> > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > ----------------------------------------------------------------------
> > --
> > -------
> > t/Annotation.t                   89    2   2.25%  79 88
> > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > t/LocusLink.t                    23    1   4.35%  23
> > t/PhysicalMap.t                  14    2  14.29%  11-12
> > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > t/alignUtilities.t                9    1  11.11%  9
> > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                     21   15  71.43%  7-21
> > 114 subtests skipped.
> > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > 99.84% okay.

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Jun  8 08:52:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Jun 2006 10:52:27 +0200
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
Message-ID: <200606081052.27446.heikki@sanbi.ac.za>

I sort of fixed this.

At least the tests pass (I commented out two) when using the new sample XML.
To be really useful, the code needs much more work, so I left the bug open.

http://bugzilla.open-bio.org/show_bug.cgi?id=2018


	-Heikki


On Wednesday 07 June 2006 21:38, Chris Fields wrote:
> All,
>
> Don't know how many people use Bio::ClusterIO this module, but it looks
> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
> versions of the dbSNP database; the schema for ASN.1 and XML format for SNP
> has changed:
>
> http://www.ncbi.nlm.nih.gov/projects/SNP/
>
> under 'Announcements'.
>
> I actually tried parsing the dbsnp test file and a newer schema XML file to
> confirm this; the new version doesn't work (returned object from
> next_cluster is undef).  I'm filing a bug as a reminder.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 05:55:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 08 Jun 2006 15:55:09 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
Message-ID: <4487BBBD.6060702@infotech.monash.edu.au>

Hi all,

I've just been further auditing the Bioperl code and noticed that
Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
can't locate an example/sample sequence file in "Lasergene" format.

From the code it looks similar to 'raw' format but has "^^" as
a separator character.

Can anyone provide a real-life example so I can augment the 
t/lasergene.t tests?

Thanks,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From jrm62 at cam.ac.uk  Thu Jun  8 11:38:40 2006
From: jrm62 at cam.ac.uk (John Mifsud)
Date: 08 Jun 2006 12:38:40 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
Message-ID: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>

Dear all,

Firstly I hope this is the right email list to write to! 

Secondly, I have a little program that parses the BLAST results I get
from running searches remotely against the NCBI server, takes out all
the hit sequences, and converts them to FASTA format.

Now when using BROAD BLAST and getting results this works fine (tblastn ver
2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and
the output is different and the parsing no longer works. I was wondering if
anyone knew of a new SearchIO module / script that is designed to parse the
updated NCBI BLAST output?

Thanks for your time,


John


From cjfields at uiuc.edu  Thu Jun  8 12:56:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 07:56:27 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
In-Reply-To: <200606081052.27446.heikki@sanbi.ac.za>
References: <001901c68a69$e7ece8e0$15327e82@pyrimidine>
	<200606081052.27446.heikki@sanbi.ac.za>
Message-ID: <AB8EE4BC-4774-48A6-8F26-2A8356F8E700@uiuc.edu>

Sounds good to me.  If someone wants to use this down the line, they  
might be desperate enough to provide patches; there are a lot of  
commented out tags.

Chris

On Jun 8, 2006, at 3:52 AM, Heikki Lehvaslaiho wrote:

> I sort of fixed this.
>
> At least the tests pass (I commented out two) when using the new  
> sample XML.
> To be really usefull, the code need much more work, so I left the  
> bug open.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2018
>
>
> 	-Heikki
>
>
> On Wednesday 07 June 2006 21:38, Chris Fields wrote:
>> All,
>>
>> Don't know how many people use Bio::ClusterIO this module, but it  
>> looks
>> like Bio::ClusterIO::dbsnp is broken unless you are using older XML
>> versions of the dbSNP database; the schema for ASN.1 and XML  
>> format for SNP
>> has changed:
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/
>>
>> under 'Announcements'.
>>
>> I actually tried parsing the dbsnp test file and a newer schema  
>> XML file to
>> confirm this; the new version doesn't work (returned object from
>> next_cluster is undef).  I'm filing a bug as a reminder.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 13:03:05 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 14:03:05 +0100
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <44882009.1040906@mrc-dunn.cam.ac.uk>

John Mifsud wrote:
> Dear all,
> 
> Firstly I hope this is the right email list to write to! 
> 
> Secondly, I have a little program that parses the BLAST results i have got 
> running remotely to the NCBI server and takes out all the hit sequences and 
> converts them to FASTA format.
> 
> Now when using BROAD BLAST and getting results this works fine (tblastn ver 
> 2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and 
> the output is different and the parsing no longer works. I was wondering if 
> anyone knew of a new SearchIO module / script that is designed to blast the 
> updated NCBI BLAST output?

You'll probably need to get the latest SearchIO blast module from 
bioperl-live.
http://bioperl.org/wiki/Getting_BioPerl

If you're having difficulties with your setup, John, I can just send you 
the relevant file(s). Mail me (or Alan) privately for that.


From cjfields at uiuc.edu  Thu Jun  8 13:12:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 08:12:23 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
Message-ID: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>

I would say, based on previous responses, update to the latest CVS  
(bioperl-live).  You could also try updating  
Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you  
don't want to update the entire toolkit.  Running these with BLAST  
2.2.14 output seems to work fine.

Though this is the likely fix, if you have additional problems next  
time please make sure to include more information.  We have no idea  
what OS, bioperl version, perl version you are running.  And a code  
snippet and bug description would be nice (i.e. "it doesn't work" -  
not a good description; "the script freezes" is a little more  
informative).
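
As a rough illustration of the details worth pasting into a report, here is a minimal sketch. environment_report is a hypothetical helper, and it assumes Bio::Root::Version is the module carrying the toolkit's version number in your installation; if it cannot be loaded the sketch simply says so.

```perl
use strict;
use warnings;

# environment_report is a hypothetical helper for bug reports, not part
# of BioPerl. It returns the OS name, perl version and BioPerl version.
sub environment_report {
    # Load Bio::Root::Version inside eval so the script still runs when
    # BioPerl is not installed (assumption: this module holds the version).
    my $bioperl = eval {
        require Bio::Root::Version;
        Bio::Root::Version->VERSION;
    };
    $bioperl = 'not found' unless defined $bioperl;

    return sprintf "OS:      %s\nperl:    %s\nBioPerl: %s",
        $^O, $], $bioperl;
}

print environment_report(), "\n";
```

Pasting that output along with a short code snippet and the exact error text usually answers the first round of questions.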

Chris

On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:

> Dear all,
>
> Firstly I hope this is the right email list to write to!
>
> Secondly, I have a little program that parses the BLAST results i  
> have got
> running remotely to the NCBI server and takes out all the hit  
> sequences and
> converts them to FASTA format.
>
> Now when using BROAD BLAST and getting results this works fine  
> (tblastn ver
> 2.2.9). However, NCBI have just updated their BLAST server (to  
> 2.2.14) and
> the output is different and the parsing no longer works. I was  
> wondering if
> anyone knew of a new SearchIO module / script that is designed to  
> blast the
> updated NCBI BLAST output?
>
> Thanks for your time,
>
>
> John

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Thu Jun  8 16:03:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 17:03:21 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>	<200605311255.19166.heikki@sanbi.ac.za>
	<447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <44884A49.6060805@mrc-dunn.cam.ac.uk>

Sendu Bala wrote:
> Heikki Lehvaslaiho wrote:
>> In my opinion the sooner the bugs get exposed the better. It is much more 
>> likely that there is a well hidden bug caused by assigning accidentally undef 
>> into an one element array that someone intentionally writing code that 
>> expects that behaviour!
>>
>> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
>> and could not see any differences in the test output. 
>>
>> Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible 
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Looks like Heikki's work went well. If there is any further interest in 
getting rid of all the remaining undef returns, this also needs to be fixed:

sub x {
   # return (...) on success
   # do nothing on failure
}

Needs to be changed to:

sub x {
   # return (...) on success
   return;
}
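
To illustrate why the bare return matters, here is a minimal sketch (not
code from BioPerl; the sub names are made up for the example). In list
context "return undef" yields a one-element list, so an array assigned
from it tests true, while a bare "return" yields an empty list:

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub lookup_undef { return undef; }   # explicit undef on failure
sub lookup_bare  { return; }         # bare return on failure

# List context: 'return undef' gives a one-element list containing undef.
my @from_undef = lookup_undef();
my @from_bare  = lookup_bare();

print "return undef -> ", scalar(@from_undef), " element(s)\n";
print "return       -> ", scalar(@from_bare),  " element(s)\n";

# So a truth test on the array wrongly succeeds for the 'return undef' version:
print "lookup_undef looks successful\n" if @from_undef;

# Scalar context: both give undef, so scalar callers see no difference.
my $s1 = lookup_undef();
my $s2 = lookup_bare();
print "both undef in scalar context\n" if !defined $s1 && !defined $s2;
```

Callers that only ever use the result in scalar context are unaffected,
which is probably why the test output showed no differences.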


From roy at colibase.bham.ac.uk  Thu Jun  8 16:31:10 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 08 Jun 2006 17:31:10 +0100
Subject: [Bioperl-l] Truncate sequence with features
Message-ID: <448850CE.1040105@colibase.bham.ac.uk>

Hi all.

I've been playing around with a subroutine to truncate a sequence and 
adjust the coordinates of any features that overlap the specified 
region, something that, according to the comments in 
Bio::Location::Simple, has been abortively worked on in the past.

I've submitted the subroutine as an enhancement in Bugzilla. It's a bit 
hacky but works for what I needed it for. However I'm a bit unsure on 
the best way to deal with split locations where one of the sublocations 
is entirely outside the truncated region. My current method results in 
locations like:
join(1..500, >1000..>1000)

which is quite ugly and possibly invalid, but kind of makes sense. Does 
anyone know what would be the correct behaviour for this situation?
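
For comparison, here is a sketch of one alternative policy: drop any
sublocation that lies entirely outside the kept region, and clip and mark
the ones that straddle an edge. This is plain Perl, not the subroutine
submitted to Bugzilla; the helper name and the [start, end] pair
representation are assumptions made for the example, and whether dropping
is the correct behaviour is exactly the open question.

```perl
use strict;
use warnings;

# Hypothetical helper, plain Perl (no BioPerl objects).
# $from/$to: region to keep, in original coordinates (1-based, inclusive).
# @subs:     sublocations as [start, end] pairs.
sub truncate_sublocations {
    my ( $from, $to, @subs ) = @_;
    my @kept;
    for my $sub (@subs) {
        my ( $start, $end ) = @$sub;
        next if $end < $from || $start > $to;    # entirely outside: drop it
        my $cut_left  = $start < $from;          # runs off the left edge
        my $cut_right = $end > $to;              # runs off the right edge
        # Clip to the kept region, then shift so the region starts at 1.
        my $new_start = ( $cut_left  ? $from : $start ) - $from + 1;
        my $new_end   = ( $cut_right ? $to   : $end )   - $from + 1;
        # Mark clipped ends with the "<" / ">" partial-location notation.
        push @kept, ( $cut_left ? '<' : '' ) . $new_start . '..'
                  . ( $cut_right ? '>' : '' ) . $new_end;
    }
    return @kept;
}

# Keep bases 1..1000 of a feature at join(1..500, 900..1200, 1500..1600).
my @locs = truncate_sublocations( 1, 1000, [ 1, 500 ], [ 900, 1200 ], [ 1500, 1600 ] );
print 'join(', join( ', ', @locs ), ")\n";
```

The trade-off is that dropping a sublocation loses the hint that the
feature continued beyond the truncated sequence.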

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From cjfields at uiuc.edu  Thu Jun  8 18:47:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 13:47:19 -0500
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <000701c68b2b$f8cc21e0$15327e82@pyrimidine>

Thomas;

That error isn't related to BioPerl.  What you received as a warning is the
standard HTML page NCBI returns; the error embedded in that HTML reads:

ERROR: Cannot accept request, error code: 1Number of unfinished requests
(151) from your IP address reached the HARD limit 150.

So you may have too many requests in the BLAST queue.  
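
To see the NCBI message without wading through the whole page, something
like the following could pull the ERROR text out of the returned HTML.
This is only a sketch: the sub name is made up, and the regex assumes the
message sits inside a <font color="red"> element, as it does in the page
quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the "ERROR: ..." text out of an NCBI HTML page.
sub extract_blast_error {
    my ($html) = @_;
    return unless defined $html;
    if ( $html =~ m{<font\s+color="red">\s*(ERROR:.*?)</font>}is ) {
        my $msg = $1;
        $msg =~ s/\s+/ /g;     # collapse wrapped lines and repeated spaces
        $msg =~ s/\s+$//;      # trim trailing whitespace
        return $msg;
    }
    return;                    # no error message found
}

# The tail end of the page from the quoted warning.
my $page = <<'HTML';
<hr><font
color="red">ERROR: Cannot accept request, error code: 1Number of
unfinished requests (151)  from your IP address reached the HARD
limit 150.</font><hr></form>   </body></html>
HTML

my $err = extract_blast_error($page);
print defined $err ? "$err\n" : "no error found\n";
```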

Chris

> -----Original Message-----
> From: Thomas J Keller [mailto:kellert at ohsu.edu]
> Sent: Thursday, June 08, 2006 1:39 PM
> To: Chris Fields
> Cc: John Mifsud; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] NCBI BLAST results parsing
> 
> I'm having the same problem bp_remote_blast.pl worked yesterday,
> today it's busted. Incidently, I got the following email from NCBI
> this morning:
> The new version of the NCBI SOAP E-Utilities, which includes recent
> changes to the NCBI sequence databases schema, was released today.
> 
> Thank you.
> NCBI E-Utilities Team
> 
> I wouldn't have thought that that would affect
> Bio::Tools::RemoteBlast but something has changed.
> 
> Here's a snippet of the output after $ bp_remote_blast.pl -p blastn -
> d nr -e 1e-3 -i nm_008540.fasta
> 
> -------------------- WARNING ---------------------
> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
> Content-Length: 267
> Content-Type: application/x-www-form-urlencoded
> 
> DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+%
> 25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C
> +mRNA.%
> 0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm
> ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn
> 
> 
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content-
> Type" content="text/html; charset=utf-8"/><link rel="stylesheet"
> href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body
> bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"
> onload="StartBlastCgi();"><!--  the header   --> <table border="0"
> width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"
> colspan=4>    <map name="head_img_map">    <area shape="rect"
> coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home
> page">       <area shape="rect" coords="301,0,600,40" href="http://
> www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>
> <IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"
> NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"
> HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr
> align="center">    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI=
> yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN
> KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes
> &HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT
> O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td
> width="150" bgcolor="#003366">        <a href="http://
> www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI
> =yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI
> NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</
> FONT></a></td>    <td width="150" bgcolor="#003366">        <a
> href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?
> CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye
> s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"
> class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</
> FONT></a></td></tr></table><br><!--  the contents   --> <form
> action="Blast.cgi" enctype="application/x-www-form-urlencoded"
> method="POST"><script src="blastcgi.js"></script><SCRIPT
> LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =
> 'html/head_formating.gif';// --></SCRIPT><br><hr><font
> color="red">ERROR: Cannot accept request, error code: 1Number of
> unfinished requests (151)  from your IP address reached the HARD
> limit 150.</font><hr></form>   </body></html>
> ---------------------------------------------------
> 
> On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:
> 
> > I would say, based on previous responses, update to the latest CVS
> > (bioperl-live).  You could also try updating
> > Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> > don't want to update the entire toolkit.  Running these with BLAST
> > 2.2.14 output seems to work fine.
> >
> > Though this is the likely fix, if you have additional problems next
> > time please make sure to include more information.  We have no idea
> > what OS, bioperl version, perl version you are running.  And a code
> > snippet and bug description would be nice (i.e. "it doesn't work" -
> > not a good description; "the script freezes" is a little more
> > informative).
> >
> > Chris
> >
> > On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
> >
> >> Dear all,
> >>
> >> Firstly I hope this is the right email list to write to!
> >>
> >> Secondly, I have a little program that parses the BLAST results i
> >> have got
> >> running remotely to the NCBI server and takes out all the hit
> >> sequences and
> >> converts them to FASTA format.
> >>
> >> Now when using BROAD BLAST and getting results this works fine
> >> (tblastn ver
> >> 2.2.9). However, NCBI have just updated their BLAST server (to
> >> 2.2.14) and
> >> the output is different and the parsing no longer works. I was
> >> wondering if
> >> anyone knew of a new SearchIO module / script that is designed to
> >> blast the
> >> updated NCBI BLAST output?
> >>
> >> Thanks for your time,
> >>
> >>
> >> John
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >


From kellert at ohsu.edu  Thu Jun  8 18:39:04 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 11:39:04 -0700
Subject: [Bioperl-l] NCBI BLAST results parsing
In-Reply-To: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
Message-ID: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>

I'm having the same problem: bp_remote_blast.pl worked yesterday,
today it's busted. Incidentally, I got the following email from NCBI
this morning:
The new version of the NCBI SOAP E-Utilities, which includes recent
changes to the NCBI sequence databases schema, was released today.

Thank you.
NCBI E-Utilities Team

I wouldn't have thought that that would affect
Bio::Tools::Run::RemoteBlast, but something has changed.

Here's a snippet of the output after running:

$ bp_remote_blast.pl -p blastn -d nr -e 1e-3 -i nm_008540.fasta

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
Content-Length: 267
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+%25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C+mRNA.%0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn


---------------------------------------------------

-------------------- WARNING ---------------------
MSG: <html><head><title>NCBI Blast</title><meta http-equiv="Content- 
Type" content="text/html; charset=utf-8"/><link rel="stylesheet"  
href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css"></head><body  
bgcolor="#FFFFFF" text="#000000" link="#CC6600" vlink="#CC6600"  
onload="StartBlastCgi();"><!--  the header   --> <table border="0"  
width="600" cellspacing="0" cellpadding="0"><tr>     <td width="600"  
colspan=4>    <map name="head_img_map">    <area shape="rect"  
coords="0,0,300,40" href="http://www.ncbi.nlm.nih.gov" alt="NCBI home  
page">       <area shape="rect" coords="301,0,600,40" href="http:// 
www.ncbi.nlm.nih.gov/blast" alt="NCBI BLAST home page">    </map>     
<IMG SRC="html/blastheader.gif" USEMAP="#head_img_map" BORDER="0"  
NAME="BlastHeaderGif" ALT="BLAST header image" WIDTH="600"  
HEIGHT="45" BORDER="0" ALIGN="middle">    </td></tr><tr  
align="center">    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Nucleotides&NCBI_GI= 
yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LIN 
KOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Nucleotide</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Proteins&NCBI_GI=yes 
&HITLIST_SIZE=100&COMPOSITION_BASED_STATISTICS=yes&SHOW_OVERVIEW=yes&AUT 
O_FORMAT=yes&CDD_SEARCH=yes&FILTER=L&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Protein</FONT></a></td>    <td  
width="150" bgcolor="#003366">        <a href="http:// 
www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Translations&NCBI_GI 
=yes&FILTER=L&HITLIST_SIZE=100&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LI 
NKOUT=yes"        class="HELPBAR"><FONT COLOR="#FFFFFF">Translations</ 
FONT></a></td>    <td width="150" bgcolor="#003366">        <a  
href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi? 
CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&PAGE=Formating&NCBI_GI=ye 
s&SHOW_OVERVIEW=yes&AUTO_FORMAT=yes&SHOW_LINKOUT=yes"         
class="HELPBAR"><FONT COLOR="#FFFFFF">Retrieve results for an RID</ 
FONT></a></td></tr></table><br><!--  the contents   --> <form  
action="Blast.cgi" enctype="application/x-www-form-urlencoded"  
method="POST"><script src="blastcgi.js"></script><SCRIPT  
LANGUAGE="JavaScript"> <!--document.images['BlastHeaderGif'].src =  
'html/head_formating.gif';// --></SCRIPT><br><hr><font  
color="red">ERROR: Cannot accept request, error code: 1Number of  
unfinished requests (151)  from your IP address reached the HARD  
limit 150.</font><hr></form>   </body></html>
---------------------------------------------------

On Jun 8, 2006, at 6:12 AM, Chris Fields wrote:

> I would say, based on previous responses, update to the latest CVS
> (bioperl-live).  You could also try updating
> Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you
> don't want to update the entire toolkit.  Running these with BLAST
> 2.2.14 output seems to work fine.
>
> Though this is the likely fix, if you have additional problems next
> time please make sure to include more information.  We have no idea
> what OS, bioperl version, perl version you are running.  And a code
> snippet and bug description would be nice (i.e. "it doesn't work" -
> not a good description; "the script freezes" is a little more
> informative).
>
> Chris
>
> On Jun 8, 2006, at 6:38 AM, John Mifsud wrote:
>
>> Dear all,
>>
>> Firstly I hope this is the right email list to write to!
>>
>> Secondly, I have a little program that parses the BLAST results i
>> have got
>> running remotely to the NCBI server and takes out all the hit
>> sequences and
>> converts them to FASTA format.
>>
>> Now when using BROAD BLAST and getting results this works fine
>> (tblastn ver
>> 2.2.9). However, NCBI have just updated their BLAST server (to
>> 2.2.14) and
>> the output is different and the parsing no longer works. I was
>> wondering if
>> anyone knew of a new SearchIO module / script that is designed to
>> blast the
>> updated NCBI BLAST output?
>>
>> Thanks for your time,
>>
>>
>> John
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Thu Jun  8 19:28:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 14:28:18 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606081049.40232.heikki@sanbi.ac.za>
Message-ID: <000001c68b31$b5320390$15327e82@pyrimidine>

Here are tests run from WinXP, ActivePerl 5.8.817; almost everything passes.
Not sure what's going on with StandAloneBlast or the protgraph tests, so
I'll check into it.  The psm.t tests that failed are the same as the ones
mentioned previously on other systems.

As an aside, I hate that using the '-w' flag with ActivePerl gives a thousand
useless 'subroutines redefined' warnings; the only way I found to turn them
off is to not use the flag.  Anyway, I pulled out the relevant chunks of
output here; I'll submit the Mac results separately so as not to confuse the
two.

...
t/StandAloneBlast............FAILED tests 19-22
	Failed 4/18 tests, 77.78% okay
...
t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33, 36-37, 45, 48-56, 59-60, 65-66
	Failed 22/66 tests, 66.67% okay
...
t/psm........................Illegal division by zero at t/psm.t line 147, <GEN1> line 36.
dubious
	Test returned status 9 (wstat 2304, 0x900)
DIED. FAILED tests 29, 32-48
Failed 18/48 tests, 62.50% okay
...
Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/StandAloneBlast.t               18    4  22.22%  19-22
t/protgraph.t                     66   22  33.33%  11 13 20-21 26 33 36-37 45
                                                   48-56 59-60 65-66
t/psm.t                9  2304    48   35  72.92%  29 32-48
39 subtests skipped.
Failed 3/233 test scripts, 98.71% okay. 36/11100 subtests failed, 99.68% okay.
NMAKE :  U1077: 
Stop.


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Thursday, June 08, 2006 3:50 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; BioPerl Mailing List; Chris Fields
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall with
> "returnundef"
> 
> Looks like we survived the sweeping change - and fixed a number of
> existing
> bugs in the process. Thanks for everyone who helped!
> 
> 	-Heikki
> 
> On Thursday 08 June 2006 02:50, David Messina wrote:
> > Thanks for letting me know, Chris.
> >
> > Here's a new round of results on bioperl-live checked out moments ago:
> > [OS X 10.4.6, perl 5.8.6]
> >
> > Failed Test   Stat Wstat Total Fail  Failed  List of Failed
> > ------------------------------------------------------------------------
> > -------
> > t/DBCUTG.t                  29    5  17.24%  26 30-32
> > t/LocusLink.t               23    1   4.35%  23
> > t/PopGen.t                  89    1   1.12%  85
> > t/psm.t        255 65280    48   35  72.92%  29 32-48
> > t/tutorial.t                21   15  71.43%  7-21
> > 121 subtests skipped.
> > Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed,
> > 99.69% okay.
> >
> > Fixed since earlier today
> > =========================
> > Annotation.t
> > PhysicalMap.t
> > TaxonTree.t
> > alignUtilities.t
> >
> > New since earlier today
> > =======================
> > PopGen.t
> >
> > t/PopGen.....................FAILED test 85
> >          Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86
> > okay, 96.63%)
> >
> > Unchanged
> > =========
> > DBCUTG.t
> > LocusLink.t
> > psm.t
> > tutorial.t
> >
> > Remote-server tests were run like before. I forgot to mention last
> > time that I skipped the local DB tests and I don't have bioperl-ext
> > installed, so several staden-related tests were also skipped.
> >
> > Dave
> >
> > My results from earlier today for reference:
> > > Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> > > ----------------------------------------------------------------------
> > > --
> > > -------
> > > t/Annotation.t                   89    2   2.25%  79 88
> > > t/DBCUTG.t                       29    5  17.24%  26 30-32
> > > t/LocusLink.t                    23    1   4.35%  23
> > > t/PhysicalMap.t                  14    2  14.29%  11-12
> > > t/TaxonTree.t                    17   30 176.47%  11 18-42
> > > t/alignUtilities.t                9    1  11.11%  9
> > > t/psm.t             255 65280    48   35  72.92%  29 32-48
> > > t/tutorial.t                     21   15  71.43%  7-21
> > > 114 subtests skipped.
> > > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed,
> > > 99.84% okay.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fernan at iib.unsam.edu.ar  Thu Jun  8 17:02:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 8 Jun 2006 14:02:27 -0300
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <4487BBBD.6060702@infotech.monash.edu.au>
References: <4487BBBD.6060702@infotech.monash.edu.au>
Message-ID: <20060608170227.GF3334@iib.unsam.edu.ar>

+----[ Torsten Seemann <torsten.seemann at infotech.monash.edu.au> (08.Jun.2006 13:47):
|
| Hi all,
| 
| I've just been further auditing the Bioperl code and noticed that
| Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
| can't locate an example/sample sequence file in "Lasergene" format.
| 
|  From the code it looks similar to 'raw' format but has "^^" as
| a separator character.
| 
| Can anyone provide a real-life example so I can augment the 
| t/lasergene.t tests?
|
+----]

See the attached file. 

The format seems to be plain text, beginning with a free
text description that goes from the beginning of the file
until the "^^" delimiter, and after that the sequence.
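
As a rough sketch of that layout in plain Perl (this is not the
Bio::SeqIO::lasergene code; the sub name is made up, and it assumes the
file is plain text with "^^" on a line of its own, as in the attached
example):

```perl
use strict;
use warnings;

# Hypothetical parser for the layout described above: free-text
# description, a line containing only "^^", then the sequence.
sub parse_lasergene_text {
    my ($text) = @_;
    my ( $desc, $seq ) = split /^\^\^\s*$/m, $text, 2;
    return unless defined $seq;          # no "^^" delimiter found
    $desc =~ s/^\s+|\s+$//g;             # trim the description
    $seq  =~ s/\s+//g;                   # drop newlines and spaces
    return ( $desc, $seq );
}

my $example = <<'EOF';
Created: Jueves, 08 de Junio de 2006 01:56 p.m.

This is a test sequence created with EditSeq (Lasergene's DNAStar)

^^
ATCGATCGATCG
EOF

my ( $desc, $seq ) = parse_lasergene_text($example);
print "description: $desc\n";
print "sequence:    $seq\n";
```

Real Lasergene files may carry more than this (the sketch knows nothing
about other record types), so treat it as a reading of this one example.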

Fernan
-------------- next part --------------
Created: Jueves, 08 de Junio de 2006 01:56 p.m.

This is a test sequence created with EditSeq (Lasergene's DNAStar)

^^
ATCGATCGATCG

From freimuth at pathology.wustl.edu  Thu Jun  8 17:12:36 2006
From: freimuth at pathology.wustl.edu (Freimuth, Robert)
Date: Thu, 8 Jun 2006 12:12:36 -0500
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
Message-ID: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>

Hi,

I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
of hits from blast, then get some information about the tiled result.  I
thought I'd use the num_unaligned_query and num_unaligned_hit methods to
get the number of unaligned bases in the tiled result, then subtract
that from the length of the query/subject sequence to get the number of
aligned bases in the region spanned by the hit(s).  My code is below,
followed by the error message.


while( my $result_obj = $blast_obj->next_result() )
{
    while( my $hit_obj = $result_obj->next_hit() )
    {
        my $generic_hit_obj =
            Bio::Search::Hit::GenericHit->new( -name => $hit_obj->name() );
        # tile any hsps that overlap > this number of bp
        $generic_hit_obj->overlap( 0 );

        while( my $hsp_obj = $hit_obj->next_hsp() )
        {
            # add all HSPs to a GenericHit object so they can be tiled together
            $generic_hit_obj->add_hsp( $hsp_obj );
        }

        my $num_unaligned_query = $generic_hit_obj->num_unaligned_query();
        my $num_unaligned_hit   = $generic_hit_obj->num_unaligned_hit();
    }
}


------------- EXCEPTION  -------------
MSG: Must have defined query_len
STACK Bio::Search::Hit::GenericHit::logical_length
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
STACK Bio::Search::Hit::GenericHit::num_unaligned_query
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
STACK toplevel blast_needle_timetrials_1.pl:94
 
--------------------------------------


I looked through the docs to try to find an explanation or some mention
of how to set query_len, but I didn't find anything.  Could someone
please point out what I'm doing wrong?  Additionally, if I'm making this
harder than it needs to be, please give me a gentle whack with the clue
stick.

Thanks,
Bob


From osborne1 at optonline.net  Thu Jun  8 19:42:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 08 Jun 2006 15:42:07 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000001c68b31$b5320390$15327e82@pyrimidine>
Message-ID: <C0ADF5CF.8C8F%osborne1@optonline.net>

Chris,

Odd. protgraph.t passes all of its tests on my computer. Do you have the
Clone module installed?

Brian O.


On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> 36-37, 45, 48-56, 59-60, 65-66
> Failed 22/66 tests, 66.67% okay


From jason at bioperl.org  Thu Jun  8 20:15:47 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 8 Jun 2006 16:15:47 -0400
Subject: [Bioperl-l] undef query_len error with
	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <84AC010A-25E6-48C7-A723-CE4688ECA926@bioperl.org>

Why are you trying to create new Hit objects?
$hit_obj already is-a GenericHit object, so the methods you want can be
called on it directly.
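
A sketch of what that might look like, calling the tiling methods on the
hit itself rather than on a newly built object. The helper name
unaligned_counts is made up; overlap(), num_unaligned_query() and
num_unaligned_hit() are the methods from the code quoted below. It
assumes the hit came from a parsed report, which should already know the
query length.

```perl
use strict;
use warnings;

# Hypothetical helper: works on any hit object that provides the
# GenericHit methods used in the quoted code.
sub unaligned_counts {
    my ($hit) = @_;
    $hit->overlap(0);    # same overlap setting as the original snippet
    return (
        $hit->num_unaligned_query(),
        $hit->num_unaligned_hit(),
    );
}

# Intended use inside the existing loop ($blast_obj as in the original):
#
#   while ( my $result_obj = $blast_obj->next_result() ) {
#       while ( my $hit_obj = $result_obj->next_hit() ) {
#           my ( $unal_query, $unal_hit ) = unaligned_counts($hit_obj);
#           print $hit_obj->name(), "\t$unal_query\t$unal_hit\n";
#       }
#   }
```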


-jason
On Jun 8, 2006, at 1:12 PM, Freimuth, Robert wrote:

> Hi,
>
> I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set
> of hits from blast, then get some information about the tiled  
> result.  I
> thought I'd use the num_unaligned_query and num_unaligned_hit  
> methods to
> get the number of unaligned bases in the tiled result, then subtract
> that from the length of the query/subject sequence to get the  
> number of
> aligned bases in the region spanned by the hit(s).  My code is below,
> followed by the error message.
>
>
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
> => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
> this number of bp
>
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled
> together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
>
>         my $num_unaligned_query =
> $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
>
>
>
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
> STACK Bio::Search::Hit::GenericHit::num_unaligned_query
> /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
> STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
> STACK toplevel blast_needle_timetrials_1.pl:94
>
> --------------------------------------
>
>
> I looked through the docs to try to find an explanation or some  
> mention
> of how to set query_len, but I didn't find anything.  Could someone
> please point out what I'm doing wrong?  Additionally, if I'm making  
> this
> harder than it needs to be, please give me a gentle whack with the  
> clue
> stick.
>
> Thanks,
> Bob
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu Jun  8 22:36:00 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 09 Jun 2006 08:36:00 +1000
Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file?
In-Reply-To: <20060608170227.GF3334@iib.unsam.edu.ar>
References: <4487BBBD.6060702@infotech.monash.edu.au>
	<20060608170227.GF3334@iib.unsam.edu.ar>
Message-ID: <4488A650.2050803@infotech.monash.edu.au>

> I've just been further auditing the Bioperl code and noticed that
> Bio::SeqIO::lasergene didn't even compile (now fixed) but I still
> can't locate an example/sample sequence file in "Lasergene" format.

Thanks to Fernan, Todd and Senthil who sent me example Lasergene files.
Those will be enough examples to write some tests.

--Torsten


From kellert at ohsu.edu  Fri Jun  9 00:29:10 2006
From: kellert at ohsu.edu (Thomas J Keller)
Date: Thu, 8 Jun 2006 17:29:10 -0700
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
Message-ID: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>

Greetings,
Is fink still a reasonable way to install and maintain bioperl?
(There have been some emails about instability.) How about upgrades? The
way I have fink installed, its path comes first when perl reads @INC. So
if I put a newer Bio::something in /usr/local/wherever, it won't be
seen if an older module is in the fink path.  Can I upgrade in the
fink "space" without messing up fink's database? Other options?
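
To check which copy of a module perl would actually pick up, a small
script along these lines can list every directory in @INC that holds it,
in search order. This is a sketch: the helper name is made up, and
File::Spec is used as the example module only because it ships with
perl; substituting a Bio:: module name would show the fink copy versus
any other.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list every directory in a search path that holds
# a copy of the given module, in search order. The first one listed is
# the copy perl would load.
sub module_locations {
    my ( $module, @dirs ) = @_;
    ( my $rel = "$module.pm" ) =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return grep { -e $_ }
           map  { File::Spec->catfile( $_, $rel ) }
           grep { !ref $_ } @dirs;              # skip code refs in @INC
}

print "Search order:\n";
print "  $_\n" for grep { !ref $_ } @INC;

my @found = module_locations( 'File::Spec', @INC );
print "File::Spec found at:\n";
print "  $_\n" for @found;
print "perl loaded: $INC{'File/Spec.pm'}\n";
```

Putting "use lib '/some/dir';" at the top of a script (or setting
PERL5LIB) prepends a directory to @INC, which is one way to make a newer
copy win without touching the fink tree.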

Thanks,
Tom K


Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From hlapp at gmx.net  Fri Jun  9 01:19:28 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Jun 2006 21:19:28 -0400
Subject: [Bioperl-l] fink and updating bioperl
In-Reply-To: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
References: <Prayer.1.0.16.0606081238400.12964@hermes-1.csi.cam.ac.uk>
	<17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu>
	<8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu>
	<1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu>
Message-ID: <060FC8CE-FD89-436E-B79C-135BB4F324CD@gmx.net>

Why don't you remove the fink bioperl package if you want to install  
a newer version locally?

BTW unless you use a custom-compiled perl, your packages will end up
in /Library/Perl/5.8.6/ (or /System/Library/Perl/5.8.6/), not
/usr/local, when you issue 'make install'.

	-hilmar

On Jun 8, 2006, at 8:29 PM, Thomas J Keller wrote:

> Greetings,
> Is fink still a reasonable way to install and maintain bioperl?
> (There's been some emails about instability.) How 'bout upgrades: the
> way I have fink installed it's path is first when perl reads @INC. So
> if I put a newer Bio::something in /usr/local/whereever it won't be
> seen if an older module is in the fink path.  Can I upgrade in the
> fink "space" without messing up fink's database? Other options?
>
> Thanks,
> Tom K
>
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Fri Jun  9 02:30:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Jun 2006 21:30:20 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <C0ADF5CF.8C8F%osborne1@optonline.net>
Message-ID: <000c01c68b6c$a8184710$15327e82@pyrimidine>

Yes; using ActiveState's PPM:

ppm> query CLone
Querying target 1 (ActivePerl 5.8.7.815)
  1. Clone [0.20] recursively copy Perl datatypes
ppm>

v. 0.20 is the latest in CPAN.

I can try some additional tests with the relevant modules to see what the
problem is.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Thursday, June 08, 2006 2:42 PM
> To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> Cc: Paul.Boutros at utoronto.ca; bioperl-l
> Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> with"returnundef"
> 
> Chris,
> 
> Odd. protgraph.t passes all of its tests on my computer. Do you have the
> Clone module installed?
> 
> Brian O.
> 
> 
> On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33,
> > 36-37, 45, 48-56, 59-60, 65-66
> > Failed 22/66 tests, 66.67% okay
> 
> 


From heikki at sanbi.ac.za  Fri Jun  9 07:35:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 09:35:12 +0200
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <448850CE.1040105@colibase.bham.ac.uk>
References: <448850CE.1040105@colibase.bham.ac.uk>
Message-ID: <200606090935.12758.heikki@sanbi.ac.za>

Roy,

The definitive document describing the locations is the feature table 
definition:

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#3.5

but you probably know that already.


Two questions come to mind:

1. Can you parse your joint location using bioperl without errors?

2. Is there a practical advantage in including a location which has no 
relevance to the sequence in hand?

I notice that the /partial qualifier is deprecated and the docs suggest using
</> signs to indicate that the sequence is partial, so I guess what you are
doing is correct.

	-Heikki

On Thursday 08 June 2006 18:31, Roy Chaudhuri wrote:
> Hi all.
>
> I've been playing around with a subroutine to truncate a sequence and
> adjust the coordinates of any features that overlap the specified
> region- something that according to the comments in
> Bio::Location::Simple has been abortively worked on in the past.
>
> I've submitted the subroutine as an enhancement in Bugzilla. It's a bit
> hacky but works for what I needed it for. However I'm a bit unsure on
> the best way to deal with split locations where one of the sublocations
> is entirely outside the truncated region. My current method results in
> locations like:
> join(1..500, >1000..>1000)
>
> which is quite ugly and possibly invalid, but kind of makes sense. Does
> anyone know what would be the correct behaviour for this situation?
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Jun  9 08:06:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 9 Jun 2006 10:06:30 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000c01c68b6c$a8184710$15327e82@pyrimidine>
References: <000c01c68b6c$a8184710$15327e82@pyrimidine>
Message-ID: <200606091006.30893.heikki@sanbi.ac.za>

I am using:
   This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
and I have Clone installed, but 24 of the 66 tests fail.

Something is badly wrong.


	-Heikki
bala ~/src/bioperl/core> perl -w t/protgraph.t
1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Failed test 10 in t/protgraph.t at line 85
not ok 11
# Test 11 got: '5' (t/protgraph.t at line 86)
#    Expected: '13'
not ok 12
# Failed test 12 in t/protgraph.t at line 94
not ok 13
# Test 13 got: '5' (t/protgraph.t at line 95)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t/protgraph.t at line 113)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t/protgraph.t at line 114)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t/protgraph.t at line 122)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t/protgraph.t at line 150)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t/protgraph.t at line 158)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t/protgraph.t at line 163)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t/protgraph.t at line 187
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t/protgraph.t at line 212)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t/protgraph.t at line 228)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t/protgraph.t at line 229)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t/protgraph.t at line 233)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t/protgraph.t at line 234)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t/protgraph.t at line 248)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t/protgraph.t at line 249)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t/protgraph.t at line 253)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t/protgraph.t at line 254)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t/protgraph.t at line 263)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t/protgraph.t at line 264)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t/protgraph.t at line 275)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t/protgraph.t at line 276)
#    Expected: '5'


On Friday 09 June 2006 04:30, Chris Fields wrote:
> Yes; using ActiveState's PPM:
>
> ppm> query CLone
> Querying target 1 (ActivePerl 5.8.7.815)
>   1. Clone [0.20] recursively copy Perl datatypes
> ppm>
>
> v. 0.20 is the latest in CPAN.
>
> I can try some additional tests with the relevant modules to see what the
> problem is.
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > Sent: Thursday, June 08, 2006 2:42 PM
> > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall
> > with"returnundef"
> >
> > Chris,
> >
> > Odd. protgraph.t passes all of its tests on my computer. Do you have the
> > Clone module installed?
> >
> > Brian O.
> >
> > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > Failed 22/66 tests, 66.67% okay
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 08:08:18 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 09:08:18 +0100
Subject: [Bioperl-l] undef query_len error
	with	Bio::Search::Hit::GenericHit::num_unaligned_query
In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu>
Message-ID: <44892C72.2040605@mrc-dunn.cam.ac.uk>

Freimuth, Robert wrote:
> Hi,
> 
> I'm trying to use the Bio::Search::Hit::GenericHit
[snip]
> while( my $result_obj = $blast_obj->next_result() )
> {
>     while( my $hit_obj = $result_obj->next_hit() )
>     {
>         my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name
> => $hit_obj->name() );
>         $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap >
> this number of bp
> 
>         while( my $hsp_obj = $hit_obj->next_hsp() )
>         {
>             # add all HSPs to a GenericHit object so they can be tiled
> together
>             $generic_hit_obj->add_hsp( $hsp_obj );
>         }
> 
>         my $num_unaligned_query =
> $generic_hit_obj->num_unaligned_query();
>         my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();
> 
> ------------- EXCEPTION  -------------
> MSG: Must have defined query_len
> STACK Bio::Search::Hit::GenericHit::logical_length
[snip]
> I looked through the docs to try to find an explanation or some mention
> of how to set query_len, but I didn't find anything.

As Jason asked, why are you essentially recreating the hit object?
The problem you are seeing is that the query length is normally set via 
SearchIO stream via ResultI when it internally creates a new hit object.
When you created your own hit object you didn't supply -query_len as an 
option to new(), nor did you later use the query_length() method to set it.

If you really do need your $generic_hit_obj (instead of just using 
$hit_obj), do $generic_hit_obj->query_length($hit_obj->query_length); 
(Or if you know the length of your query sequence, supply that directly.)


From zhangchnxp at gmail.com  Fri Jun  9 09:05:36 2006
From: zhangchnxp at gmail.com (Zhang chnxp)
Date: Fri, 9 Jun 2006 17:05:36 +0800
Subject: [Bioperl-l] Are there any modules handling the HLA Typing (Sequence
	Based Typing) ?
Message-ID: <4d1768a60606090205m6e360413paf172fa4e731ef2e@mail.gmail.com>

Hi there,
  I have some .abi trace files from an ABI3100 Genetic Analyzer. Are
there any packages handling the typing work of HLA-A, -B, -C, -DRB1,
etc.? Or are there any free softwares solving the ambiguity through
the SBT?


From cain at cshl.edu  Wed Jun  7 23:02:43 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 07 Jun 2006 19:02:43 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with
	"return	undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca>
	<3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149721363.12513.96.camel@localhost.localdomain>

On Wed, 2006-06-07 at 17:26 -0500, David Messina wrote:
> 
> 
> Failures in 5/12 version of bioperl-live but NOT in today's version
> ===================================================================
> - OntologyStore.t -
> Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been  
> touched between 5/12 and today.
> 
> The error looks like a transient network problem to me, but I'm not  
> sure:
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ 
> *checkout*/song/ontology/so.definition?rev=HEAD, but server threw  
> 500.  retrying...
> ---------------------------------------------------
> [REPEATED 5 times -Dave]
> 
> t/OntologyStore..............FAILED tests 3-6
>          Failed 4/6 tests, 33.33% okay
> 
That is a problem with the cvs server at SourceForge (where the Sequence
Ontology is hosted).  I changed the module that tries to get that file
(I don't remember off hand what it was).  

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From oldham at ucla.edu  Fri Jun  9 02:07:34 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Thu, 8 Jun 2006 19:07:34 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
Message-ID: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>

Dear all,

I am a total Bioperl newbie struggling to accomplish a conceptually simple
task.  I have a single large fasta file containing about 200,000 probe
sequences (from an Affymetrix microarray), each of which looks like this:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC

What I would like to do is extract from this file a subset of ~130,800
probes (both the header and the sequence) and output this subset into a new
fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
("1138_at" is the probe set ID in the header listed above); I have these
8,175 IDs listed in a separate file.  I *think* that I managed to create an
index of all 200,000 probes in the original fasta file using the following
script:

#!/usr/bin/perl -w

 # script 1: create the index

 use Bio::Index::Fasta;
 use strict;
 my $Index_File_Name = shift;
 my $inx = Bio::Index::Fasta->new(
     -filename => $Index_File_Name,
     -write_flag => 1);
 $inx->make_index(@ARGV);

I'm not sure if this is the most sensible approach, and even if it is, I'm
not sure what to do next.  Any help would be greatly appreciated!

Many thanks,
Mike O.




From sb at mrc-dunn.cam.ac.uk  Fri Jun  9 14:52:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 09 Jun 2006 15:52:59 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <44898B4B.8080901@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes
[snip]
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
[snip]
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!

I'd say you're on the right lines. Next, you should continue reading the
rest of the synopsis and description in the docs for Bio::Index::Fasta.

Perhaps it's not clear, but you don't need to say 
$inx->make_index(@ARGV); if you've already provided -file to new() and 
are only dealing with one file. You also can't supply -file to new() if 
you want to change the id_parser (which you do, since you need to tell 
it how to detect your probe set ID).

Having indexed your file you can then output the desired sequences, just 
like the foreach loop suggested in the synopsis. (You could have that in 
the same script.)


One thing I'm not clear on is why it needs -write_flag => 1. Why can't 
it index a read-only database? Even when you set -write_flag allowing it 
to work, it doesn't write anything...


From simon.andrews at bbsrc.ac.uk  Fri Jun  9 15:01:05 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 9 Jun 2006 16:01:05 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>

 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Michael Oldham
> Sent: 09 June 2006 03:08
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a 
> single large file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a 
> conceptually simple task.  I have a single large fasta file 
> containing about 200,000 probe sequences (from an Affymetrix 
> microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC

Unfortunately that's not Fasta format (which only has a single header
line starting with a '>').  I'd imagine that most programs which deal
with fasta and read that entry would see it as two sequences, the
first of which is empty.


> What I would like to do is extract from this file a subset of 
> ~130,800 probes (both the header and the sequence) and output 
> this subset into a new fasta file.  These 130,800 probes 
> correspond to 8,175 probe set IDs ("1138_at" is the probe set 
> ID in the header listed above)

If you're only having to do this once then it should be fairly quick to
knock up a one off script to do this.  Since you've only got 8000ish
probeset ids then you can probably just read those into a hash to start
with then parse through your big sequence file with something like;


#!perl
use warnings;
use strict;

my %probe_ids;

# Add real code here to populate your hash
# (the key must be quoted: 1138_at starts with a digit, so perl will
# not auto-quote it as a bareword)
$probe_ids{'1138_at'} = 1;
##########################################


open (IN,'<','your_affy_file.txt') or die "Can't read affy file: $!";

open (OUT,'>','probe_list.txt') or die "Can't write output: $!";

while (<IN>) {

  if (/^>probe/) {
    # This assumes there are always 3 lines per probe entry
    # The probe set ID is the third colon-separated field of the header
    if (exists $probe_ids{(split(/:/))[2]}) {
      print OUT;
      print OUT scalar <IN>;
      print OUT scalar <IN>;
    }
  }
}

close OUT or die "Can't close output: $!";
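
A variant that does not depend on the number of lines per entry, and that
fills in the "populate your hash" step from a file of IDs, might look
roughly like the sketch below. The sub names load_ids and filter_probes
and the file names in the usage line are placeholders, and the header
layout (probe set ID as the third colon-separated field of a ">probe:"
line) is assumed from the single example record in the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one probe set ID per line from a filehandle into a hash (used as a set).
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip newline and trailing whitespace
        $ids{$line} = 1 if length $line;
    }
    return \%ids;
}

# Copy every record whose probe set ID is in the set.  Lines that are not
# ">probe:" headers follow the decision made for the most recent header,
# so it does not matter how many lines each record spans.
sub filter_probes {
    my ($in, $out, $ids) = @_;
    my ($keep, $kept) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>probe:[^:]*:([^:]+):/) {
            $keep = exists $ids->{$1};
            $kept++ if $keep;
        }
        print {$out} $line if $keep;
    }
    return $kept;                        # number of records written
}

# Usage: perl filter_probes.pl ids.txt all_probes.fa > subset.fa
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh, '<', $id_file    or die "Can't read $id_file: $!";
    open my $fa_fh, '<', $fasta_file or die "Can't read $fasta_file: $!";
    my $n = filter_probes($fa_fh, \*STDOUT, load_ids($id_fh));
    warn "kept $n probe records\n";
}
```

Holding the 8000ish IDs in a hash keeps each lookup constant-time, so the
big file is read exactly once.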


From MEC at stowers-institute.org  Fri Jun  9 14:58:22 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 9 Jun 2006 09:58:22 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work: 

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
<idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat mybigfile.fa

good luck

--Malcolm Cook 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Michael Oldham
>Sent: Thursday, June 08, 2006 9:08 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Output a subset of FASTA data from a 
>single large file
>
>Dear all,
>
>I am a total Bioperl newbie struggling to accomplish a 
>conceptually simple
>task.  I have a single large fasta file containing about 200,000 probe
>sequences (from an Affymetrix microarray), each of which looks 
>like this:
>
>>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
>Antisense;
>TGGCTCCTGCTGAGGTCCCCTTTCC
>
>What I would like to do is extract from this file a subset of ~130,800
>probes (both the header and the sequence) and output this 
>subset into a new
>fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>("1138_at" is the probe set ID in the header listed above); I 
>have these
>8,175 IDs listed in a separate file.  I *think* that I managed 
>to create an
>index of all 200,000 probes in the original fasta file using 
>the following
>script:
>
>#!/usr/bin/perl -w
>
> # script 1: create the index
>
> use Bio::Index::Fasta;
> use strict;
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fasta->new(
>     -filename => $Index_File_Name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
>I'm not sure if this is the most sensible approach, and even 
>if it is, I'm
>not sure what to do next.  Any help would be greatly appreciated!
>
>Many thanks,
>Mike O.
>
>
>
>


From senthil at cdfd.org.in  Fri Jun  9 22:21:11 2006
From: senthil at cdfd.org.in (M Senthil Kumar)
Date: Fri, 9 Jun 2006 15:21:11 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D04EE@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>


On Fri, 9 Jun 2006, simon andrews (BI) wrote:
|
|
|> -----Original Message-----
|> From: bioperl-l-bounces at lists.open-bio.org
|> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
|> Michael Oldham
|> Sent: 09 June 2006 03:08
|> To: bioperl-l at lists.open-bio.org
|> Subject: [Bioperl-l] Output a subset of FASTA data from a
|> single large file
|>
|> Dear all,
|>
|> I am a total Bioperl newbie struggling to accomplish a
|> conceptually simple task.  I have a single large fasta file
|> containing about 200,000 probe sequences (from an Affymetrix
|> microarray), each of which looks like this:
|>
|> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
|> >Antisense;
|> TGGCTCCTGCTGAGGTCCCCTTTCC
|
|Unfortunately that's not Fasta format (which only has a single header
|line starting with a '>'.  I'd imagine that most programs which deal
|with fasta which read that entry would see it as two sequences, the
|first of which is empty.
|

[snipped]

hi,

I think the file is in fasta format and probably you might have seen it
differently because of your mail transport agent.

Senthil


From cjfields at uiuc.edu  Fri Jun  9 17:59:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:18 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <Pine.SGI.4.44.0606091519050.1653696-100000@www.cdfd.org.in>
Message-ID: <002b01c68bee$6e3237e0$15327e82@pyrimidine>

No; I saw the same thing here.  It's not FASTA in the traditional sense:

http://www.bioperl.org/wiki/FASTA_sequence_format

though he did get it to build a database successfully.  Well, 'success' in
the sense that no errors were thrown.  I've learned the absence of error
messages does not necessarily mean that everything went as planned; it
depends on how much error handling has been added to the module by the
submitting author.  

It's possible that the second annotation line was ignored completely.  I
suppose it's also possible that two sequences are entered into the database,
an empty sequence for the first '>' line and the full sequence for the
second.  It's all dependent on how the parser handles this.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of M Senthil Kumar
> Sent: Friday, June 09, 2006 5:21 PM
> To: simon andrews (BI)
> Cc: bioperl-l at lists.open-bio.org; Michael Oldham
> Subject: Re: [Bioperl-l] Output a subset of FASTA data from a single large
> file
> 
> 
> 
> On Fri, 9 Jun 2006, simon andrews (BI) wrote:
> |
> |
> |> -----Original Message-----
> |> From: bioperl-l-bounces at lists.open-bio.org
> |> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> |> Michael Oldham
> |> Sent: 09 June 2006 03:08
> |> To: bioperl-l at lists.open-bio.org
> |> Subject: [Bioperl-l] Output a subset of FASTA data from a
> |> single large file
> |>
> |> Dear all,
> |>
> |> I am a total Bioperl newbie struggling to accomplish a
> |> conceptually simple task.  I have a single large fasta file
> |> containing about 200,000 probe sequences (from an Affymetrix
> |> microarray), each of which looks like this:
> |>
> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> |> >Antisense;
> |> TGGCTCCTGCTGAGGTCCCCTTTCC
> |
> |Unfortunately that's not Fasta format (which only has a single header
> |line starting with a '>'.  I'd imagine that most programs which deal
> |with fasta which read that entry would see it as two sequences, the
> |first of which is empty.
> |
> 
> [snipped]
> 
> hi,
> 
> I think the file is in fasta format and probably you might have seen it
> differently because of your mail transport agent.
> 
> Senthil
> 


From cjfields at uiuc.edu  Fri Jun  9 17:59:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 12:59:31 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <200606091006.30893.heikki@sanbi.ac.za>
Message-ID: <002c01c68bee$76219ef0$15327e82@pyrimidine>

I ran tests this morning on protgraph.t using bioperl-live on Mac OS X (Intel)
running perl 5.8.6 and all tests passed, but I haven't updated from CVS
since June 7th.  Heikki's linux results and my WinXP results are almost
exactly alike; most failed tests are from unexpected results (with exactly
the same results for both OS's).  A few look more serious: test 45 failed on
both, and tests 10 and 12 failed only on linux (the only noticeable
difference between the two).
...

ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
...

The line numbers seem to also be off by one (linux tests seem to have one
extra line); not sure if that means anything.
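
For comparing two such runs without reading them side by side, something
like the following sketch could be used. The sub names failed_tests and
only_in and the file names in the usage line are invented for
illustration; the pattern assumes the old-style "not ok N" lines shown in
this thread, optionally preceded by progress dots or mail quote markers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the numbers of failed tests from test output text.
# Leading dots (progress output mixed into the stream) and mail quote
# prefixes ("> ") in front of "not ok N" are tolerated.
sub failed_tests {
    my ($text) = @_;
    my %failed;
    for my $line (split /\n/, $text) {
        $failed{$1} = 1 if $line =~ /^(?:>\s*)*\.*not ok (\d+)/;
    }
    return \%failed;
}

# Return the test numbers present in the first set but not in the second,
# in ascending numeric order.
sub only_in {
    my ($first, $second) = @_;
    return sort { $a <=> $b } grep { !exists $second->{$_} } keys %$first;
}

# Usage: perl compare_failures.pl linux.out winxp.out
if (@ARGV == 2) {
    my @sets;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Can't read $file: $!";
        my $text = do { local $/; <$fh> };       # slurp the whole file
        push @sets, failed_tests(defined $text ? $text : '');
    }
    print "failed only in $ARGV[0]: @{[ only_in(@sets[0, 1]) ]}\n";
    print "failed only in $ARGV[1]: @{[ only_in(@sets[1, 0]) ]}\n";
}
```

This only compares which tests failed, not the "got"/"Expected" values, so
it would not notice two platforms failing the same test in different ways.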

Here's the full WinXP protgraph.t results:

1..66
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
not ok 11
# Test 11 got: '5' (t\protgraph.t at line 85)
#    Expected: '13'
ok 12
not ok 13
# Test 13 got: '5' (t\protgraph.t at line 94)
#    Expected: '13'
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
not ok 20
# Test 20 got: '0.013' (t\protgraph.t at line 112)
#    Expected: '0.027'
.not ok 21
# Test 21 got: '1' (t\protgraph.t at line 113)
#    Expected: ''
..ok 22
.ok 23
ok 24
..ok 25
.not ok 26
# Test 26 got: '1' (t\protgraph.t at line 121)
#    Expected: '5'
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
not ok 33
# Test 33 got: '139' (t\protgraph.t at line 149)
#    Expected: '71'
ok 34
ok 35
not ok 36
# Test 36 got: '126' (t\protgraph.t at line 157)
#    Expected: '58'
.not ok 37
# Test 37 got: '1' (t\protgraph.t at line 162)
#    Expected: '15'
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
not ok 45
# Failed test 45 in t\protgraph.t at line 186
ok 46
ok 47
not ok 48
# Test 48 got: '75' (t\protgraph.t at line 211)
#    Expected: '72'
not ok 49
# Test 49 got: '343' (t\protgraph.t at line 227)
#    Expected: '72'
not ok 50
# Test 50 got: '368' (t\protgraph.t at line 228)
#    Expected: '74'
not ok 51
# Test 51 got: '344' (t\protgraph.t at line 232)
#    Expected: '73'
not ok 52
# Test 52 got: '368' (t\protgraph.t at line 233)
#    Expected: '74'
not ok 53
# Test 53 got: '432' (t\protgraph.t at line 247)
#    Expected: '72'
not ok 54
# Test 54 got: '461' (t\protgraph.t at line 248)
#    Expected: '74'
not ok 55
# Test 55 got: '434' (t\protgraph.t at line 252)
#    Expected: '74'
not ok 56
# Test 56 got: '463' (t\protgraph.t at line 253)
#    Expected: '76'
ok 57
ok 58
not ok 59
# Test 59 got: '437' (t\protgraph.t at line 262)
#    Expected: '3'
not ok 60
# Test 60 got: '467' (t\protgraph.t at line 263)
#    Expected: '4'
ok 61
ok 62
ok 63
ok 64
not ok 65
# Test 65 got: '440' (t\protgraph.t at line 274)
#    Expected: '3'
not ok 66
# Test 66 got: '472' (t\protgraph.t at line 275)
#    Expected: '5'  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Friday, June 09, 2006 3:07 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields; 'Brian Osborne'
> Subject: Re: [Bioperl-l] For CVS developers-
> potentialpitfallwith"returnundef"
> 
> I am using:
>    This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
> and I have Clone installed, but more than half the tests fail.
> 
> Something is badly wrong.
> 
> 
> 	-Heikki
> bala ~/src/bioperl/core> perl -w t/protgraph.t
> 1..66
> ok 1
> ok 2
> ok 3
> ok 4
> ok 5
> ok 6
> ok 7
> ok 8
> ok 9
> not ok 10
> # Failed test 10 in t/protgraph.t at line 85
> not ok 11
> # Test 11 got: '5' (t/protgraph.t at line 86)
> #    Expected: '13'
> not ok 12
> # Failed test 12 in t/protgraph.t at line 94
> not ok 13
> # Test 13 got: '5' (t/protgraph.t at line 95)
> #    Expected: '13'
> ok 14
> ok 15
> ok 16
> ok 17
> ok 18
> ok 19
> not ok 20
> # Test 20 got: '0.013' (t/protgraph.t at line 113)
> #    Expected: '0.027'
> .not ok 21
> # Test 21 got: '1' (t/protgraph.t at line 114)
> #    Expected: ''
> ..ok 22
> .ok 23
> ok 24
> ..ok 25
> .not ok 26
> # Test 26 got: '1' (t/protgraph.t at line 122)
> #    Expected: '5'
> ok 27
> ok 28
> ok 29
> ok 30
> ok 31
> ok 32
> not ok 33
> # Test 33 got: '139' (t/protgraph.t at line 150)
> #    Expected: '71'
> ok 34
> ok 35
> not ok 36
> # Test 36 got: '126' (t/protgraph.t at line 158)
> #    Expected: '58'
> .not ok 37
> # Test 37 got: '1' (t/protgraph.t at line 163)
> #    Expected: '15'
> ok 38
> ok 39
> ok 40
> ok 41
> ok 42
> ok 43
> ok 44
> not ok 45
> # Failed test 45 in t/protgraph.t at line 187
> ok 46
> ok 47
> not ok 48
> # Test 48 got: '75' (t/protgraph.t at line 212)
> #    Expected: '72'
> not ok 49
> # Test 49 got: '343' (t/protgraph.t at line 228)
> #    Expected: '72'
> not ok 50
> # Test 50 got: '368' (t/protgraph.t at line 229)
> #    Expected: '74'
> not ok 51
> # Test 51 got: '344' (t/protgraph.t at line 233)
> #    Expected: '73'
> not ok 52
> # Test 52 got: '368' (t/protgraph.t at line 234)
> #    Expected: '74'
> not ok 53
> # Test 53 got: '432' (t/protgraph.t at line 248)
> #    Expected: '72'
> not ok 54
> # Test 54 got: '461' (t/protgraph.t at line 249)
> #    Expected: '74'
> not ok 55
> # Test 55 got: '434' (t/protgraph.t at line 253)
> #    Expected: '74'
> not ok 56
> # Test 56 got: '463' (t/protgraph.t at line 254)
> #    Expected: '76'
> ok 57
> ok 58
> not ok 59
> # Test 59 got: '437' (t/protgraph.t at line 263)
> #    Expected: '3'
> not ok 60
> # Test 60 got: '467' (t/protgraph.t at line 264)
> #    Expected: '4'
> ok 61
> ok 62
> ok 63
> ok 64
> not ok 65
> # Test 65 got: '440' (t/protgraph.t at line 275)
> #    Expected: '3'
> not ok 66
> # Test 66 got: '472' (t/protgraph.t at line 276)
> #    Expected: '5'
> 
> 
> On Friday 09 June 2006 04:30, Chris Fields wrote:
> > Yes; using ActiveState's PPM:
> >
> > ppm> query CLone
> > Querying target 1 (ActivePerl 5.8.7.815)
> >   1. Clone [0.20] recursively copy Perl datatypes
> > ppm>
> >
> > v. 0.20 is the latest in CPAN.
> >
> > I can try some additional tests with the relevant modules to see what
> the
> > problem is.
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> > > Sent: Thursday, June 08, 2006 2:42 PM
> > > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org
> > > Cc: Paul.Boutros at utoronto.ca; bioperl-l
> > > Subject: Re: [Bioperl-l] For CVS developers- potential pitfall
> > > with "return undef"
> > >
> > > Chris,
> > >
> > > Odd. protgraph.t passes all of its tests on my computer. Do you have
> the
> > > Clone module installed?
> > >
> > > Brian O.
> > >
> > > On 6/8/06 3:28 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26,
> > > > 33, 36-37, 45, 48-56, 59-60, 65-66
> > > > Failed 22/66 tests, 66.67% okay
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


From sdavis2 at mail.nih.gov  Fri Jun  9 18:29:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 09 Jun 2006 14:29:53 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002b01c68bee$6e3237e0$15327e82@pyrimidine>
Message-ID: <C0AF3661.CD0A%sdavis2@mail.nih.gov>


On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> No; I saw the same thing here.  It's not FASTA in the traditional sense:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> though he did get it to build a database successfully.  Well, 'success' in
> the sense that no errors were thrown.  I've learned the absence of error
> messages does not necessarily mean that everything went as planned; it
> depends on how much error handling has been added to the module by the
> submitting author.
> 
> It's possible that the second annotation line was ignored completely.  I
> suppose it's also possible that two sequences are entered into the database,
> an empty sequence for the first '>' line and the full sequence for the
> second.  It's all dependent on how the parser handles this.

I think that Senthil was pointing out that even though >Antisense looks to
be on its own line, it isn't, but is simply a continuation of the FASTA
header.  Judging from the context, that is the only interpretation that
makes sense.  

Sean

>> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>> |> >Antisense;
>> |> TGGCTCCTGCTGAGGTCCCCTTTCC
>> |
>> |Unfortunately that's not Fasta format (which only has a single header
>> |line starting with a '>').  I'd imagine that most programs which deal
>> |with fasta which read that entry would see it as two sequences, the
>> |first of which is empty.
>> |
>> 
>> [snipped]
>> 
>> hi,
>> 
>> I think the file is in fasta format and probably you might have seen it
>> differently because of your mail transport agent.


From cjfields at uiuc.edu  Fri Jun  9 19:05:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:05:44 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <002e01c68bf7$b594d210$15327e82@pyrimidine>

There's information in the HOWTOs:

http://www.bioperl.org/wiki/HOWTO:Flat_databases

http://www.bioperl.org/wiki/HOWTO:OBDA

Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
('fasta' format I/O) and this is what I got as output:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;


i.e. an empty sequence, which is what I guessed might happen, though I
thought it might pick up the second '>' and the full sequence there.  Since
the sequence is tossed you'll have to prescreen your sequence input stream
by either concatenating the two '>' lines together or screening for the
relevant information you want to retain.  You can try maybe getting this
info into Bio::Seq objects and writing to a Bio::SeqIO stream (to file or
file handle).
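
A minimal sketch of the first option (joining a header that has been split
across two '>' lines), assuming the continuation line always follows its
header directly.  merge_split_headers is a made-up helper name, not part of
Bioperl:

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): join a '>' line that
# directly follows another '>' line onto the previous header.
sub merge_split_headers {
    my @lines = @_;
    my @out;
    my $prev_was_header = 0;
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^>/ && $prev_was_header) {
            # Continuation of the header: drop its '>' and append it.
            (my $rest = $text) =~ s/^>\s*//;
            $out[-1] =~ s/\s+$//;
            $out[-1] .= " $rest";
        }
        else {
            push @out, $text;
            $prev_was_header = ($text =~ /^>/) ? 1 : 0;
        }
    }
    return @out;
}

my @fixed = merge_split_headers(
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;\n",
    ">Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
);
print "$_\n" for @fixed;
```

The cleaned lines could then be written to a new file and indexed as usual.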

Once you have that set up, the HOWTO tells you how to set up custom or
secondary namespaces, so you can use a regex to parse out the information
for primary or secondary keys:

http://www.bioperl.org/wiki/HOWTO:Flat_databases#Secondary_or_custom_namespaces

then you could select specific sequences this way (per the HOWTO):

$db->secondary_namespaces("GI");
my $acc_seq = $db->get_Seq_by_id("P84139");
my $gi_seq = $db->get_Seq_by_secondary("GI",443893);

or for multiple sequences (judging from the POD):

my $acc_seqio = $db->get_Stream_by_id(@ids);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Oldham
> Sent: Thursday, June 08, 2006 9:08 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Output a subset of FASTA data from a single large
> file
> 
> Dear all,
> 
> I am a total Bioperl newbie struggling to accomplish a conceptually simple
> task.  I have a single large fasta file containing about 200,000 probe
> sequences (from an Affymetrix microarray), each of which looks like this:
> 
> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; 
> >Antisense;
> TGGCTCCTGCTGAGGTCCCCTTTCC
> 
> What I would like to do is extract from this file a subset of ~130,800
> probes (both the header and the sequence) and output this subset into a
> new
> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
> ("1138_at" is the probe set ID in the header listed above); I have these
> 8,175 IDs listed in a separate file.  I *think* that I managed to create
> an
> index of all 200,000 probes in the original fasta file using the following
> script:
> 
> #!/usr/bin/perl -w
> 
>  # script 1: create the index
> 
>  use Bio::Index::Fasta;
>  use strict;
>  my $Index_File_Name = shift;
>  my $inx = Bio::Index::Fasta->new(
>      -filename => $Index_File_Name,
>      -write_flag => 1);
>  $inx->make_index(@ARGV);
> 
> I'm not sure if this is the most sensible approach, and even if it is, I'm
> not sure what to do next.  Any help would be greatly appreciated!
> 
> Many thanks,
> Mike O.
> 
> 
> 
> 
> --
> No virus found in this outgoing message.
> Checked by AVG Free Edition.
> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006


From cjfields at uiuc.edu  Fri Jun  9 19:49:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Jun 2006 14:49:51 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <C0AF3661.CD0A%sdavis2@mail.nih.gov>
Message-ID: <002f01c68bfd$e1111e20$15327e82@pyrimidine>

> On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > No; I saw the same thing here.  It's not FASTA in the traditional sense:
> >
> > http://www.bioperl.org/wiki/FASTA_sequence_format
> >
> > though he did get it to build a database successfully.  Well, 'success'
> in
> > the sense that no errors were thrown.  I've learned the absence of error
> > messages does not necessarily mean that everything went as planned; it
> > depends on how much error handling has been added to the module by the
> > submitting author.
> >
> > It's possible that the second annotation line was ignored completely.  I
> > suppose it's also possible that two sequences are entered into the
> database,
> > an empty sequence for the first '>' line and the full sequence for the
> > second.  It's all dependent on how the parser handles this.
> 
> I think that Senthil was pointing out that even though >Antisense looks to
> be on its own line, it isn't, but is simply a continuation of the FASTA
> header.  Judging from the context, that is the only interpretation that
> makes sense.
> 
> Sean

Sorry.  Just checked through another mail client and you're right.  That's
what I get for trusting Mr. Gates (stupid Outlook).  I have seen a few funky
FASTA derivations, so I thought that's what was going on here.  My bad!

My point, though erroneous, was that the fasta format parser may not parse
this data correctly if he did have two description lines, but may not
indicate there are problems by throwing an exception.  I demonstrated that
using Bio::SeqIO as an example (you get empty sequences).  Bio::Index::Fasta
parses the file itself using this loop to index:

	# Main indexing loop
	while (<FASTA>) {
		if (/^>/) {
			# $begin is the position of the first character after the '>'
			my $begin = tell(FASTA) - length( $_ ) + 1;

			foreach my $id (&$id_parser($_)) {
				$self->add_record($id, $i, $begin);
			}
		}
	}

Which simply looks for '>'.  That's fine for a vast majority of sequences.
I thought it would be nice to have something that's a little more rigorous
in verifying the format rather than trusting it implicitly, maybe by running
the checks in an eval{} block to make sure the format is FASTA-like and
looks like DNA/RNA/protein.
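
Something along these lines is what I have in mind; just a sketch, nothing
like this is in Bio::Index::Fasta as far as I know.  fasta_record_problem
is a made-up name, and the set of allowed characters is my own assumption
about what "looks like DNA/RNA/protein" should mean:

```perl
use strict;
use warnings;

# Hypothetical check (not part of Bio::Index::Fasta): returns an empty
# string if the record looks like FASTA, otherwise a short complaint.
sub fasta_record_problem {
    my ($header, $sequence) = @_;
    return "header does not start with '>'" unless $header =~ /^>/;
    (my $residues = $sequence) =~ s/\s+//g;
    return "empty sequence" unless length $residues;
    # Letters plus gap and stop symbols; an assumption about what
    # counts as a plausible DNA/RNA/protein sequence.
    return "unexpected characters in sequence"
        unless $residues =~ /^[A-Za-z*\-]+$/;
    return '';
}

# eval{} traps the die so a bad record can be reported or skipped.
my $ok = eval {
    my $problem = fasta_record_problem('>probe:HG_U95Av2:1138_at:395:301;', '');
    die "$problem\n" if $problem;
    1;
};
print "rejected: $@" unless $ok;
```

An indexer could run a check like this per record and throw, rather than
silently storing an empty sequence.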

Chris


> >> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> >> |> >Antisense;
> >> |> TGGCTCCTGCTGAGGTCCCCTTTCC
> >> |
> >> |Unfortunately that's not Fasta format (which only has a single header
> >> |line starting with a '>').  I'd imagine that most programs which deal
> >> |with fasta which read that entry would see it as two sequences, the
> >> |first of which is empty.
> >> |
> >>
> >> [snipped]
> >>
> >> hi,
> >>
> >> I think the file is in fasta format and probably you might have seen it
> >> differently because of your mail transport agent.


From bernd.web at gmail.com  Fri Jun  9 13:23:21 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 9 Jun 2006 15:23:21 +0200
Subject: [Bioperl-l] SimpleAlign
Message-ID: <716af09c0606090623v37c72bc5r1ddbcb2b8355a4a0@mail.gmail.com>

Hi,

Two queries with respect to SimpleAlign. I am using the following code
based on the POD.

my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
print $out $_ while <$in>;

1) is it possible to set set_displayname_flat() globally without doing
$_->set_displayname_flat() per alignment.

2) My input files have an ID and description line for each seq in the
alignment. When the file is converted I lose the description line. I
know I can get the description of the sequences (e.g.
$aln->get_seq_by_pos(2)->description()).
How could I export the complete fasta defline including the
description (I realize that general clustal format has a limit on the
number of characters, but still).

Regards,
Bernd


From oldham at ucla.edu  Sat Jun 10 01:39:45 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 9 Jun 2006 18:39:45 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F92B@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>

Thanks to everyone for their helpful advice.  I think I am getting closer,
but no cigar quite yet.  The script below runs quickly with no errors--but
the output file is empty.  It seems that the problem must lie somewhere in
the 'while' loop, and I'm sure it's quite obvious to a more experienced
eye--but not to mine!  Any suggestions?  Thanks again for your help.

--Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
		}
	}
exit;


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Friday, June 09, 2006 7:58 AM
To: Michael Oldham; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single large
file


I wouldn't use bioperl for this, or create an index.  Perl would do fine and
probably be faster.

Assuming your ids are one per line in a file named id.dat looking like
this

1138_at
1134_at
etc..

this should work:

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
<idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
mybigfile.fa

good luck

--Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Michael Oldham
>Sent: Thursday, June 08, 2006 9:08 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Output a subset of FASTA data from a
>single large file
>
>Dear all,
>
>I am a total Bioperl newbie struggling to accomplish a
>conceptually simple
>task.  I have a single large fasta file containing about 200,000 probe
>sequences (from an Affymetrix microarray), each of which looks
>like this:
>
>>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>Antisense;
>TGGCTCCTGCTGAGGTCCCCTTTCC
>
>What I would like to do is extract from this file a subset of ~130,800
>probes (both the header and the sequence) and output this
>subset into a new
>fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>("1138_at" is the probe set ID in the header listed above); I
>have these
>8,175 IDs listed in a separate file.  I *think* that I managed
>to create an
>index of all 200,000 probes in the original fasta file using
>the following
>script:
>
>#!/usr/bin/perl -w
>
> # script 1: create the index
>
> use Bio::Index::Fasta;
> use strict;
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fasta->new(
>     -filename => $Index_File_Name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
>I'm not sure if this is the most sensible approach, and even
>if it is, I'm
>not sure what to do next.  Any help would be greatly appreciated!
>
>Many thanks,
>Mike O.


From cjfields at uiuc.edu  Sun Jun 11 04:32:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Jun 2006 23:32:04 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <F4E1042A-CE2D-4E51-B711-BDBB6E052FEB@uiuc.edu>

What happens if you just print $idmatch or $1 (i.e. check to see if  
the regex matches anything)?  If there is nothing printed then either  
the regex isn't working as expected or there is something logically  
wrong.  The problem may be that the captured string must match the id  
exactly, the id being the key to the %ID hash; any extra characters  
picked up by the regex outside of your id key and you will not get  
anything.  Looking at Malcolm's regex it should work just fine, but  
we only had one example sequence to try here.

If your while loop is set up like this, won't it print only the
matched description lines to the outfile (no sequence) even if there
is a match?  Or is this what you wanted?  If you want the sequence
you should add 'print OUT scalar(<PROBES>);' after the 'print OUT;'
line (a bare <PROBES> in list context would print the rest of the file).
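
Another way to get whole records, sketched below, is to keep the match flag
outside the loop so it carries over from a header line to the sequence
line(s) under it.  As far as I can tell, declaring it with 'my' inside the
loop and assigning it only on header lines leaves it unset for the sequence
lines.  filter_probes and the test data are made up for illustration, and
the regex is the one from the script; if the IDs in the ID file carry
trailing whitespace or carriage returns they still won't match:

```perl
use strict;
use warnings;

# Sketch: copy whole FASTA records whose probe set ID is wanted.
# filter_probes and its arguments are illustrative names.
sub filter_probes {
    my ($wanted, $in, $out) = @_;    # hash ref, input handle, output handle
    my $keep = 0;                    # declared outside the loop so it
                                     # survives from header to sequence lines
    while (my $line = <$in>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted->{$1};
        }
        print {$out} $line if $keep;
    }
    return;
}

my %wanted = map { $_ => 1 } qw(1138_at);
my $fasta  = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
           . "TGGCTCCTGCTGAGGTCCCCTTTCC\n"
           . ">probe:HG_U95Av2:1000_at:1:2; Interrogation_Position=10; Antisense;\n"
           . "ACGTACGTACGTACGTACGTACGTA\n";
open my $in, '<', \$fasta or die "Cannot open in-memory input: $!";
filter_probes(\%wanted, $in, \*STDOUT);
close $in;
```

With real files, the two handles would come from open() on the probe file
and the output file instead of the in-memory string used here.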

Chris

On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:

> Thanks to everyone for their helpful advice.  I think I am getting  
> closer,
> but no cigar quite yet.  The script below runs quickly with no  
> errors--but
> the output file is empty.  It seems that the problem must lie  
> somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more  
> experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>
> --Mike O.
>
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID.dat.txt';
>
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
>
> my $probes = 'HG_U95Av2_probe_fasta.txt';
>
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
> keys=PSIDs and
> all values=1.
>
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 08:21:31 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:21:31 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <002e01c68bf7$b594d210$15327e82@pyrimidine>
References: <002e01c68bf7$b594d210$15327e82@pyrimidine>
Message-ID: <448D240B.6040508@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> There's information in the HOWTOs:
> 
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
> 
> http://www.bioperl.org/wiki/HOWTO:OBDA
> 
> Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
> ('fasta' format I/O) and this is what I got as output:
> 
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> 
> 
> i.e. an empty sequence, which is what I guessed might happen
[snip]

As you later discovered, that was an Outlook problem. Just to make this 
thread relevant to bioperl, the bioperl solution is:

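use strict;
use Bio::SeqIO;
use Bio::Index::Fasta;

# Index the fasta file (first argument), keyed on the probe set ID
my $inx = Bio::Index::Fasta->new(-write_flag => 1);
$inx->id_parser(\&get_id);
$inx->make_index(shift);

# Write out each wanted sequence (IDs listed one per line in the second argument)
my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
my $wanted_ids_file = shift;
open(IDS, $wanted_ids_file) or die "Can't open $wanted_ids_file: $!";
while (<IDS>) {
   chomp;
   my $seq = $inx->fetch($_) or next;   # skip IDs missing from the index
   $out->write_seq($seq);
}
close(IDS);

# Pull the probe set ID (e.g. "1138_at") out of the header line
sub get_id {
   my $line = shift;
   $line =~ /^>probe:\S+?:(\S+?):/;
   $1;
}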

It works for me on the sample sequence given by the OP.


From sb at mrc-dunn.cam.ac.uk  Mon Jun 12 08:49:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:49:49 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
 file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDGENHCJAA.oldham@ucla.edu>
Message-ID: <448D2AAD.3030601@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Thanks to everyone for their helpful advice.  I think I am getting closer,
> but no cigar quite yet.  The script below runs quickly with no errors--but
> the output file is empty.  It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine!  Any suggestions?  Thanks again for your help.
> 
> --Mike O.
> 
> 
> #!/usr/bin/perl -w
> 
> use strict;
> 
> my $IDs = 'ID.dat.txt';
> 
> unless (open(IDFILE, $IDs)) {
> 	print "Could not open file $IDs!\n";
> 	}
> 
> my $probes = 'HG_U95Av2_probe_fasta.txt';
> 
> unless (open(PROBES, $probes)) {
> 	print "Could not open file $probes!\n";
> 	}
> 
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
> 
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and
> all values=1.
> 
> 	while (<PROBES>) {
> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> 		if ($idmatch){
> 			print OUT;
> 		}
> 	}
> exit;

Not sure why it would print nothing (are the ids in IDFILE the same case 
as the ids in the fasta file, do they only contain word characters?), 
but even if it did you would only be printing out the fasta headers and 
not the sequences. Doing it the bioperl way gives you more flexibility 
in the future; you may want to do something with the sequences after 
printing them out, in which case do it in bioperl using Seq objects and 
skip the intermediate step of printing them.


From MEC at stowers-institute.org  Mon Jun 12 15:28:41 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:28:41 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
Message-ID: <CED81D34E37D5043A1211565277A51E50563F98D@exchkc02.stowers-institute.org>

Michael,

I don't think you can call perl's `print` on just a filehandle as you
are doing.  This is probably your problem.

If you call `select OUT` after opening it, print will print $_ to it.
And, every line in the fasta record whose header matches one of the IDs
will get printed, not just the fasta header lines.  Read the code again,
noting that $idmatch is only getting reset when a correctly formatted
fasta header line is matched.
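
Whether `print OUT;` works on its own can be checked directly.  The sketch
below uses an in-memory file so nothing is written to disk; the variable
names are arbitrary.  As perldoc -f print describes it, a bareword
filehandle with no list prints $_ to that handle, so `select` should be
optional here:

```perl
use strict;
use warnings;

my $buffer = '';
open(OUT, '>', \$buffer) or die "Cannot open in-memory handle: $!";

$_ = ">probe:HG_U95Av2:1138_at:395:301;\n";
print OUT;                    # bareword handle, no LIST: prints $_ to OUT

my $previous = select(OUT);   # alternative: make OUT the default handle
$_ = "TGGCTCCTGCTGAGGTCCCCTTTCC\n";
print;                        # prints $_ to the selected handle (OUT)
select($previous);            # restore the old default (normally STDOUT)

close(OUT);
print $buffer;                # both lines ended up in the buffer
```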

--Malcolm


>-----Original Message-----
>From: Chris Fields [mailto:cjfields at uiuc.edu] 
>Sent: Saturday, June 10, 2006 11:32 PM
>To: Michael Oldham
>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single large file
>
>What happens if you just print $idmatch or $1 (i.e. check to see if  
>the regex matches anything)?  If there is nothing printed then either  
>the regex isn't working as expected or there is something logically  
>wrong.  The problem may be that the captured string must match the id  
>exactly, the id being the key to the %ID hash; any extra characters  
>picked up by the regex outside of your id key and you will not get  
>anything.  Looking at Malcolm's regex it should work just fine, but  
>we only had one example sequence to try here.
>
>If your while loop is set up like this, won't it print only the
>matched description lines to the outfile (no sequence) even if there
>is a match?  Or is this what you wanted?  If you want the sequence
>you should add 'print OUT scalar(<PROBES>);' after the 'print OUT;'
>line (a bare <PROBES> in list context would print the rest of the file).
>
>Chris
>
>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>
>> Thanks to everyone for their helpful advice.  I think I am getting  
>> closer,
>> but no cigar quite yet.  The script below runs quickly with no  
>> errors--but
>> the output file is empty.  It seems that the problem must lie  
>> somewhere in
>> the 'while' loop, and I'm sure it's quite obvious to a more  
>> experienced
>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>
>> --Mike O.
>>
>>
>> #!/usr/bin/perl -w
>>
>> use strict;
>>
>> my $IDs = 'ID.dat.txt';
>>
>> unless (open(IDFILE, $IDs)) {
>> 	print "Could not open file $IDs!\n";
>> 	}
>>
>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>
>> unless (open(PROBES, $probes)) {
>> 	print "Could not open file $probes!\n";
>> 	}
>>
>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>
>> my @ID = <IDFILE>;
>> chomp @ID;
>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
>> keys=PSIDs and
>> all values=1.
>>
>> 	while (<PROBES>) {
>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>> 		if ($idmatch){
>> 			print OUT;
>> 		}
>> 	}
>> exit;
>
>Christopher Fields
>Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign
>
>
>
>


From MEC at stowers-institute.org  Mon Jun 12 15:47:09 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:47:09 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F991@exchkc02.stowers-institute.org>

ooops, in my message 


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opening it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu] 
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if  
>>the regex matches anything)?  If there is nothing printed 
>then either  
>>the regex isn't working as expected or there is something logically  
>>wrong.  The problem may be that the captured string must 
>match the id  
>>exactly, the id being the key to the %ID hash; any extra characters  
>>picked up by the regex outside of your id key and you will not get  
>>anything.  Looking at Malcolm's regex it should work just fine, but  
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this won't it only print only the  
>>matched description lines to the outfile (no sequence) even if there  
>>is a match?  Or is this what you wanted?   If you want the sequence  
>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting  
>>> closer,
>>> but no cigar quite yet.  The script below runs quickly with no  
>>> errors--but
>>> the output file is empty.  It seems that the problem must lie  
>>> somewhere in
>>> the 'while' loop, and I'm sure it's quite obvious to a more  
>>> experienced
>>> eye--but not to mine!  Any suggestions?  Thanks again for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with  
>>> keys=PSIDs and
>>> all values=1.
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>> -----Original Message-----
>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>> Sent: Friday, June 09, 2006 7:58 AM
>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a  
>>> single large
>>> file
>>>
>>>
>>>
>>> I wouldn't bioperl for this, or create an index.  Perl would do  
>>> fine and
>>> probably be faster.
>>>
>>> Assuming your ids are one per line in a file named id.dat 
>>looking like
>>> this
>>>
>>> 1138_at
>>> 1134_at
>>> etc..
>>>
>>> this should work:
>>>
>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>> mybigfile.fa
>>>
>>> good luck
>>>
>>> --Malcolm Cook
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>> Michael Oldham
>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>> Dear all,
>>>>
>>>> I am a total Bioperl newbie struggling to accomplish a
>>>> conceptually simple
>>>> task.  I have a single large fasta file containing about 200,000  
>>>> probe
>>>> sequences (from an Affymetrix microarray), each of which looks
>>>> like this:
>>>>
>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>> Antisense;
>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>
>>>> What I would like to do is extract from this file a subset of  
>>>> ~130,800
>>>> probes (both the header and the sequence) and output this
>>>> subset into a new
>>>> fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>> have these
>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>> to create an
>>>> index of all 200,000 probes in the original fasta file using
>>>> the following
>>>> script:
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> # script 1: create the index
>>>>
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> my $Index_File_Name = shift;
>>>> my $inx = Bio::Index::Fasta->new(
>>>>     -filename => $Index_File_Name,
>>>>     -write_flag => 1);
>>>> $inx->make_index(@ARGV);
>>>>
>>>> I'm not sure if this is the most sensible approach, and even
>>>> if it is, I'm
>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>
>>>> Many thanks,
>>>> Mike O.
>>>>
>>>>
>>>>
>>>>
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
>


From MEC at stowers-institute.org  Mon Jun 12 15:48:02 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:48:02 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>

oops,

s/matches on of/matches one of/
s/nothing that/noting that/ 

--Malcolm




From hubert.prielinger at gmx.at  Mon Jun 12 18:29:19 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 12:29:19 -0600
Subject: [Bioperl-l] How to use gi2taxonid
Message-ID: <448DB27F.6090107@gmx.at>

hi,
I have downloaded the gi2taxonid file to get the taxonid for a GI number 
taken from a report as recommended here, but I don't know how to use the 
gi2taxonid file.
Jason wrote in a previous post that you have to make a DB_File out of 
it, but I don't know how....and finally tie it to a hash....
Can anybody give me a hint how to use it..... my final goal is to get 
the taxonomy.

thanks
Hubert


From cjfields at uiuc.edu  Mon Jun 12 19:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 14:13:30 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <000f01c68e54$4d155ac0$15327e82@pyrimidine>

Michael, Malcolm et al,

I ran Michael's code (not Malcolm's one-liner), with and w/o adding the file
handle line that I suggested.  My suggestion works b/c I'm calling the file
handle in scalar context, which reads the next line, just like '$foo =
<FILE>' or 'while(<FILE>) {}' advances to the next line (with $/ = "\n")
each time the file handle is called.  You could use:

$_ = <PROBES>;
print OUT;

I just chopped it down to one line.

Without the extra line I suggested I get only the description line (I used
this as a test file based on the original sequence and Michael's description
of the ID):

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;

Which I don't think Michael wants (he mentioned sequence and description, I
think).  

Modifying the loop in Michael's code to:
...

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT <PROBES>; # grabs next line and prints
	}
}

Gets:

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
AGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTATCC
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
TGGATCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTACTGCTGAGGTCCCCTTTCC

Which matches the ID's in the ID file (there are 10 sequences in the probes
file).  
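Reading exactly one extra line per header assumes each record has a single
sequence line, as the probe file here does.  A minimal sketch of an
alternative for records whose sequence may wrap over several lines, assuming
the header layout shown above (the sub name filter_fasta and the in-memory
records are made up for illustration): keep the match flag in a variable
declared outside the loop, so it persists until the next header resets it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy every FASTA record whose probe set ID is a key
# of %$wanted from the input handle to the output handle.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;    # declared outside the loop so it survives across lines
    while (my $line = <$in>) {
        # Only header lines change the flag; sequence lines inherit it.
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted->{$1} ? 1 : 0;
        }
        print {$out} $line if $keep;
    }
    return;
}

# Small in-memory demonstration (made-up records, not real probe data).
my $fasta = join '',
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGG\n",
    "TCCCCTTTCC\n",
    ">probe:HG_U95Av2:9999_at:1:2; Interrogation_Position=1; Antisense;\n",
    "ACGTACGTACGT\n";

my %wanted = ('1138_at' => 1);
my $result = '';
open my $in,  '<', \$fasta  or die "Can't read in-memory FASTA: $!";
open my $out, '>', \$result or die "Can't write in-memory output: $!";
filter_fasta($in, $out, \%wanted);
close $out;
close $in;
print $result;
```

With real files, the two in-memory handles would simply be replaced by
handles opened on the probe file and the output file.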

I did notice one odd thing; I tried the above code on Mac OS X and it worked
fine (i.e. printed only the descriptions and sequences for the ID's in the
ID hash).  If I used Windows, I needed to use this version:

while (<PROBES>) {
	my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
	if ($idmatch){
		print OUT;
		print OUT scalar(<PROBES>);		
	}
}

Otherwise 'print OUT <PROBES>;' prints all the remaining sequences (print
evaluates its arguments in list context, so scalar() forces a single line).
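The list-versus-scalar difference can be reproduced without any files.  A
small sketch using in-memory filehandles (the sub name
copy_after_first_line and the four-line data are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: skip the first line of some made-up data, then copy
# what follows with either a bare <$in> or scalar(<$in>) as print's argument.
sub copy_after_first_line {
    my ($force_scalar) = @_;
    my $data   = "header1\nseq1\nheader2\nseq2\n";
    my $copied = '';
    open my $in,  '<', \$data   or die "Can't open in-memory input: $!";
    open my $out, '>', \$copied or die "Can't open in-memory output: $!";

    my $header = <$in>;    # scalar assignment: reads exactly one line
    if ($force_scalar) {
        print {$out} scalar(<$in>);    # one more line only
    }
    else {
        print {$out} <$in>;            # list context: every remaining line
    }
    close $out;
    close $in;
    return $copied;
}

my $list_copy   = copy_after_first_line(0);
my $scalar_copy = copy_after_first_line(1);
printf "bare <\$in> copied %d line(s); scalar(<\$in>) copied %d line(s)\n",
    ($list_copy =~ tr/\n//), ($scalar_copy =~ tr/\n//);
```

The bare form copies the three remaining lines, the scalar() form copies
one, so scalar(<PROBES>) is the safer spelling on any platform.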

Like I said, I haven't tried Malcolm's one-liner.  It's possible that it
works just as well as what I suggested.  I'm just responding to Michael's
code request.

Chris


> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 10:48 AM
> To: Cook, Malcolm; Chris Fields; Michael Oldham
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
> 
> oops,
> 
> s/matches on of/matches one of/
> s/nothing that/noting that/
> 
> --Malcolm
> 


From hlapp at gmx.net  Mon Jun 12 20:06:23 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 12 Jun 2006 16:06:23 -0400
Subject: [Bioperl-l] How to use gi2taxonid
In-Reply-To: <448DB27F.6090107@gmx.at>
References: <448DB27F.6090107@gmx.at>
Message-ID: <878FB829-AD31-457D-957E-210448D7F6F5@gmx.net>

Thought about typing

	$ perldoc DB_File

at the command line?

Hubert, are you trying to outsource what should be your own work to  
the bioperl list, or what motivates you to waste everybody's time? If  
you google 'how to ask good questions' this (indeed frequently cited,  
also on the bioperl list if you had paid attention) comes up as the  
first link:

http://www.catb.org/~esr/faqs/smart-questions.html

There's nothing I can add, except to read it in full before your next  
posting or you may reach the point fast at which nobody will bother  
to respond to you and do your homework for you.
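For the archive, the pattern described in the DB_File documentation can be
sketched roughly as below.  This is only an illustration under stated
assumptions: that the mapping file holds one "GI<TAB>taxon ID" pair per
line, and that DB_File is installed.  The sub names, the database file name
and the three data rows are made up.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical helper: load "GI<TAB>taxid" lines into an on-disk hash.
sub build_gi_index {
    my ($in, $db_path) = @_;
    tie my %gi2taxid, 'DB_File', $db_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Can't tie $db_path: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;    # skip malformed lines
        $gi2taxid{$gi} = $taxid;
    }
    untie %gi2taxid;
    return;
}

# Hypothetical helper: reopen the index read-only and look up one GI.
sub lookup_taxid {
    my ($db_path, $gi) = @_;
    tie my %gi2taxid, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH
        or die "Can't tie $db_path: $!";
    my $taxid = $gi2taxid{$gi};
    untie %gi2taxid;
    return $taxid;
}

# Demonstration with made-up rows in a temporary directory.
my $dir     = tempdir(CLEANUP => 1);
my $db_path = "$dir/gi2taxid.db";
my $mapping = "2\t9913\n3\t9913\n4\t9646\n";
open my $in, '<', \$mapping or die "Can't read in-memory mapping: $!";
build_gi_index($in, $db_path);
close $in;
print "GI 4 -> taxon ", lookup_taxid($db_path, 4), "\n";
```

The index only needs to be built once; later scripts can tie the same file
read-only.  Getting from the taxon ID to the full taxonomy is a separate
step not shown here.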

On Jun 12, 2006, at 2:29 PM, Hubert Prielinger wrote:

> hi,
> I have downloaded the gi2taxonid file to get the taxonid for a GI  
> number
> taken from a report as recommended here, but I don't know how to  
> use the
> gi2taxonid file.
> Jason wrote in a previous post that you have to make a DB_File out of
> it, but I don't know how....and finally tie it to a hash....
> Can anybody give me a hint how to use it..... my final goal is to get
> the taxonomy.
>
> thanks
> Hubert
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon Jun 12 20:35:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 15:35:10 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <448D240B.6040508@mrc-dunn.cam.ac.uk>
Message-ID: <001201c68e5f$b34ec8c0$15327e82@pyrimidine>

...
> Chris Fields wrote:
> > There's information in the HOWTOs:
> >
> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
> >
> > http://www.bioperl.org/wiki/HOWTO:OBDA
> >
...
> As you later discovered, that was an Outlook problem. Just to make this
> thread relevant to bioperl, the bioperl solution is:

Agreed (stupid Outlook).  It might be much faster to use non-Bioperl-ish
ways, but it is easier to further manipulate sequences (convert format,
analyze sequences, etc) using Bioperl directly.  I haven't used flat
databases much but it should move very quickly, even in an OO environment.

The one problem with the proposed non-bioperl method is that, if you wanted
100,000 sequences (based on ID's) from a FASTA database file containing
200,000 sequences, all the ID's would need to be stored (1) in an array (which
gulps the data from the ID file) and then mapped to (2) a hash; that may be a
pretty big memory footprint depending on your system.  
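The hash is what the lookup needs; the intermediate array is not.  A minimal
sketch of filling the hash one line at a time, assuming one ID per line (the
sub name read_id_set is made up, and stripping a trailing carriage return is
only a precaution for ID files saved on Windows):

```perl
use strict;
use warnings;

# Hypothetical helper: read IDs, one per line, straight into a hash so that
# only one copy of the ID list is ever held in memory.
sub read_id_set {
    my ($fh) = @_;
    my %wanted;
    while (my $id = <$fh>) {
        chomp $id;
        $id =~ s/\r\z//;       # tolerate Windows line endings
        next if $id eq '';     # skip blank lines
        $wanted{$id} = 1;
    }
    return \%wanted;
}

# Demonstration on made-up IDs held in memory.
my $id_text = "1138_at\n1134_at\r\n\n1500_at\n";
open my $fh, '<', \$id_text or die "Can't read in-memory ID list: $!";
my $wanted = read_id_set($fh);
close $fh;
print join(',', sort keys %$wanted), "\n";
```

For a few thousand IDs the saving is unlikely to matter; it becomes relevant
only as the ID list grows towards the size of the FASTA file itself.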

Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
reads the ID's in one at a time from the file, (2) retrieves the data, then
(3) prints it out.   The advantage of this approach is that the built index
can be used in other bioperl scripts as well w/o having to rebuild it again,
so if you wanted a different set of ID's later on you can access the
database using the prebuilt index.  More can be found in the
Bio::Index::Fasta POD.  

You can also use the ideas and code in the HOWTO (Flat Databases) I
mentioned, which focuses on the Bio::DB::Flat system and OBDA.  The
advantage of these is that you can use Sleepycat's Berkeley Database through
the Perl BerkeleyDB module (more functionality than DB_File) which is faster
than a standard flat database.  In the HOWTO, specifically look under
'Secondary or custom namespaces' for ideas on how to use your ID as a
primary or secondary key.

Chris

> use Bio::SeqIO;
> use Bio::Index::Fasta;
> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> $inx->id_parser(\&get_id);
> $inx->make_index(shift);
> 
> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> my $wanted_ids_file = shift;
> open(IDS, $wanted_ids_file);
> while (<IDS>) {
>    chomp;
>    my $seq = $inx->fetch($_);
>    $out->write_seq($seq);
> }
> 
> sub get_id {
>    my $line = shift;
>    $line =~ /^>probe:\S+?:(\S+?):/;
>    $1;
> }
> 
> It works for me on the sample sequence given by the OP.


From golharam at umdnj.edu  Mon Jun 12 20:23:45 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 12 Jun 2006 16:23:45 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>

I'm trying to install the bioperl-run package and am getting errors from
make test regarding PAML:

t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an
undefined value at t/PAML.t line 85, <GEN2> line 85.
t/PAML....................dubious
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Is this a legitimate error or am I missing something?

Ryan


From MEC at stowers-institute.org  Mon Jun 12 21:15:35 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 16:15:35 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>

Yeah, good points...

... my recommendation of the one-liner was motivated based on a small
number of IDs and no other applications needing to index the entire
fasta database.


--Malcolm [At which point he bowed out of this fray]

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>Sent: Monday, June 12, 2006 3:35 PM
>To: 'Sendu Bala'; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>...
>> Chris Fields wrote:
>> > There's information in the HOWTOs:
>> >
>> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
>> >
>> > http://www.bioperl.org/wiki/HOWTO:OBDA
>> >
>...
>> As you later discovered, that was an Outlook problem. Just to make this
>> thread relevant to bioperl, the bioperl solution is:
>
>Agreed (stupid Outlook).  It might be much faster to use non-Bioperl-ish
>ways, but it is easier to further manipulate sequences (convert format,
>analyze sequences, etc) using Bioperl directly.  I haven't used flat
>databases much but it should move very quickly, even in an OO environment.
>
>The one problem with the proposed non-bioperl method is that, if you wanted
>100,000 sequences (based on IDs) from a FASTA database file containing
>200,000 sequences, all IDs would need to be stored (1) in an array (which
>gulped the data from the ID file) and then mapped to (2) a hash; that may
>be a pretty big memory footprint depending on your system.
>
>Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
>reads the IDs in one at a time from the file, (2) retrieves the data, then
>(3) prints it out.  The advantage of this approach is that the built index
>can be used in other bioperl scripts as well w/o having to rebuild it,
>so if you wanted a different set of IDs later on you can access the
>database using the prebuilt index.  More can be found in the
>Bio::Index::Fasta POD.
>
>You can also use the ideas and code in the HOWTO (Flat Databases) I
>mentioned, which focuses on the Bio::DB::Flat system and OBDA.  The
>advantage of these is that you can use Sleepycat's Berkeley Database through
>the Perl BerkeleyDB module (more functionality than DB_File), which is faster
>than a standard flat database.  In the HOWTO, specifically look under
>'Secondary or custom namespaces' for ideas on how to use your ID as a
>primary or secondary key.
>
>Chris
>
>> use Bio::SeqIO;
>> use Bio::Index::Fasta;
>> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
>> $inx->id_parser(\&get_id);
>> $inx->make_index(shift);
>> 
>> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
>> my $wanted_ids_file = shift;
>> open(IDS, $wanted_ids_file);
>> while (<IDS>) {
>>    chomp;
>>    my $seq = $inx->fetch($_);
>>    $out->write_seq($seq);
>> }
>> 
>> sub get_id {
>>    my $line = shift;
>>    $line =~ /^>probe:\S+?:(\S+?):/;
>>    $1;
>> }
>> 
>> It works for me on the sample sequence given by the OP.


From cjfields at uiuc.edu  Mon Jun 12 21:20:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 16:20:55 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9B0@exchkc02.stowers-institute.org>
Message-ID: <001601c68e66$17b760a0$15327e82@pyrimidine>

Sorry Malcolm.  I didn't want to imply that your way or the bioperl way was
best, just point out advantages/disadvantages.  

Oops, didn't point out the possible Bioperl disadvantage (too many objects
generated = slow slow slow).  

Chris

> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 4:16 PM
> To: Chris Fields; Sendu Bala; bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
> 
> Yeah, good points...
> 
> ... my recommendation of the one-liner was motivated based on a small
> number of IDs and no other applications needing to index the entire
> fasta database.
> 
> 
> --Malcolm [At which point he bowed out of this fray]
> 
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >Sent: Monday, June 12, 2006 3:35 PM
> >To: 'Sendu Bala'; bioperl-l at lists.open-bio.org
> >Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
> >single largefile
> >
> >...
> >> Chris Fields wrote:
> >> > There's information in the HOWTOs:
> >> >
> >> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
> >> >
> >> > http://www.bioperl.org/wiki/HOWTO:OBDA
> >> >
> >...
> >> As you later discovered, that was an Outlook problem. Just to make this
> >> thread relevant to bioperl, the bioperl solution is:
> >
> >Agreed (stupid Outlook).  It might be much faster to use non-Bioperl-ish
> >ways, but it is easier to further manipulate sequences (convert format,
> >analyze sequences, etc) using Bioperl directly.  I haven't used flat
> >databases much but it should move very quickly, even in an OO environment.
> >
> >The one problem with the proposed non-bioperl method is that, if you wanted
> >100,000 sequences (based on IDs) from a FASTA database file containing
> >200,000 sequences, all IDs would need to be stored (1) in an array (which
> >gulped the data from the ID file) and then mapped to (2) a hash; that may
> >be a pretty big memory footprint depending on your system.
> >
> >Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
> >reads the IDs in one at a time from the file, (2) retrieves the data, then
> >(3) prints it out.  The advantage of this approach is that the built index
> >can be used in other bioperl scripts as well w/o having to rebuild it,
> >so if you wanted a different set of IDs later on you can access the
> >database using the prebuilt index.  More can be found in the
> >Bio::Index::Fasta POD.
> >
> >You can also use the ideas and code in the HOWTO (Flat Databases) I
> >mentioned, which focuses on the Bio::DB::Flat system and OBDA.  The
> >advantage of these is that you can use Sleepycat's Berkeley Database through
> >the Perl BerkeleyDB module (more functionality than DB_File), which is faster
> >than a standard flat database.  In the HOWTO, specifically look under
> >'Secondary or custom namespaces' for ideas on how to use your ID as a
> >primary or secondary key.
> >
> >Chris
> >
> >> use Bio::SeqIO;
> >> use Bio::Index::Fasta;
> >> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> >> $inx->id_parser(\&get_id);
> >> $inx->make_index(shift);
> >>
> >> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> >> my $wanted_ids_file = shift;
> >> open(IDS, $wanted_ids_file);
> >> while (<IDS>) {
> >>    chomp;
> >>    my $seq = $inx->fetch($_);
> >>    $out->write_seq($seq);
> >> }
> >>
> >> sub get_id {
> >>    my $line = shift;
> >>    $line =~ /^>probe:\S+?:(\S+?):/;
> >>    $1;
> >> }
> >>
> >> It works for me on the sample sequence given by the OP.


From roy at colibase.bham.ac.uk  Mon Jun 12 15:46:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 12 Jun 2006 16:46:49 +0100
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <200606090935.12758.heikki@sanbi.ac.za>
References: <448850CE.1040105@colibase.bham.ac.uk>
	<200606090935.12758.heikki@sanbi.ac.za>
Message-ID: <448D8C69.4030005@colibase.bham.ac.uk>

Hi Heikki.

> Two questions come to mind:
> 
> 1. Can you parse your joint location using bioperl without errors?
Seems to work fine as far as I can tell (no errors, and to_FTstring 
reproduces the location as expected).

> 2. Is there a practical advantage in including a location which has no 
> relevance to the sequence in hand?
I think it would be misleading to imply that a location was complete 
when it is only a part of the originally annotated feature. From the FT 
definition the other possibility would be to include the missing parts 
of the feature as remote locations, I guess that may be more satisfactory.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From colin.erdman at du.edu  Mon Jun 12 19:52:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Mon, 12 Jun 2006 13:52:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
Message-ID: <1150141965.2992.17.camel@localhost.localdomain>

Hello all,

I am doing a project relating to some forensic analysis of mitochondrial
DNA. 

I would like to write a script that will take a reference sequence, in
this case the Anderson sequence which is the standard mitochondrial
sequence which sample sequences are compared to, and compare it to an
unknown sequence.

I have been using this script:

use strict;
use warnings;
use Bio::SearchIO;

# Align the sample (-j) against the reference (-i) with bl2seq
open(my $fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |")
    || die $!;

my $parser = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

# Only the first HSP of the first hit is examined
if( my $result = $parser->next_result ) {
    if( my $hit = $result->next_hit ) {
        if( my $hsp = $hit->next_hsp ) {
            # Positions are in query (reference) coordinates
            my @qmismatches = $hsp->seq_inds('query', 'nomatch');
            my $seq_string  = $hsp->query_string;
            my $seq_string1 = $hsp->hit_string;
            for my $base ( @qmismatches ) {
                print "base $base of the query sequence is a mismatch: ";
                print substr($seq_string, $base-1, 1);
                print "->";
                print substr($seq_string1, $base-1, 1);
                print "\n";
            }
        }
    }
}


The problem is that some mitochondrial sequences from individuals have
insertions, deletions, etc. that offset them from the reference
sequence, which in turn offsets the numbering system.

To provide an example:

>Anderson Reference Sequence|HV2
ATTTGGT...
1234567

>Sample|HV2....
ATTTG|C|GT
12345,5.1,67

The |C| denotes an insertion. Traditionally in the forensics community
this would be called position 5.1C, but the program reads it as position 6.

So basically I need to figure out how to modify a perl script to recognize
that 5.1C is an insertion and not position 6; position 6 is actually
the G to the right of it, followed by position 7-T.

Any ideas and suggestions would be greatly appreciated. I know this could be
very tricky or very easy; I have just come to the point where the idea flow
has stopped and would love to gather some outside input.

Thanks
Colin Erdman
colin.erdman at du.edu
Undergraduate Research Associate
Institute For Forensic Genetic
University of Denver 
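
One possible way to get the numbering described above is to walk the two gapped alignment strings column by column and advance the position counter only on reference bases, so that the inserted C after reference position 5 is reported as 5.1C. The sketch below is plain Perl and only an illustration: the sub name forensic_variants, the "del" suffix for deletions, and the bare "position + base" form for substitutions are assumptions, and an aligner may not place gaps where forensic convention expects them.

```perl
use strict;
use warnings;

# Walk two gapped alignment strings ('-' marks a gap) and name each
# difference relative to the reference numbering.
# $ref_start is the reference coordinate of the first aligned base.
sub forensic_variants {
    my ($ref_aln, $sample_aln, $ref_start) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($sample_aln);

    my $pos = $ref_start - 1;   # last reference position consumed
    my $ins = 0;                # inserted bases seen since that position
    my @variants;
    for my $i (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln,    $i, 1);
        my $s = uc substr($sample_aln, $i, 1);
        if ($r eq '-') {
            next if $s eq '-';              # gap in both strings: ignore
            $ins++;
            push @variants, "$pos.$ins$s";  # insertion, e.g. 5.1C
        }
        else {
            $pos++;
            $ins = 0;
            if    ($s eq '-') { push @variants, "${pos}del" }  # assumed label
            elsif ($s ne $r)  { push @variants, "$pos$s" }     # substitution
        }
    }
    return @variants;
}

# Demo on the example from the message
unless (caller) {
    print join(",", forensic_variants("ATTTG-GT", "ATTTGCGT")), "\n";  # prints 5.1C
}
```

With the bl2seq script above, where the reference is the query, the call would presumably be forensic_variants($hsp->query_string, $hsp->hit_string, $hsp->start('query')).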


From jason at bioperl.org  Tue Jun 13 14:19:04 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 10:19:04 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
Message-ID: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>

The latest version of YN00 (3.15) doesn't work with the current code
because its output has changed substantially: Yang now provides simple
Ka and Ks calculations from several different methods.  Downgrade to
PAML 3.14, or roll up your sleeves and figure out what is breaking --
namely the regexp at about line 363 that detects when to start parsing
the Pairwise data, as well as the function parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the
software packages, so I am hopeful that other developers who use our
software and do molecular evolutionary studies will get involved to
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and an getting errors  
> from
> make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix"  
> on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason at bioperl.org  Tue Jun 13 15:45:27 2006
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 Jun 2006 11:45:27 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>
	<B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <F802F582-28E4-4761-873C-2A49A60B3593@bioperl.org>

And just to say: codeml 3.15 parsing does work; yn00 parsing just
hasn't been updated.  I agree that it is bad that the test is failing,
but it depends on the version that is installed, and we should put
some sort of detect-version-and-skip code in the test so it doesn't
cause the tests to fail.  We just need more hands on deck tracking these
sorts of things....

-jason
On Jun 13, 2006, at 10:19 AM, Jason Stajich wrote:

> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially as Yang is now provided
> several different method's simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start  
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
>
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers that use our
> software as do molecular evolutionary studies will get involved to
> help this effort.
>
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
>
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
>
>> I'm trying to install the bioperl-run package and an getting errors
>> from
>> make test regarding PAML:
>>
>> t/PAML....................ok 2/18Can't call method "get_MLmatrix"
>> on an
>> undefined value at t/PAML.t line 85, <GEN2> line 85.
>> t/PAML....................dubious
>>         Test returned status 2 (wstat 512, 0x200)
>>         after all the subtests completed successfully
>>
>> Is this a legitimate error or am I missing something?
>>
>> Ryan
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Tue Jun 13 16:04:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 12:04:46 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001001c68f03$17429070$e6028a0a@GOLHARMOBILE1>

I'll take a look at it and see what I can do.  While I'm at it,
bioperl-run tests a module called Coil, but I don't have that installed.
The documentation doesn't specify where I can get this application.
Does anyone know where Coil comes from?


-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org] 
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


The latest version of YN00 (3.15) doesn't work with the current code  
as the output has changed substantially as Yang is now provided  
several different method's simple Ka and Ks calculations.  Downgrade  
to PAML 3.14 or roll up your sleeves and figure out what is breaking  
-- which is the regexp in about line 363 that detects when to start  
parsing for the Pairwise data as well as the function  
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the  
software packages so I am hopeful that other developers that use our  
software as do molecular evolutionary studies will get involved to  
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and an getting errors
> from
> make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Kevin.M.Brown at asu.edu  Tue Jun 13 17:42:40 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 13 Jun 2006 10:42:40 -0700
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>

I've been tasked to write a "small" application for the lab I work at
that basically starts with an NCBI file for an organism and goes through
a number of steps to distill out the unique protein coding sequences and
then designs oligos for the building of the genes.  One of the steps is
comparing the overlap region of the oligos to all the others designed to
try and prevent mismatches in the build that might truncate a gene or
splice in another gene into it during the build step.  I tried to do
this within perl with just a looped string comparison regex, but by my
calculations comparing each half of an oligo with all the other oligos
for this organism results in well over 8 BILLION comparisons needed.
The system was still crunching at it 3 days later with no sign of
nearing completion.

So, my thought was to use something like blastall from within the
script to find other oligos with similar sequence, but that means I need
to dump out the designed oligos, create the db with formatdb, run the
blast, and finally analyze the result file to see which oligos need to
be redesigned to prevent a mismatch.  I'm just trying to figure out how
to do it all without leaving the script, but as yet I haven't found a
way to create a db from within perl using bioperl.

Any thoughts on directions I should look?
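
One way the dump / formatdb / blastall / parse steps could be driven from a single script is sketched below. It assumes the NCBI formatdb and blastall binaries are on the PATH and that BioPerl's Bio::SearchIO 'blasttable' parser is available; the sub names and file names are hypothetical, and the word size and e-value would need tuning for short oligo overlaps.

```perl
use strict;
use warnings;

# Write the designed oligos (hashref: name => sequence) as FASTA.
sub write_fasta {
    my ($path, $oligos) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} ">$_\n$oligos->{$_}\n" for sort keys %$oligos;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Argument lists for the external tools (-p F = nucleotide database,
# -m 8 = tabular output).
sub formatdb_cmd {
    my ($fasta) = @_;
    return ('formatdb', '-i', $fasta, '-p', 'F');
}

sub blastall_cmd {
    my ($fasta, $out, $evalue) = @_;
    return ('blastall', '-p', 'blastn', '-d', $fasta, '-i', $fasta,
            '-e', $evalue, '-m', '8', '-o', $out);
}

# Build the database, blast every oligo against all the others, and
# return [query, hit] name pairs, ignoring each oligo's hit to itself.
sub cross_hits {
    my ($fasta, $out, $evalue) = @_;
    system(formatdb_cmd($fasta)) == 0
        or die "formatdb failed: $?";
    system(blastall_cmd($fasta, $out, $evalue)) == 0
        or die "blastall failed: $?";

    require Bio::SearchIO;   # loaded lazily; needs BioPerl installed
    my $parser = Bio::SearchIO->new(-format => 'blasttable', -file => $out);
    my @pairs;
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            next if $hit->name eq $result->query_name;
            push @pairs, [$result->query_name, $hit->name];
        }
    }
    return @pairs;
}
```

A run would then be write_fasta('oligos.fa', \%oligos) followed by cross_hits('oligos.fa', 'oligos.hits', '1e-3'); the point is only that the database can be built with system() rather than through a BioPerl call.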


From aaron.j.mackey at gsk.com  Tue Jun 13 12:19:11 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 13 Jun 2006 08:19:11 -0400
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <1150141965.2992.17.camel@localhost.localdomain>
Message-ID: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>

See Bio::LocatableSeq

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:

> Hello all,
> 
> I am doing a project relating to some forensic analysis of mitochondrial
> DNA. 
> 
> I would like to write a script that will take a reference sequence, in
> this case the Anderson sequence which is the standard mitochondrial
> sequence which sample sequences are compared to, and compare it to an
> unknown sequence.
> 
> I have been using this script:
> 
> use Bio::SearchIO;
> use strict;
> my $fh;
> my @nomatches;
> open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p 
> blastn |") || die $!;
> 
> my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
> 
> if( my $result = $parser->next_result ) { 
>      if( my $hit = $result->next_hit ) { 
>      if( my $hsp = $hit->next_hsp ) { 
>          my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
>     my ( @hitbases) = $hsp->hit_string;
>     my ( @querybases) = $hsp->query_string;
>     my $seq_string = join("", @querybases);
>     my $seq_string1 = join("", @hitbases);
>          for my $base (  @qmismatches ) {
>             print "base $base of the hit sequence is a mismatch: ";
>        print substr $seq_string, $base-1, 1;
>        print "->";
>             print substr $seq_string1, $base-1, 1;
>             print "\n";
>         }
> 
>      }
>      }
> }
> 
> 
> The problem is, that some mitochondrial sequences from individuals have
> insertions, deletion etc, that cause them to be offset from the
> reference sequence, this then offsets the numbering system.
> 
> To provide an example:
> 
> >Anderson Reference Sequence|HV2
> ATTTGGT...
> 1234567
> 
> >Sample|HV2....
> ATTTG|C|GT
> 12345,5.1,67
> 
> The |C| denote an insertion, and traditionally in the forensics 
community
> this would be called position 5.1G, but the program reads it as position 
6.
> 
> So basically I need to figure out how to modify a perl script in 
> order to recognize 
> that 5.1G is an insertion, and that it is not position 6, position 6
> is actually 
> the G to the right of it, followed by position 7-T.
> 
> Any ideas and suggestions would be greatly helpful, I know this 
> could be very tricky,
> or very easy - I just have come to the point where the idea flow has
> stopped and would 
> love to gather some outside input.
> 
> Thanks
> Colin Erdman
> colin.erdman at du.edu
> Undergraduate Research Associate
> Institute For Forensic Genetic
> University of Denver 
> 
> 
> 
> 
> 


From colin.erdman at du.edu  Tue Jun 13 15:12:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 09:12:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150211566.7034.1.camel@localhost.localdomain>

I could see how this will help... but I am not sure how to implement it
in my situation, I am not very familiar with the Bio::Range or
Bio::Location modules...

Thanks very much,
Colin E.

On Tue, 2006-06-13 at 08:19 -0400, aaron.j.mackey at gsk.com wrote:
> See Bio::LocatableSeq
> 
> -Aaron
> 
> bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:
> 
> > Hello all,
> > 
> > I am doing a project relating to some forensic analysis of mitochondrial
> > DNA. 
> > 
> > I would like to write a script that will take a reference sequence, in
> > this case the Anderson sequence which is the standard mitochondrial
> > sequence which sample sequences are compared to, and compare it to an
> > unknown sequence.
> > 
> > I have been using this script:
> > 
> > use Bio::SearchIO;
> > use strict;
> > my $fh;
> > my @nomatches;
> > open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p 
> > blastn |") || die $!;
> > 
> > my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
> > 
> > if( my $result = $parser->next_result ) { 
> >      if( my $hit = $result->next_hit ) { 
> >      if( my $hsp = $hit->next_hsp ) { 
> >          my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
> >     my ( @hitbases) = $hsp->hit_string;
> >     my ( @querybases) = $hsp->query_string;
> >     my $seq_string = join("", @querybases);
> >     my $seq_string1 = join("", @hitbases);
> >          for my $base (  @qmismatches ) {
> >             print "base $base of the hit sequence is a mismatch: ";
> >        print substr $seq_string, $base-1, 1;
> >        print "->";
> >             print substr $seq_string1, $base-1, 1;
> >             print "\n";
> >         }
> > 
> >      }
> >      }
> > }
> > 
> > 
> > The problem is, that some mitochondrial sequences from individuals have
> > insertions, deletion etc, that cause them to be offset from the
> > reference sequence, this then offsets the numbering system.
> > 
> > To provide an example:
> > 
> > >Anderson Reference Sequence|HV2
> > ATTTGGT...
> > 1234567
> > 
> > >Sample|HV2....
> > ATTTG|C|GT
> > 12345,5.1,67
> > 
> > The |C| denote an insertion, and traditionally in the forensics 
> community
> > this would be called position 5.1G, but the program reads it as position 
> 6.
> > 
> > So basically I need to figure out how to modify a perl script in 
> > order to recognize 
> > that 5.1G is an insertion, and that it is not position 6, position 6
> > is actually 
> > the G to the right of it, followed by position 7-T.
> > 
> > Any ideas and suggestions would be greatly helpful, I know this 
> > could be very tricky,
> > or very easy - I just have come to the point where the idea flow has
> > stopped and would 
> > love to gather some outside input.
> > 
> > Thanks
> > Colin Erdman
> > colin.erdman at du.edu
> > Undergraduate Research Associate
> > Institute For Forensic Genetic
> > University of Denver 
> > 
> > 
> > 
> > 
> > 
> 
> 


From colin.erdman at du.edu  Tue Jun 13 16:05:30 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 10:05:30 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
References: <OF6C07C920.632A6A78-ON8525718C.0043A50C-8525718C.0043AD34@gsk.com>
Message-ID: <1150214730.12044.2.camel@localhost.localdomain>

I actually have found EMBOSS DiffSeq to work quite well for detecting
the insertions and SNPs in the "sample sequence" as compared to the
"reference sequence". 

If I get this all figured out and integrated I will post a method, I
imagine this would prove useful to others as well.

Thanks all,
Colin

On Tue, 2006-06-13 at 08:19 -0400, aaron.j.mackey at gsk.com wrote:
> See Bio::LocatableSeq
> 
> -Aaron
> 
> bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:
> 
> > Hello all,
> > 
> > I am doing a project relating to some forensic analysis of mitochondrial
> > DNA. 
> > 
> > I would like to write a script that will take a reference sequence, in
> > this case the Anderson sequence which is the standard mitochondrial
> > sequence which sample sequences are compared to, and compare it to an
> > unknown sequence.
> > 
> > I have been using this script:
> > 
> > use Bio::SearchIO;
> > use strict;
> > my $fh;
> > my @nomatches;
> > open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p 
> > blastn |") || die $!;
> > 
> > my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
> > 
> > if( my $result = $parser->next_result ) { 
> >      if( my $hit = $result->next_hit ) { 
> >      if( my $hsp = $hit->next_hsp ) { 
> >          my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
> >     my ( @hitbases) = $hsp->hit_string;
> >     my ( @querybases) = $hsp->query_string;
> >     my $seq_string = join("", @querybases);
> >     my $seq_string1 = join("", @hitbases);
> >          for my $base (  @qmismatches ) {
> >             print "base $base of the hit sequence is a mismatch: ";
> >        print substr $seq_string, $base-1, 1;
> >        print "->";
> >             print substr $seq_string1, $base-1, 1;
> >             print "\n";
> >         }
> > 
> >      }
> >      }
> > }
> > 
> > 
> > The problem is, that some mitochondrial sequences from individuals have
> > insertions, deletion etc, that cause them to be offset from the
> > reference sequence, this then offsets the numbering system.
> > 
> > To provide an example:
> > 
> > >Anderson Reference Sequence|HV2
> > ATTTGGT...
> > 1234567
> > 
> > >Sample|HV2....
> > ATTTG|C|GT
> > 12345,5.1,67
> > 
> > The |C| denote an insertion, and traditionally in the forensics 
> community
> > this would be called position 5.1G, but the program reads it as position 
> 6.
> > 
> > So basically I need to figure out how to modify a perl script in 
> > order to recognize 
> > that 5.1G is an insertion, and that it is not position 6, position 6
> > is actually 
> > the G to the right of it, followed by position 7-T.
> > 
> > Any ideas and suggestions would be greatly helpful, I know this 
> > could be very tricky,
> > or very easy - I just have come to the point where the idea flow has
> > stopped and would 
> > love to gather some outside input.
> > 
> > Thanks
> > Colin Erdman
> > colin.erdman at du.edu
> > Undergraduate Research Associate
> > Institute For Forensic Genetic
> > University of Denver 
> > 
> > 
> > 
> > 
> > 
> 
> 


From golharam at umdnj.edu  Tue Jun 13 18:59:59 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:59:59 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <002301c68f1b$917b8c80$e6028a0a@GOLHARMOBILE1>

Nevermind - don't check it in yet.  There are still some other problems
not being picked up by the test suite.  I'll work on that and add to the
test suite.  Jason, I'll send you everything once I have it complete.


-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu] 
Sent: Tuesday, June 13, 2006 2:34 PM
To: 'Jason Stajich'
Cc: 'bioperl-l at bioperl.org'
Subject: RE: [Bioperl-l] Test errors in bioperl-run


It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan

-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org] 
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


The latest version of YN00 (3.15) doesn't work with the current code  
as the output has changed substantially as Yang is now provided  
several different method's simple Ka and Ks calculations.  Downgrade  
to PAML 3.14 or roll up your sleeves and figure out what is breaking  
-- which is the regexp in about line 363 that detects when to start  
parsing for the Pairwise data as well as the function  
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the  
software packages so I am hopeful that other developers that use our  
software as do molecular evolutionary studies will get involved to  
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix" on 
> an undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Jonathan_Epstein at nih.gov  Tue Jun 13 18:21:00 2006
From: Jonathan_Epstein at nih.gov (Jonathan_Epstein at nih.gov)
Date: Tue, 13 Jun 2006 14:21:00 -0400
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <0J0T001LE9O5M6@lswsmta04.nmcc.sprintspectrum.com>

Sounds like a job for MUMmer (from Steven Salzberg's group).

Jonathan Epstein 

----------- 
Sent from my Treo

-----Original Message-----

From:  "Kevin Brown" <Kevin.M.Brown at asu.edu>
Subj:  [Bioperl-l] Blast or blat against custom db?
Date:  Tue Jun 13, 2006 2:17 pm
Size:  1K
To:  <bioperl-l at lists.open-bio.org>

I've been tasked to write a "small" application for the lab I work at
that basically starts with an NCBI file for an organism and goes through
a number of steps to distill out the unique protein coding sequences and
then designs oligos for the building of the genes.  One of the steps is
comparing the overlap region of the oligos to all the others designed to
try and prevent mismatches in the build that might truncate a gene or
splice in another gene into it during the build step.  I tried to do
this within perl with just a looped string comparison regex, but by my
calculations comparing each half of an oligo with all the other oligos
for this organism results in well over 8 BILLION comparisons needed.
The system was still crunching at it 3 days later with no sign of
nearing completion.

So, my thought was to utilize something like blastall from within the
script to find other oligos of similar match, but it means that I need
to dump out the oligos designed and create the db with formatdb, then do
the blast, and finally analyze the result file to see what needs to be
changed in the oligos to prevent a mismatch, redesigning any that match.
I'm just trying to figure out how to do it all without leaving the
script, but as yet haven't noticed a way to create a db from within perl
using bioperl?

Any thoughts on directions I should look?

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

--- message truncated ---


From golharam at umdnj.edu  Tue Jun 13 18:34:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:34:00 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <B3A8C163-9CB0-4944-A072-DFBB84238E9A@bioperl.org>
Message-ID: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>

It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.  I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message.  It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
(or someone) check it into CVS?

Ryan

-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org] 
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


The latest version of YN00 (3.15) doesn't work with the current code,
as the output has changed substantially: Yang now provides several
different methods' simple Ka and Ks calculations.  Downgrade to PAML
3.14 or roll up your sleeves and figure out what is breaking -- which
is the regexp at about line 363 that detects when to start parsing the
Pairwise data, as well as the function parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the
software packages, so I am hopeful that other developers who use our
software to do molecular evolutionary studies will get involved to
help this effort.

I may have to run a few batches of analyses myself later in the week  
using PAML so I will try and fix this if I can make the time.

-jason
On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from
> make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> on an
> undefined value at t/PAML.t line 85, <GEN2> line 85.
> t/PAML....................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAML.pm
Type: application/octet-stream
Size: 43262 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060613/566881b4/attachment-0004.obj>

From cjfields at uiuc.edu  Wed Jun 14 01:41:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:41:45 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000601c68f53$b1e4b090$15327e82@pyrimidine>

I committed it.  Passes PAML.t for bioperl-live.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, June 13, 2006 1:34 PM
> To: 'Jason Stajich'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
> I've attached it to this message.  It passs all the PAML tests from
> bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
> (or someone) can check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially as Yang is now provided
> several different method's simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers that use our
> software as do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
> > I'm trying to install the bioperl-run package and an getting errors
> > from
> > make test regarding PAML:
> >
> > t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> > on an
> > undefined value at t/PAML.t line 85, <GEN2> line 85.
> > t/PAML....................dubious
> >         Test returned status 2 (wstat 512, 0x200)
> >         after all the subtests completed successfully
> >
> > Is this a legitimate error or am I missing something?
> >
> > Ryan
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Wed Jun 14 01:42:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:42:25 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000701c68f53$c9addcb0$15327e82@pyrimidine>

Sorry, Brian beat me to it.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, June 13, 2006 1:34 PM
> To: 'Jason Stajich'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
> I've attached it to this message.  It passs all the PAML tests from
> bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
> (or someone) can check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially as Yang is now provided
> several different method's simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers that use our
> software as do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
> > I'm trying to install the bioperl-run package and an getting errors
> > from
> > make test regarding PAML:
> >
> > t/PAML....................ok 2/18Can't call method "get_MLmatrix"
> > on an
> > undefined value at t/PAML.t line 85, <GEN2> line 85.
> > t/PAML....................dubious
> >         Test returned status 2 (wstat 512, 0x200)
> >         after all the subtests completed successfully
> >
> > Is this a legitimate error or am I missing something?
> >
> > Ryan
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From osborne1 at optonline.net  Wed Jun 14 01:38:09 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 13 Jun 2006 21:38:09 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <C0B4E0C1.8D74%osborne1@optonline.net>

Checked in.


On 6/13/06 2:34 PM, "Ryan Golhar" <golharam at umdnj.edu> wrote:

> It looks like the output contains two new sections at the bottom and the
> comment sections have been changed slightly.  I've modified
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
> I've attached it to this message.  It passs all the PAML tests from
> bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  Can you
> (or someone) can check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code
> as the output has changed substantially as Yang is now provided
> several different method's simple Ka and Ks calculations.  Downgrade
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start
> parsing for the Pairwise data as well as the function
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the
> software packages so I am hopeful that other developers that use our
> software as do molecular evolutionary studies will get involved to
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
>> I'm trying to install the bioperl-run package and an getting errors
>> from
>> make test regarding PAML:
>> 
>> t/PAML....................ok 2/18Can't call method "get_MLmatrix"
>> on an
>> undefined value at t/PAML.t line 85, <GEN2> line 85.
>> t/PAML....................dubious
>>         Test returned status 2 (wstat 512, 0x200)
>>         after all the subtests completed successfully
>> 
>> Is this a legitimate error or am I missing something?
>> 
>> Ryan
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From golharam at umdnj.edu  Wed Jun 14 01:55:49 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 21:55:49 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <000601c68f53$b1e4b090$15327e82@pyrimidine>
Message-ID: <000101c68f55$a9fa8ec0$2f01a8c0@GOLHARMOBILE1>

Okay, that's fine.  It does pass the bioperl-live tests.  However, when I
ran the bp_pairwise_kaks script, it didn't work with 3.15, so it looks
like the current test suite is not exhaustive.

When I looked into the code further, I saw that codeml 3.15 generates
some files slightly differently than 3.14, which needs to be accounted for.
I'll work on that and post it here...shouldn't be too long.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Tuesday, June 13, 2006 9:42 PM
To: golharam at umdnj.edu; 'Jason Stajich'
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run


I committed it.  Passes PAML.t for bioperl-live.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, June 13, 2006 1:34 PM
> To: 'Jason Stajich'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> It looks like the output contains two new sections at the bottom and 
> the comment sections have been changed slightly.  I've modified 
> bioperl-live/Bio/Tools/PAML.pm to read the new and old format from 
> YN00. I've attached it to this message.  It passs all the PAML tests 
> from bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00.  
> Can you (or someone) can check it into CVS?
> 
> Ryan
> 
> -----Original Message-----
> From: Jason Stajich [mailto:jason at bioperl.org]
> Sent: Tuesday, June 13, 2006 10:19 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Test errors in bioperl-run
> 
> 
> The latest version of YN00 (3.15) doesn't work with the current code 
> as the output has changed substantially as Yang is now provided 
> several different method's simple Ka and Ks calculations.  Downgrade 
> to PAML 3.14 or roll up your sleeves and figure out what is breaking
> -- which is the regexp in about line 363 that detects when to start 
> parsing for the Pairwise data as well as the function 
> parse_YN_Pairwise....
> 
> I just don't have very much time anymore to follow changes to the 
> software packages so I am hopeful that other developers that use our 
> software as do molecular evolutionary studies will get involved to 
> help this effort.
> 
> I may have to run a few batches of analyses myself later in the week 
> using PAML so I will try and fix this if I can make the time.
> 
> -jason
> On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:
> 
> > I'm trying to install the bioperl-run package and an getting errors 
> > from make test regarding PAML:
> >
> > t/PAML....................ok 2/18Can't call method "get_MLmatrix" on

> > an undefined value at t/PAML.t line 85, <GEN2> line 85.
> > t/PAML....................dubious
> >         Test returned status 2 (wstat 512, 0x200)
> >         after all the subtests completed successfully
> >
> > Is this a legitimate error or am I missing something?
> >
> > Ryan
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org 
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From ULNJUJERYDIX at spammotel.com  Wed Jun 14 01:10:04 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 14 Jun 2006 09:10:04 +0800
Subject: [Bioperl-l] SimpleAlign /Bio::AlignIO; POD code doesn't work for me
Message-ID: <5b6410e0606131810k495d8f55mc6dc73f0cd5a6df5@mail.gmail.com>

>
> Hi,
>
> Two queries with respect to SimpleAlign. I am using the following code
> based on the POD.
>
> my $in  = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
> my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
> print $out $_ while <$in>;
>
> 1) is it possible to set set_displayname_flat() globally without doing
> $_->set_displayname_flat() per alignment.
>
> 2) My input files have an ID and description line for each seq in the
> alignment. When the file is converted I loose the description line. I
> know I can get the description of the sequences (e.g.
> $aln->get_seq_by_pos(2)->description()).
> How could I export the complete fasta defline including the
> description (I realize that general clustal format has a limit on the
> number of characters, but still).
>
> Regards,
> Bernd
> _______________________________________________
>
I might be totally wrong here, but what I understand about the FASTA format
is that the first word (i.e. everything up to the first space) is the only
true name of the seq, so anything after the first word is discarded. Putting
underscores in place of the spaces works for me.
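
To make the first-word rule concrete, here is a minimal plain-Perl sketch (no
BioPerl involved). The helper names split_defline and flatten_defline and the
example header are made up for illustration; they are not part of any BioPerl
API. It only shows how a defline splits into ID and description, and how the
underscore workaround folds the description into the ID.

```perl
use strict;
use warnings;

# Split a FASTA header into its ID (first word) and description (the rest).
# Hypothetical helper, not a BioPerl function.
sub split_defline {
    my ($defline) = @_;
    $defline =~ s/^>//;        # drop the leading '>'
    $defline =~ s/\s+\z//;     # drop trailing newline/whitespace
    my ($id, $desc) = split /\s+/, $defline, 2;
    return ($id, defined $desc ? $desc : '');
}

# Fold the description into the ID by replacing whitespace with underscores,
# so that a parser keeping only the first word still sees everything.
sub flatten_defline {
    my ($defline) = @_;
    my ($id, $desc) = split_defline($defline);
    return $id unless length $desc;
    (my $flat = "$id $desc") =~ s/\s+/_/g;
    return $flat;
}

my $header = '>seq1 hypothetical protein from sample A';   # made-up header
my ($id, $desc) = split_defline($header);
print "id=$id\n";
print "desc=$desc\n";
print "flat=", flatten_defline($header), "\n";
```

Note that a flattened name can become long, and as Bernd points out the
clustal format limits name length, so the result may still be truncated.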

On a side note, does your 3rd line work?
It doesn't on my 1.5rc1.
I had to add the bold line, which was missing in the POD doc.
I don't think it was the use strict pragma.
    open MYIN,"<$file" or die "Can't open input alignment";
    open MYOUT, ">$file2" or die "can't write to output";
    my $in  = Bio::AlignIO->newFh(-fh     => \*MYIN,
                               -format => 'fasta');
    my $out = Bio::AlignIO->newFh(-fh     =>  \*MYOUT,
                               -format => 'clustalw');
    print $out $_ while <$in>;

Cheers
kevin


From sb at mrc-dunn.cam.ac.uk  Wed Jun 14 07:49:10 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 14 Jun 2006 08:49:10 +0100
Subject: [Bioperl-l] Blast or blat against custom db?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
Message-ID: <448FBF76.1090505@mrc-dunn.cam.ac.uk>

Kevin Brown wrote:
[snip]
> So, my thought was to utilize something like blastall from within the
> script to find other oligos of similar match, but it means that I need
> to dump out the oligos designed, create the db with formatdb. [snip]
> I'm just trying to figure out how to do it all without leaving the
> script, but as yet haven't noticed a way to create a db from within perl
> using bioperl?
> 
> Any thoughts on directions I should look?

AFAIK there's no bioperl interface onto formatdb, but the way to do it 
is make a fasta file (perhaps using bioperl) with all the oligos (what 
you want to become the db), then use a perl system call (or similar) to 
run formatdb. Still in the same script you'd then run and analyse the 
blast with bioperl calls (presumably starting with StandAloneBlast - 
http://bioperl.org/wiki/HOWTO:Beginners#BLAST if you need it).

Just be sure to carefully craft your blast parameters so they're
suitable for oligo-sized matches, and test that the 3' base of each hit
is identical.
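
The steps above (write a fasta file, run formatdb through a system call, then
blast) could be sketched roughly as follows. This is only an outline under
stated assumptions: it uses plain system calls for both external programs
rather than StandAloneBlast, the helper names are invented for illustration,
the two oligo sequences are placeholders, and the word size, E-value and
filter settings are guesses that would need tuning for real oligo data. The
external commands only run when the RUN_BLAST environment variable is set;
otherwise the script just prints them.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write the designed oligos (name => sequence) as FASTA; this becomes the db.
sub write_fasta {
    my ($path, $oligos) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!";
    for my $name (sort keys %$oligos) {
        print {$fh} ">$name\n$oligos->{$name}\n";
    }
    close $fh or die "Can't close $path: $!";
    return $path;
}

# Argument list for formatdb: -i names the input, -p F marks it as nucleotide.
sub formatdb_cmd {
    my ($fasta) = @_;
    return ('formatdb', '-i', $fasta, '-p', 'F');
}

# Argument list for an all-against-all blastn run with tabular output (-m 8).
# The -W, -e and -F values are illustrative assumptions, not recommendations.
sub blastall_cmd {
    my ($fasta, $report) = @_;
    return ('blastall', '-p', 'blastn', '-d', $fasta, '-i', $fasta,
            '-W', '7', '-e', '1000', '-F', 'F', '-m', '8', '-o', $report);
}

# Run an external command and die with a readable message if it fails.
sub run {
    my @cmd = @_;
    system(@cmd) == 0 or die "'@cmd' failed with status $?";
    return;
}

my %oligos = (                       # placeholder sequences
    oligo_1 => 'GCGCAGCAGCGAGAATTTCGACGAG',
    oligo_2 => 'GAATTTCGACGAGCTGCTGAAGGCA',
);
my $dir    = tempdir(CLEANUP => 1);
my $fasta  = write_fasta(File::Spec->catfile($dir, 'oligos.fa'), \%oligos);
my $report = File::Spec->catfile($dir, 'oligos.blast');

for my $step ([formatdb_cmd($fasta)], [blastall_cmd($fasta, $report)]) {
    print "@$step\n";
    run(@$step) if $ENV{RUN_BLAST};  # only executes when explicitly requested
}
```

Passing the commands to system as lists avoids shell quoting problems with
file names. The tabular report can then be read line by line in the same
script to pick out the oligo pairs that need redesigning.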


From MEC at stowers-institute.org  Wed Jun 14 13:47:59 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 14 Jun 2006 08:47:59 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
Message-ID: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>

 
Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use `select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT; 

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;
my $idmatch;
	while (<PROBES>) {
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}
exit;
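
For reference, the same filter can be written as a self-contained sketch that
runs on in-memory data. Two points are worth noting. First, perlsyn documents
the behaviour of "my $x = ... if ...;" as undefined, which is why the match
flag is declared once, before the loop. Second, one possible explanation for
only the last ID matching, which this thread does not confirm, is that the ID
file has Windows (CRLF) line endings: chomp removes only the newline, so every
ID except an unterminated last one would keep a trailing carriage return and
never match. The sketch strips all trailing whitespace as a guard against
that. The function names and the sample data layout are assumptions made for
illustration.

```perl
use strict;
use warnings;

# Build a lookup hash from the ID lines, tolerating CRLF endings and
# trailing blanks (an assumption about the ID file, not confirmed here).
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strips "\n", "\r\n", trailing blanks
        next unless length $line;
        $ids{$line} = 1;
    }
    return \%ids;
}

# Copy every FASTA record whose probe set ID is present in %$ids.
sub filter_probes {
    my ($in, $out, $ids) = @_;
    my $keep = 0;                        # declared once, outside the loop
    while (my $line = <$in>) {
        # The flag only changes on header lines, so the sequence lines
        # that follow a matching header are printed too.
        $keep = exists $ids->{$1} if $line =~ /^>probe:\w+:(\w+):/;
        print {$out} $line if $keep;
    }
    return;
}

# Demonstration on in-memory data; real use would open the files instead.
my $id_text    = "542_at\r\n31799_at";   # CRLF after the first ID only
my $probe_text = join '',
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;\n",
    "GTTCATCACAAATCTATTGTGCTTG\n";

open my $id_fh,    '<', \$id_text    or die "Can't read IDs: $!";
open my $probe_fh, '<', \$probe_text or die "Can't read probes: $!";
open my $out_fh,   '>', \my $result  or die "Can't write output: $!";

filter_probes($probe_fh, $out_fh, load_ids($id_fh));
close $out_fh or die "Can't close output: $!";
print $result;
```

With the trailing-whitespace strip removed and plain chomp used instead, only
the 31799_at record would come through in this demonstration, which matches
the symptom Mike describes.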


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu] 
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a 
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; 
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs 
>(only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret 
>Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
># Note: This creates a hash with keys=PSIDs and all values=1.
>my %ID = map {($_, 1)} @ID;
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opeining it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the 
>code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed
>>then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must
>>match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this won't it only print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting
>>>> closer,
>>>> but no cigar quite yet.  The script below runs quickly with no
>>>> errors--but
>>>> the output file is empty.  It seems that the problem must lie
>>>> somewhere in
>>>> the 'while' loop, and I'm sure it's quite obvious to a more
>>>> experienced
>>>> eye--but not to mine!  Any suggestions?  Thanks again for 
>your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>>> keys=PSIDs and
>>>> all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't bioperl for this, or create an index.  Perl would do
>>>> fine and
>>>> probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat
>>>looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a
>>>>> conceptually simple
>>>>> task.  I have a single large fasta file containing about 200,000
>>>>> probe
>>>>> sequences (from an Affymetrix microarray), each of which looks
>>>>> like this:
>>>>>
>>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>>>>> Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of
>>>>> ~130,800
>>>>> probes (both the header and the sequence) and output this
>>>>> subset into a new
>>>>> fasta file.  These 130,800 probes correspond to 8,175 
>probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I
>>>>> have these
>>>>> 8,175 IDs listed in a separate file.  I *think* that I managed
>>>>> to create an
>>>>> index of all 200,000 probes in the original fasta file using
>>>>> the following
>>>>> script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even
>>>>> if it is, I'm
>>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>


From oldham at ucla.edu  Wed Jun 14 02:03:04 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Tue, 13 Jun 2006 19:03:04 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F992@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDOEOLCJAA.oldham@ucla.edu>

Dear Malcolm, Chris, et al,

Thanks to everyone for your helpful suggestions.  When I run the code
below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
output file is still blank.  If I replace this list with a single ID
("542_at"), it works:

>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
GCGCAGCAGCGAGAATTTCGACGAG
>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
GAATTTCGACGAGCTGCTGAAGGCA
>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
CGACGAGCTGCTGAAGGCACTGGGT
........etc.

If I try a list of two IDs ("542_at" and "31799_at"), only the last one
is present in the output:

>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;
GTTCATCACAAATCTATTGTGCTTG
>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126; Antisense;
GTCCACTAAATGTAGTAACGAAATG
>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127; Antisense;
TCCACTAAATGTAGTAACGAAATGT
........etc.

The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
ID is present in the output file).  At this point I have no idea why
this is happening, and I am not sure how to interpret Malcolm's comment:

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

Any ideas?  Thanks again................!

Mike O.


#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

open(IDFILE, '<', $IDs) or die "Could not open file $IDs: $!\n";

my $probes = 'HG_U95Av2_probe_fasta.txt';

open(PROBES, '<', $probes) or die "Could not open file $probes: $!\n";

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
# Note: This creates a hash with keys=PSIDs and all values=1.
my %ID = map {($_, 1)} @ID;

	while (<PROBES>) {
		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print OUT;
			print OUT scalar(<PROBES>);
		}
	}
exit;
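
One possible explanation for the "only the last ID works" symptom, offered as a guess rather than a diagnosis: if ID_dat.txt was saved with Windows (CRLF) line endings or trailing spaces, chomp leaves a stray "\r" or blank on every ID except a final line that has no terminator, so only that last hash key can equal the \w+ capture. The sketch below assumes that is the cause. The sub name filter_probes and the in-memory sample data are hypothetical; it also declares the match flag outside the loop, since "my $x = ... if ..." has no defined behaviour in Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy FASTA records whose probe set ID is wanted.
# All three arguments are already-opened filehandles.
sub filter_probes {
    my ($id_fh, $probe_fh, $out_fh) = @_;

    my %wanted;
    while (my $id = <$id_fh>) {
        $id =~ s/^\s+|\s+$//g;          # removes "\n", "\r\n" and stray spaces
        $wanted{$id} = 1 if length $id;
    }

    my $keep = 0;                       # declared once so it persists per record
    while (my $line = <$probe_fh>) {
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $wanted{$1}; # re-decide at every header line
        }
        print {$out_fh} $line if $keep; # header plus its sequence line(s)
    }
    return scalar keys %wanted;
}

# Small in-memory demonstration with CRLF-terminated IDs.
my $ids    = "542_at\r\n31799_at\r\n";
my $probes = join '',
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;\n",
    "GTTCATCACAAATCTATTGTGCTTG\n";

open my $id_fh,    '<', \$ids    or die "Can't read IDs: $!";
open my $probe_fh, '<', \$probes or die "Can't read probes: $!";
my $output = '';
open my $out_fh,   '>', \$output or die "Can't write output: $!";

filter_probes($id_fh, $probe_fh, $out_fh);
close $out_fh;
print $output;
```

With real files, the three handles would simply be opened on ID_dat.txt, HG_U95Av2_probe_fasta.txt and probe_subset.txt instead of on strings.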


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Monday, June 12, 2006 8:48 AM
To: Cook, Malcolm; Chris Fields; Michael Oldham
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


oops,

s/matches on of/matches one of/
s/nothing that/noting that/

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing.  This is probably your problem.
>
>If you call `select OUT` after opening it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines.  Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>Sent: Saturday, June 10, 2006 11:32 PM
>>To: Michael Oldham
>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single large file
>>
>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>the regex matches anything)?  If there is nothing printed then either
>>the regex isn't working as expected or there is something logically
>>wrong.  The problem may be that the captured string must match the id
>>exactly, the id being the key to the %ID hash; if the regex picks up
>>any extra characters outside of your id key, you will not get
>>anything.  Looking at Malcolm's regex it should work just fine, but
>>we only had one example sequence to try here.
>>
>>If your while loop is set up like this, won't it print only the
>>matched description lines to the outfile (no sequence) even if there
>>is a match?  Or is this what you wanted?   If you want the sequence
>>you should add 'print OUT scalar(<PROBES>);' after the 'print OUT;' line.
>>
>>Chris
>>
>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>
>>> Thanks to everyone for their helpful advice.  I think I am getting
>>> closer, but no cigar quite yet.  The script below runs quickly with no
>>> errors--but the output file is empty.  It seems that the problem must
>>> lie somewhere in the 'while' loop, and I'm sure it's quite obvious to a
>>> more experienced eye--but not to mine!  Any suggestions?  Thanks again
>>> for your help.
>>>
>>> --Mike O.
>>>
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use strict;
>>>
>>> my $IDs = 'ID.dat.txt';
>>>
>>> unless (open(IDFILE, $IDs)) {
>>> 	print "Could not open file $IDs!\n";
>>> 	}
>>>
>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>
>>> unless (open(PROBES, $probes)) {
>>> 	print "Could not open file $probes!\n";
>>> 	}
>>>
>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>
>>> my @ID = <IDFILE>;
>>> chomp @ID;
>>> # Note: This creates a hash with keys=PSIDs and all values=1.
>>> my %ID = map {($_, 1)} @ID;
>>>
>>> 	while (<PROBES>) {
>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>> 		if ($idmatch){
>>> 			print OUT;
>>> 		}
>>> 	}
>>> exit;
>>>
>>>
>>> -----Original Message-----
>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>> Sent: Friday, June 09, 2006 7:58 AM
>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>> single large
>>> file
>>>
>>>
>>>
>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>> fine and probably be faster.
>>>
>>> Assuming your ids are one per line in a file named id.dat looking like
>>> this
>>>
>>> 1138_at
>>> 1134_at
>>> etc..
>>>
>>> this should work:
>>>
>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>> mybigfile.fa
>>>
>>> good luck
>>>
>>> --Malcolm Cook
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>> Michael Oldham
>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>> Dear all,
>>>>
>>>> I am a total Bioperl newbie struggling to accomplish a conceptually
>>>> simple task.  I have a single large fasta file containing about
>>>> 200,000 probe sequences (from an Affymetrix microarray), each of which
>>>> looks like this:
>>>>
>>>> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>
>>>> What I would like to do is extract from this file a subset of ~130,800
>>>> probes (both the header and the sequence) and output this subset into a
>>>> new fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>> ("1138_at" is the probe set ID in the header listed above); I have
>>>> these 8,175 IDs listed in a separate file.  I *think* that I managed to
>>>> create an index of all 200,000 probes in the original fasta file using
>>>> the following script:
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> # script 1: create the index
>>>>
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> my $Index_File_Name = shift;
>>>> my $inx = Bio::Index::Fasta->new(
>>>>     -filename => $Index_File_Name,
>>>>     -write_flag => 1);
>>>> $inx->make_index(@ARGV);
>>>>
>>>> I'm not sure if this is the most sensible approach, and even if it is,
>>>> I'm not sure what to do next.  Any help would be greatly appreciated!
>>>>
>>>> Many thanks,
>>>> Mike O.
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
>


From s_maheshwari84 at rediffmail.com  Thu Jun 15 11:42:24 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 Jun 2006 11:42:24 -0000
Subject: [Bioperl-l] simple problem plz look
Message-ID: <20060615114224.21669.qmail@webmail31.rediffmail.com>

I am using this statement:

my $data[0][0] = 'P_p';

What is wrong with this? I am getting a syntax error.


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From rkulasekaran at accelrys.com  Thu Jun 15 12:06:30 2006
From: rkulasekaran at accelrys.com (rkulasekaran at accelrys.com)
Date: Thu, 15 Jun 2006 17:36:30 +0530
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <OF88050CF5.C0508A24-ON6525718E.00425D40-6525718E.00428384@accelrys.com>

Hi,

Can you declare the array (my @data;) before assigning to its elements?

I guess that will work fine.

- Raja


"saurabh maheshwari" <s_maheshwari84 at rediffmail.com> 
Sent by: bioperl-l-bounces at lists.open-bio.org
15/06/2006 17:12
Please respond to
saurabh maheshwari <s_maheshwari84 at rediffmail.com>


To
bioperl-l at lists.open-bio.org
cc

Subject
[Bioperl-l] simple problem plz look


I am using this statement:

my $data[0][0] = 'P_p';

What is wrong with this? I am getting a syntax error.


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 12:52:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 13:52:53 +0100
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
References: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <44915825.8040902@mrc-dunn.cam.ac.uk>

saurabh maheshwari wrote:
> I am using this statement:
> 
> my $data[0][0] = 'P_p';
> 
> What is wrong with this? I am getting a syntax error.

I don't think general Perl problems are appropriate for this list.
Try subscribing to the beginners mailing list via http://learn.perl.org/

But in any case, say:
my @data;
$data[0][0] = 'P_p';
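
For context on why the original line fails: "my" declares a variable, and $data[0][0] is an element of an array rather than a variable name, so Perl rejects the statement at compile time. Once @data itself is declared, the inner array is created automatically on first assignment (autovivification). A minimal sketch:

```perl
use strict;
use warnings;

my @data;                     # declare the array itself
$data[0][0] = 'P_p';          # row 0 springs into existence as an array ref

print ref($data[0]), "\n";    # shows what kind of reference row 0 holds
print $data[0][0], "\n";      # the value just stored
```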


From cjfields at uiuc.edu  Thu Jun 15 15:18:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 10:18:32 -0500
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <002001c6908e$f8b11b30$15327e82@pyrimidine>

And exactly how is this applicable to BioPerl?

Start here:

http://learn.perl.org/

My guess: you need to declare 'my @data;' first.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of saurabh maheshwari
> Sent: Thursday, June 15, 2006 6:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] simple problem plz look
> 
> I am using this statement:
> 
> my $data[0][0] = 'P_p';
> 
> What is wrong with this? I am getting a syntax error.
> 
> 
> with Regards
> SAURABH MAHESHWARI
> M.Sc. (BIOINFORMATICS)
> JAMIA MILLIA ISLAMIA
> NEW DELHI


From sjmiller at email.arizona.edu  Thu Jun 15 17:42:52 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 10:42:52 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
Message-ID: <44919C1C.1060901@email.arizona.edu>

We are unable to parse BLAST 2.2.14 results from the NCBI website using 
SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
bioperl-live, but when users download either plain text or HTML blast 
outputs from the NCBI page, SearchIO cannot parse them.  This used to 
work prior to BLAST 2.2.14.  Should I try installing the entire 
bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
that makes any difference.)

Thanks,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597


From sb at mrc-dunn.cam.ac.uk  Thu Jun 15 19:00:38 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 20:00:38 +0100
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
References: <mailman.2366.1149793019.2084.bioperl-l@lists.open-bio.org>
	<44919C1C.1060901@email.arizona.edu>
Message-ID: <4491AE56.6090505@mrc-dunn.cam.ac.uk>

Susan J. Miller wrote:
> We are unable to parse BLAST 2.2.14 results from the NCBI website using 
> SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in 
> bioperl-live, but when users download either plain text or HTML blast 
> outputs from the NCBI page, SearchIO cannot parse them.  This used to 
> work prior to BLAST 2.2.14.  Should I try installing the entire 
> bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if 
> that makes any difference.)

Parsing saved results from the website works fine here. Please be more 
specific in what you mean by 'unable to parse'. What error messages do 
you get? What exact code did you use to get those errors? Exactly what 
input data did you use? Exactly how did you generate that data?

Cheers,
Sendu.


From cjfields at uiuc.edu  Thu Jun 15 21:06:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 16:06:13 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <44919C1C.1060901@email.arizona.edu>
Message-ID: <002701c690bf$8b732410$15327e82@pyrimidine>

Bio::SearchIO can't handle HTML output directly; you have to junk the tags
first, and we can't really guarantee anymore that will work either (I
haven't tried it).  The FAQ tells you how:

http://www.bioperl.org/wiki/FAQ

I would avoid HTML parsing altogether.  The only sure-fire method that will
always work, according to NCBI, is XML output, and that's parsable using
Bio::SearchIO::blastxml.  You can also try tabular format, which
Bio::SearchIO::blasttable can parse as well.

However, like Sendu, I can parse BLASTP 2.2.14 output (saved from NCBI
directly) using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to
work with BLASTP (and that's still set up to parse text output using
SearchIO, I believe).  Could you give us an example of the type of BLAST you
were running, the sequence you used, and the error you had?  Program-specific
output may be causing the problems.  The last time text parsing broke, the
cause was a change specific to BLASTN/TBLASTX output, or something along
those lines.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Susan J. Miller
> Sent: Thursday, June 15, 2006 12:43 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
> 
> We are unable to parse BLAST 2.2.14 results from the NCBI website using
> SearchIO.  I have updated Bio::SearchIO::blast.pm to what's in
> bioperl-live, but when users download either plain text or HTML blast
> outputs from the NCBI page, SearchIO cannot parse them.  This used to
> work prior to BLAST 2.2.14.  Should I try installing the entire
> bioperl-live distribution?  (We are running Solaris 8 and perl 5.8 if
> that makes any difference.)
> 
> Thanks,
> -susan
> 
> Susan J. Miller
> Biotechnology Computing Facility
> Arizona Research Laboratories
> Bio West 228
> University of Arizona
> Tucson, AZ  85721
> (520) 626-2597


From sjmiller at email.arizona.edu  Thu Jun 15 21:43:59 2006
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu, 15 Jun 2006 14:43:59 -0700
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <002701c690bf$8b732410$15327e82@pyrimidine>
References: <002701c690bf$8b732410$15327e82@pyrimidine>
Message-ID: <4491D49F.4030208@email.arizona.edu>

Chris Fields wrote:
> 
> However, like Sendu, I can parse BLASTP 2.2.14 output (saved from NCBI
> directly) using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to
> work with BLASTP (and that's still set up to parse text output using
> SearchIO, I believe).  Could you give us an example of the type of BLAST you
> were running, the sequence you used, and the error you had?  Program-specific
> output may be causing the problems.  The last time text parsing broke, the
> cause was a change specific to BLASTN/TBLASTX output, or something along
> those lines.

Hi Chris and Sendu,

Thanks for your replies.  I am using blastp from the NCBI BLAST page, 
with this input sequence:

MSRLLWRKVAGATVGPGPVPAPGRWVSSSVPASDPSDGQRRRQQQQQQQQQQQQQPQQPQVLSSEGGQLR
HNPLDIQMLSRGLHEQIFGQGGEMPGEAAVRRSVEHLQKHGLWGQPAVPLPDVELRLPPLYGDNLDQHFR
LLAQKQSLPYLEAANLLLQAQLPPKPPAWAWAEGWTRYGPEGEAVPVAIPEERALVFDVEVCLAEGTCPT
LAVAISPSAWYSWCSQRLVEERYSWTSQLSPADLIPLEVPTGASSPTQRDWQEQLVVGHNVSFDRAHIRE
QYLIQGSRMRFLDTMSMHMAISGLSSFQRSLWIAAKQGKHKVQPPTKQGQKSQRKARRGPAISSWDWLDI

I have tried saving HTML (with and without the graphical overview), 
plain text, and XML.  I am parsing with this script:

#!/usr/local/bin/perl -w

use Bio::SearchIO;

while ($fil = shift(@ARGV)) {

   $srchio = new Bio::SearchIO ('-format' => 'blast', '-file' => $fil);
   while ($result = $srchio->next_result) {

         $db = $result->database_name;
         $alg = $result->algorithm;
         print "DB $db\n ALG $alg\n";

         $qid = $result->query_name;
         print "QRY $qid\n";

         while ($hit = $result->next_hit) {

           $hitnam = $hit->name;
           print "\t$hitnam\n";

           $nhsp = 0;
           while ($hit->next_hsp) {
                 $nhsp++;
           }
           print "\tHSPS: $nhsp\n";
         } # end next_hit
   }
}

Interestingly, the results are different (but never correct) for the 
different types of output I've tried.  For xml, the script runs but 
produces no output, for plain text the script hangs with no output, and 
for html, I get these errors:


-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|27502689|gb|AAH42571.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 349, <GEN1> line 308.
Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 304, <GEN1> line 308.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|21779923|gb|AAM77583.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 349, <GEN1> line 333.
Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 304, <GEN1> line 333.

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|1644239|dbj|BAA12223.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 349, <GEN1> line 358.
Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 304, <GEN1> line 358.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Positives = 270/273 (98%), Gaps = 0/273 (0%)
Query 78
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.6.0/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/blast.pm:1172
STACK: ./srchio.pl:8


At this point I should probably try installing all of bioperl-live, or 
at least get IteratedSearchResultEventBuilder.pm - or would you 
recommend something else?  Let me know if you need more info.

Thanks again,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ  85721
(520) 626-2597


From cjfields at uiuc.edu  Thu Jun 15 23:03:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 18:03:37 -0500
Subject: [Bioperl-l] Parsing BLAST 2.2.14 output
In-Reply-To: <4491D49F.4030208@email.arizona.edu>
Message-ID: <002b01c690cf$efa05510$15327e82@pyrimidine>

...

> Hi Chris and Sendu,
> 
> Thanks for your replies.  I am using blastp from the NCBI BLAST page,
> with this input sequence:

...

> I have tried saving HTML (with and without the graphical overview),
> plain text, and XML.  I am parsing with this script:


> #!/usr/local/bin/perl -w
> 
> use Bio::SearchIO;
> ...
> }

I got this script to work.  I used your sequence and retrieved BLASTP text
output from NCBI BLASTP 2.2.14, then saved it from the web browser, and just
copied it to three separate files.  Using those files as input, they all
parse fine, with output like this:

DB All non-redundant GenBank CDStranslations+PDB+SwissProt+PIR+PRF excluding
environmental samples
 ALG BLASTP
QRY
        gi|27502689|gb|AAH42571.1|
        HSPS: 1
        gi|21779923|gb|AAM77583.1|
        HSPS: 1
...

> Interestingly, the results are different (but never correct) for the
> different types of output I've tried.  For xml, the script runs but
> produces no output, for plain text the script hangs with no output, and
> for html, I get these errors:

What's interesting is that HTML did anything at all.  You MUST strip out the
HTML tags as per the FAQ, which I pointed out before:

http://www.bioperl.org/wiki/FAQ

See the question : Does Bio::SearchIO parse the HTML output that BLAST
creates using the -T option?

Again, I would NOT attempt parsing HTML.  The only reason we have a FAQ
question about it is b/c it popped up on the list many many times in the
past (i.e. it is a FAQ) and someone found out that HTML::Strip works.  We
will never adequately support it beyond suggesting stripping the tags out.
NCBI changes their HTML output more often than their text output.

If you tried parsing XML with the format set to 'blast' you'll get nothing
(the blast text parser looks for text output using regexes, so it just
bypasses all the XML tags).  You must set:

-format => 'blastxml' 

You'll also need to install XML::SAX, and I would suggest installing
XML::SAX::ExpatXS and the Expat XML parser for your system to speed things
up.
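
To make the format point concrete, here is a minimal sketch of choosing the -format argument to match the kind of report that was saved. The helper guess_searchio_format and its file-extension convention are assumptions made up for illustration and are not part of BioPerl; the Bio::SearchIO calls are the same ones used in the script earlier in this thread. Running it on real reports needs BioPerl installed (and XML::SAX for 'blastxml').

```perl
use strict;
use warnings;

# Hypothetical helper: pick a Bio::SearchIO format name from a file extension.
sub guess_searchio_format {
    my ($file) = @_;
    return 'blastxml'   if $file =~ /\.xml$/i;
    return 'blasttable' if $file =~ /\.(?:tab|tsv)$/i;
    return 'blast';                      # plain-text report
}

# Print each query, its hits, and the number of HSPs per hit.
sub summarize_report {
    my ($file) = @_;
    require Bio::SearchIO;               # loaded only when a file is parsed
    my $in = Bio::SearchIO->new(
        -format => guess_searchio_format($file),
        -file   => $file,
    );
    while (my $result = $in->next_result) {
        my $query = $result->query_name;
        print "QRY ", (defined $query ? $query : ''), "\n";
        while (my $hit = $result->next_hit) {
            my $nhsp = 0;
            $nhsp++ while $hit->next_hsp;
            print "\t", $hit->name, "\tHSPS: $nhsp\n";
        }
    }
}

# Report files are taken from the command line; with none given, nothing runs.
summarize_report($_) for @ARGV;
```

Saved as, say, summarize.pl (a hypothetical name), it would be run as "perl summarize.pl report.xml report.txt".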

The 'hanging' you mention using text parsing sounds like the old bug where
it got caught in an infinite loop.  I don't have this problem.  It could be
a couple of things:

1) You have an old version of bioperl and updated Bio::SearchIO, but you
haven't updated Bio::SearchIO::blast. That's the plugin module where the
error was (not Bio::SearchIO).  Try updating either that or install the
entire distribution from scratch.

2) You have two versions of Bioperl installed (an old one and bioperl-live)
and perl is using the old version of bioperl (and the old version of
SearchIO::blast).  Make sure you only have one version installed and that it
is bioperl-live.
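
One way to check which copy of a module perl actually picks up is to load it and look it up in %INC. This is only a sketch: module_path is a hypothetical helper name, and File::Spec is included just so the loop prints something on a machine without BioPerl. It may or may not be significant that the paths in the posted stack trace mention site_perl/5.6.0 on a system reported to run perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file a module was loaded from, or undef
# if it cannot be loaded at all.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SearchIO -> Bio/SearchIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Bio::SearchIO Bio::SearchIO::blast File::Spec)) {
    my $path = module_path($module);
    printf "%-22s %s\n", $module, defined $path ? $path : '(not installed)';
}
```

If the path printed for Bio::SearchIO::blast is not under the bioperl-live tree, an older copy earlier in @INC is being used.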

> At this point I should probably try installing all of bioperl-live, or
> at least get IteratedSearchResultEventBuilder.pm - or would you
> recommend something else?  Let me know if you need more info.

If you have the entire distribution installed, you should have ISREB anyway.
ISREB (IteratedSearchResultEventBuilder) has nothing to do with the problems
here, though.

Chris

> Thanks again,
> -susan


From cain at cshl.edu  Thu Jun 15 15:25:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 15 Jun 2006 11:25:54 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
Message-ID: <1150385154.2622.152.camel@localhost.localdomain>

Hi all,

When running make test on a fairly new system, I got the following
failure:

t/Compatible.................No Set::Scalar. Unable to test Bio::Tree::Compatible
Can't locate Set/Scalar.pm in @INC
....
BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Tree/Compatible.pm line 138.
Compilation failed in require at t/Compatible.t line 42.
BEGIN failed--compilation aborted at t/Compatible.t line 42.
t/Compatible.................dubious                                         
        Test returned status 2 (wstat 512, 0x200)
        after all the subtests completed successfully

Set::Scalar is mentioned in Makefile.PL as an optional package (but not
required) and isn't mentioned in the INSTALL doc anywhere.  It looks
like the author of the test (t/Compatible.t) is trying to skip this test
if Set::Scalar isn't found, but the 'dubious' result gets marked
ultimately as a failure.

What is the right thing to do here?
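
For reference, one common pattern for an optional dependency is to probe for it before loading anything that needs it, and to skip rather than die when it is absent. This is a generic sketch with Test::More, not the actual contents of t/Compatible.t; the two Set::Scalar checks are placeholders for the real tests.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Probe for the optional module first; nothing that depends on it is
# loaded at compile time.
my $have_set_scalar = eval { require Set::Scalar; 1 };

SKIP: {
    skip 'Set::Scalar not installed', 2 unless $have_set_scalar;

    # Only reached when Set::Scalar loaded.  A real test would load
    # Bio::Tree::Compatible here, inside the guarded block.
    my $set = Set::Scalar->new(qw(a b c));
    is($set->size, 3, 'set has three members');
    ok($set->has('b'), 'set contains b');
}
```

Because the skipped tests are still counted as passing results, the harness sees a clean exit status instead of 'dubious'.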

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Fri Jun 16 04:42:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 16 Jun 2006 00:42:25 -0400
Subject: [Bioperl-l] Set::Scalar missing causing a test failure
	for	Bio::Tree::Compatible
In-Reply-To: <1150385154.2622.152.camel@localhost.localdomain>
References: <1150385154.2622.152.camel@localhost.localdomain>
Message-ID: <D4E96C47-977E-474C-B093-82CDE775F6C1@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Should be fixed on the main trunk. -hilmar

On Jun 15, 2006, at 11:25 AM, Scott Cain wrote:

> Hi all,
>
> When running make test on a fairly new system, I got the following
> failure:
>
> t/Compatible.................No Set::Scalar. Unable to test Bio::Tree::Compatible
> Can't locate Set/Scalar.pm in @INC
> ....
> BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Tree/Compatible.pm line 138.
> Compilation failed in require at t/Compatible.t line 42.
> BEGIN failed--compilation aborted at t/Compatible.t line 42.
> t/Compatible.................dubious
>         Test returned status 2 (wstat 512, 0x200)
>         after all the subtests completed successfully
>
> Set::Scalar is mentioned in Makefile.PL as an optional package (but not
> required) and isn't mentioned in the INSTALL doc anywhere.  It looks
> like the author of the test (t/Compatible.t) is trying to skip this test
> if Set::Scalar isn't found, but the 'dubious' result gets marked
> ultimately as a failure.
>
> What is the right thing to do here?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory

- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFEkja5uV6N2JxL7qsRAjqCAJ9RTgPntJ+dmGHeiovS5FeG3QvZagCeMzmw
sKkizbLUYAsyJqVw/2SplcQ=
=ehd6
-----END PGP SIGNATURE-----


From rmb32 at cornell.edu  Fri Jun 16 01:37:03 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 15 Jun 2006 18:37:03 -0700
Subject: [Bioperl-l] reading and writing GFF3
Message-ID: <44920B3F.90405@cornell.edu>

There is stuff in bioperl for reading and writing GFF3.  There's 
Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which 
is the 'best' one to use?

Neither of these is working very well for me.

My proximate use case is reading in a RepeatMasker report with 
Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then 
writing those out to a GFF3 file.

Bio::Tools::GFF will take these things and write out something that 
closely resembles GFF3, but with Target attributes that don't seem to 
comply with Lincoln's GFF3 spec, since its coordinates are join()ed with 
commas instead of spaces.  I'm attaching a little script that 
illustrates this.
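
To show the layout I mean: as I read the GFF3 spec, the Target value is "target_id start end [strand]" separated by single spaces, with any space inside the target ID itself escaped as %20. A minimal sketch of building such a value by hand; gff3_target is a hypothetical helper, not a BioPerl function, and the set of characters it escapes is my own conservative reading of the spec.

```perl
use strict;
use warnings;

# Hypothetical helper: build a GFF3 Target attribute from its parts.
sub gff3_target {
    my ($id, $start, $end, $strand) = @_;

    # Percent-encode characters that would break column 9 or the
    # space-separated Target layout.
    $id =~ s/([\s%;=&,])/sprintf('%%%02X', ord $1)/ge;

    my @parts = ($id, $start, $end);
    push @parts, $strand if defined $strand;   # strand is optional
    return 'Target=' . join(' ', @parts);
}

print gff3_target('AluY', 1, 311, '+'), "\n";
print gff3_target('L1 repeat', 5, 90), "\n";
```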

Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the 
features contained in them, throwing 'only Bio::SeqFeature::Annotated 
objects are writeable'.  This seems a bit silly, since one of the whole 
points of Bioperl is using polymorphism to make it easy to connect 
things together.  I've attached a little script to illustrate this one too.

So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF 
deprecated?  Why does Bio::FeatureIO::gff only accept 
Bio::SeqFeature::Annotated objects?

Thanks in advance.

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_featureio_gff_test.pl
Type: application/x-perl
Size: 1455 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bio_tools_gff_test.pl
Type: application/x-perl
Size: 1436 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0001.pl>

From cain at cshl.edu  Fri Jun 16 14:18:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 10:18:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <44920B3F.90405@cornell.edu>
References: <44920B3F.90405@cornell.edu>
Message-ID: <1150467493.2622.209.camel@localhost.localdomain>

Hi Rob,

I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
but that is actually a good thing.  The tighter constraints result in a
better, more consistent file format.

The reason only BSF::Annotated features are writable is that there needs
to be tight control on the 'type' of the feature, to ensure that the
type is part of the Sequence Ontology.  It also makes it much easier to
properly write out the attributes in the ninth column, particularly the
ones that are 'reserved', like Parent, Dbxref, and Ontology_term.

BTG is still usable, but the GFF3 it puts out is actually more
'GFF3-like'; that is, it looks like GFF3, but because there are no
constraints on the type and the terms that are used in the ninth column,
you have to be very careful using it to produce GFF3, by making sure
that your feature objects conform to the standard before BTG tries to
write them out.  (Of course, one way to do that would be to convert your
feature objects to BSF::Annotated objects, but then you could use
BFIO::gff :-)
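
To make the "conform before writing" idea concrete, here is a minimal
sketch in plain Python (not BioPerl code).  Everything in it is an
assumption made for illustration: the function name check_gff3_line is
invented, and KNOWN_SO_TERMS is a tiny hand-picked subset of Sequence
Ontology terms, where a real check would consult the full ontology.

```python
# Hypothetical subset of Sequence Ontology terms, for illustration only.
KNOWN_SO_TERMS = {"gene", "mRNA", "exon", "CDS", "nucleotide_motif"}


def check_gff3_line(line, allowed_types=KNOWN_SO_TERMS):
    """Return a list of problems found in one tab-separated GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return ["expected 9 tab-separated columns, got %d" % len(cols)]

    problems = []
    # Column 3 is the feature type; GFF3 expects a Sequence Ontology term.
    if cols[2] not in allowed_types:
        problems.append("type %r is not a known SO term" % cols[2])

    # Column 9 holds tag=value pairs separated by ';' ('.' means no attributes).
    if cols[8] != ".":
        for pair in cols[8].split(";"):
            if pair and pair.count("=") != 1:
                problems.append(
                    "attribute %r needs exactly one unescaped '='" % pair
                )
    return problems


good = "\t".join([
    "C08HBa0001K22.1", "RepeatMasker", "nucleotide_motif",
    "3974", "4036", "312", "-", ".", "Target=hind_R%3D2046 59 120",
])
# Same line with an unescaped '=' and a type outside the allowed set.
bad = good.replace("%3D", "=").replace("nucleotide_motif", "repeat")

print(check_gff3_line(good))
print(check_gff3_line(bad))
```

The first line passes cleanly; the second is flagged both for its type
and for the stray '=' in the Target value.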

[Long pause while scott goes and monkeys with Bio::Tools::GFF]

OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3
for your sample data.  Conveniently, since 'nucleotide_motif' is a SO
term, this is completely valid.  (I even fixed the escaping of the stray
'=' in 'hind_R=2046'.)  The output I get is this:

##gff-version 3
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
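
For readers who want to see how such a Target value is put together,
here is a rough sketch in plain Python.  It is not the Bio::Tools::GFF
code; the helper names escape_gff3 and target_attribute are invented for
this example.  It joins the target ID and coordinates with spaces (not
commas) and percent-escapes characters that are reserved inside GFF3
attribute values.

```python
# Characters with special meaning in GFF3 column 9, plus '%' itself
# and the whitespace controls that must never appear raw.
RESERVED = set(";=&,%\t\n\r")


def escape_gff3(value):
    """Percent-escape reserved and control characters in an attribute value."""
    out = []
    for ch in value:
        if ch in RESERVED or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%%%02X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def target_attribute(target_id, start, end, strand=None):
    """Build 'Target=<id> <start> <end> [strand]' with space-separated fields."""
    # Spaces separate the fields, so a space inside the ID must be escaped too.
    tid = escape_gff3(target_id).replace(" ", "%20")
    fields = [tid, str(start), str(end)]
    if strand in ("+", "-"):
        fields.append(strand)
    return "Target=" + " ".join(fields)


print(target_attribute("Contig151", 325, 832))
print(target_attribute("hind_R=2046", 59, 120))
```

The second call reproduces the escaped value in the last line of the
output above, Target=hind_R%3D2046 59 120.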

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From rmb32 at cornell.edu  Fri Jun 16 18:36:22 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 11:36:22 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
References: <44920B3F.90405@cornell.edu>
	<1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <4492FA26.6030909@cornell.edu>

Thanks for the reply, Scott.  It's good that the BSF::Annotated features 
control the type to be in the SO.  I sort of came to the "BTG is only 
gff3-/like/" conclusion myself as I poked around in the two modules in 
question, so I'd much rather use BFIO::gff.  So I guess the question now 
is (and this will probably be a pretty common use case): how does one 
take an "old" Bio::SeqFeature::Generic object (or the like) and make it 
into a Bio::SeqFeature::Annotated?


Rob


-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 16 19:12:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Jun 2006 14:12:28 -0500
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain>
Message-ID: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>

Scott, 

Looks like Robert also submitted a bug report related to this.
Could you check into it (pretty-please)?  I'm still GFF3-illiterate.

http://bugzilla.open-bio.org/show_bug.cgi?id=2025

Chris



From rmb32 at cornell.edu  Fri Jun 16 19:30:23 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:30:23 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <449306CF.1030301@cornell.edu>

Woops, I should have said something about that.  I submitted it before I 
saw that Scott had already done the escaping in CVS.


-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Fri Jun 16 19:34:16 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 12:34:16 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150486453.4412.30.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
Message-ID: <449307B8.5040802@cornell.edu>

So about that converting ye olde feature objects into 
Bio::SeqFeature::Annotated objects.  How do I do it?



-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Fri Jun 16 19:28:52 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:28:52 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
Message-ID: <1150486133.4412.25.camel@localhost.localdomain>

I tweaked the patch and applied it, and closed the bug.

Thanks for pointing it out--I doubt I would have noticed it in the
bioperl-guts mailing, which I generally don't look too closely at :-o

Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 19:34:13 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:34:13 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449306CF.1030301@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
Message-ID: <1150486453.4412.30.camel@localhost.localdomain>

That's OK--You added a few items that should be escaped that weren't, so
I added those too.

Thanks,
Scott


On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> Woops, I should have said something about that.  I submitted it before
> I saw that Scott had already done the escaping in CVS.
> 
> Chris Fields wrote: 
> > Scott, 
> > 
> > Looks like Robert also submitted a bug report related to this as well.
> > Could you check into it (pretty-please)?  I'm still GFF3-illiterate.
> > 
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2025
> > 
> > Chris
> > 
> >   
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Scott Cain
> > > Sent: Friday, June 16, 2006 9:18 AM
> > > To: Robert Buels
> > > Cc: bioperl-l at bioperl.org
> > > Subject: Re: [Bioperl-l] reading and writing GFF3
> > > 
> > > Hi Rob,
> > > 
> > > I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2,
> > > but that is actually a good thing.  The tighter constraints result in a
> > > better, more consistent file format.
> > > 
> > > The reason only BSF::Annotated features are writable is that there needs
> > > to be tight control on the 'type' of the feature, to ensure that the
> > > type is part of the Sequence Ontology.  It also makes it much easier to
> > > properly write out the attributes in the ninth column, particularly the
> > > ones that are 'reserved', like Parent, Dbxref, and Ontology_term.
> > > 
> > > BTG is still usable, but the GFF3 it puts out is actually more
> > > 'GFF3-like'; that is, it looks like GFF3, but because there are no
> > > constraints on the type and the terms that are used in the ninth column,
> > > you have to be very careful using it to produce GFF3, by making sure
> > > that your feature objects conform to the standard before BTG tries to
> > > write them out.  (Of course, one way to do that would be to convert your
> > > feature objects to BSF::Annotated objects, but then you could use
> > > BFIO::gff :-)
> > > 
> > > [Long pause while scott goes and monkeys with Bio::Tools::GFF]
> > > 
> > > OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for
> > > your sample data.  Conveniently, since 'nucleotide_motif' is a SO term,
> > > this is completely valid.  (I even fixed the escaping of the stray
> > > '=' in 'hind_R=2046'.)  The output I get is this:
> > > 
> > > ##gff-version 3
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2095    2556    918     -       .       Target=Contig151 325 832
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2590    2736    488     -       .       Target=Contig386 1 124
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        2787    3105    1718    +       .       Target=Contig358 1 311
> > > C08HBa0001K22.1 RepeatMasker    nucleotide_motif        3974    4036    312     -       .       Target=hind_R%3D2046 59 120
> > > 
> > > Scott
> > > 
> > > 
> > > 
> > > On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote:
> > >     
> > > > There is stuff in bioperl for reading and writing GFF3.  There's
> > > > Bio::Tools::GFF.  There's Bio::FeatureIO::gff.  Are there more?  Which
> > > > is the 'best' one to use?
> > > > 
> > > > Neither of these is working very well for me.
> > > > 
> > > > My proximate use case is reading in a RepeatMasker report with
> > > > Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then
> > > > writing those out to a GFF3 file.
> > > > 
> > > > Bio::Tools::GFF will take these things and write out something that
> > > > closely resembles GFF3, but with Target attributes that don't seem to
> > > > comply with Lincoln's GFF3 spec, since its coordinates are join()ed with
> > > > commas instead of spaces.  I'm attaching a little script that
> > > > illustrates this.
> > > > 
> > > > Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the
> > > > features contained in them, throwing 'only Bio::SeqFeature::Annotated
> > > > objects are writeable'.  This seems a bit silly, since one of the whole
> > > > points of Bioperl is using polymorphism to make it easy to connect
> > > > things together.  I've attached a little script to illustrate this one
> > > >       
> > > too.
> > >     
> > > > So my questions are:  what _should_ I be doing here?  Is Bio::Tools::GFF
> > > > deprecated?  Why does Bio::FeatureIO::gff only accept
> > > > Bio::SeqFeature::Annotated objects?
> > > > 
> > > > Thanks in advance.
> > > > 
> > > > Rob
> > > > 
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >       
> > > --
> > > ------------------------------------------------------------------------
> > > Scott Cain, Ph. D.                                         cain at cshl.edu
> > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > Cold Spring Harbor Laboratory
> > >     
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >   
> 
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu  Fri Jun 16 19:55:31 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 15:55:31 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449307B8.5040802@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
Message-ID: <1150487731.4412.35.camel@localhost.localdomain>

Um, yeah, good question.  The reason I didn't answer you when you wrote
before is that I was hoping for divine inspiration for an answer (or for
somebody else to answer, which would have been really great :-)

The short answer (and easy one for me to type) is that you will probably
need an ad hoc method to do it, which is the same thing I do when I need
to convert gff2 to gff3, to make sure the things I need mapped get
mapped the 'right' way (that is, the way I want them to go).  I don't
have any sample code that does this, but if you want to start working up
an ad hoc method, I will certainly try to help you as much as I can.
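
One piece of such an ad hoc conversion is rewriting the GFF2 group column as GFF3 attributes. The following is only a rough sketch under stated assumptions: the sub name gff2_group_to_gff3 and the tag-renaming argument are hypothetical, it handles only simple 'tag "value"' pairs separated by semicolons, and the naive split would break on a semicolon inside a quoted value.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a GFF2 group column as GFF3 attributes.
# %rename optionally maps GFF2 tags to the GFF3 tags you want them to become.
sub gff2_group_to_gff3 {
    my ($group, %rename) = @_;
    my @pairs;
    for my $chunk (split /\s*;\s*/, $group) {
        # First word is the tag, the remainder is its value.
        my ($tag, $value) = $chunk =~ /^\s*(\S+)\s*(.*?)\s*$/ or next;
        $value =~ s/^"(.*)"$/$1/;    # drop surrounding double quotes
        next if $value eq '';        # GFF3 wants tag=value, skip bare tags
        # Percent-encode the characters reserved in GFF3 column 9.
        $value =~ s/([\x00-\x1f\x7f%;=&,])/sprintf('%%%02X', ord $1)/ge;
        $tag = $rename{$tag} if exists $rename{$tag};
        push @pairs, "$tag=$value";
    }
    return join ';', @pairs;
}

print gff2_group_to_gff3(
    'Sequence "C08HBa0001K22.1" ; Note "hind_R=2046"',
    Sequence => 'Name',
), "\n";
```

The renaming hash is where the "mapped the way I want them to go" decisions would live; everything else is mechanical.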

Scott


On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> So about that converting ye olde feature objects into 
> Bio::SeqFeature::Annotated objects.  How do I do it?
> 
> 
> Scott Cain wrote:
> > That's OK--You added a few items that should be escaped that weren't, so
> > I added those too.
> >
> > Thanks,
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >   
> >> Woops, I should have said something about that.  I submitted it before
> >> I saw that Scott had already done the escaping in CVS.
> >>
> >> Chris Fields wrote: 
> >>     
> >>> Scott, 
> >>>
> >>> Looks like Robert also submitted a bug report related to this as well

From rmb32 at cornell.edu  Fri Jun 16 20:31:08 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 16 Jun 2006 13:31:08 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150487731.4412.35.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
Message-ID: <4493150C.1080909@cornell.edu>

Rather than cobble together some ad-hoc solution, I would be interested 
in working on a good solution to this problem, because it seems like 
it's just going to get more common as more people start wanting to write 
GFF3.  What about some code in whatever customarily makes these objects 
(probably BSF::Annotated's new() method?) that could take another type 
of Feature object and attempt to shoehorn its data into a new 
BSF::Annotated?  If it failed (because the type isn't in SO or 
whatever), it could throw() some informative error message.

Then, people could write straightforward code something like:

while(my $oldstylefeature = $features_in->next_feature) {
    $oldstylefeature->primary_tag('something_that_is_in_so');
    $oldstylefeature->something_else('some other something that needs to be changed for compliance');
    my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
    $gff3_out->write_feature($newfeature);
}

Does that sound like a good idea?  I'd be more than willing to implement 
this, since I'm going to need to do this sort of thing with many more 
things than just RepeatMasker.

Rob

Scott Cain wrote:
> Um, yeah, good question.  The reason I didn't answer you when you wrote
> before is that I was hoping for divine inspiration for an answer (or for
> somebody else to answer, which would have been really great :-)
>
> The short answer (and easy one for me to type) is that you will probably
> need an ad hoc method to do it, which is the same thing I do when I need
> to convert gff2 to gff3, to make sure the things I need mapped get
> mapped the 'right' way (that is, the way I want them to go).  I don't
> have any sample code that does this, but if you want to start working up
> an ad hoc method, I will certainly try to help you as much as I can.
>
> Scott
>
>
> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>   
>> So about that converting ye olde feature objects into 
>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>
>>
>> Scott Cain wrote:
>>     
>>> That's OK--You added a few items that should be escaped that weren't, so
>>> I added those too.
>>>
>>> Thanks,
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>   
>>>       
>>>> Woops, I should have said something about that.  I submitted it before
>>>> I saw that Scott had already done the escaping in CVS.
>>>>
>>>> Chris Fields wrote: 
>>>>     
>>>>         
>>>>> Scott, 
>>>>>
>>>>> Looks like Robert also submitted a bug report related to this as well=
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From rmb32 at cornell.edu  Sat Jun 17 10:36:59 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 03:36:59 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	
	<449306CF.1030301@cornell.edu>	
	<1150486453.4412.30.camel@localhost.localdomain>	
	<449307B8.5040802@cornell.edu>	
	<1150487731.4412.35.camel@localhost.localdomain>	
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <4493DB4B.4020509@cornell.edu>

Yep.  I'm almost finished with the first draft of a function that does 
this.  I'll polish it up over the weekend then on Monday I'll submit a 
bugzilla bug and patch with it so you can take a look.

Rob

Scott Cain wrote:
> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like you suggested below.  You could allow passing in args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you know if
> brilliance strikes me.
>
> Scott
>
>
> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>   
>> Rather than cobble together some ad-hoc solution, I would be interested 
>> in working on a good solution to this problem, because it seems like 
>> it's just going to get more common as more people start wanting to write 
>> GFF3.  What about some code in whatever customarily makes these objects 
>> (probably BSF::Annotated's new() method?) that could take another type 
>> of Feature object and attempt to shoehorn its data into a new 
>> BSF::Annotated?  If it failed (because the type isn't in SO or 
>> whatever), it could throw() some informative error message.
>>
>> Then, people could write straightforward code something like:
>>
>> while(my $oldstylefeature = $features_in->next_feature) {
>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>     $oldstylefeature->something_else('some other something that needs to 
>> be changed for compliance');
>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>     $gff3_out->write_feature($newfeature);
>> }
>>
>> Does that sound like a good idea?  I'd be more than willing to implement 
>> this, since I'm going to need to do this sort of thing with many more 
>> things than just RepeatMasker.
>>
>> Rob
>>
>> Scott Cain wrote:
>>     
>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>> before is that I was hoping for divine inspiration for an answer (or for
>>> somebody else to answer, which would have been really great :-)
>>>
>>> The short answer (and easy one for me to type) is that you will probably
>>> need an ad hoc method to do it, which is the same thing I do when I need
>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>> have any sample code that does this, but if you want to start working up
>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>   
>>>       
>>>> So about that converting ye olde feature objects into 
>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>     
>>>>         
>>>>> That's OK--You added a few items that should be escaped that weren't, so
>>>>> I added those too.
>>>>>
>>>>> Thanks,
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> Woops, I should have said something about that.  I submitted it before
>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>
>>>>>> Chris Fields wrote: 
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> Scott, 
>>>>>>>
>>>>>>> Looks like Robert also submitted a bug report related to this as well=
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>               

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain at cshl.edu  Sat Jun 17 03:56:44 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 16 Jun 2006 23:56:44 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <4493150C.1080909@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
Message-ID: <1150516605.2600.9.camel@localhost.localdomain>

Rob,

I came to the same conclusion as well; I wrote my response as I was
heading out the door and while I was running errands, I realized the
right thing to do is to write a Bio::SeqFeature::Annotated method called
new_from_object, whose usage would be:

  my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);

where you would give it a Bio::SeqFeatureI compliant object and try to
create a BSFA like you suggested below.  You could allow passing in args
to control how different things are handled, like mapping non-SO types
to SO types.  I'll think about this over the weekend and let you know if
brilliance strikes me.
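
To make the idea concrete, here is a sketch of the type-mapping part only. It is not BioPerl code: plain hash refs stand in for Bio::SeqFeatureI objects so that it runs without BioPerl installed, and the names new_from_object_sketch, -type_map and repeat_hit are all hypothetical. The list of accepted terms is a tiny illustrative subset; a real implementation would consult the Sequence Ontology itself.

```perl
use strict;
use warnings;

# Illustrative subset of Sequence Ontology terms, not the real ontology.
my %IS_SO_TERM = map { $_ => 1 } qw(nucleotide_motif gene exon CDS);

# Hypothetical stand-in for new_from_object: copy a feature (a plain hash
# ref here) and map its type to an SO term, dying if no mapping is known.
sub new_from_object_sketch {
    my ($feature, %args) = @_;
    my $type_map = $args{-type_map} || {};
    my $type     = $feature->{primary_tag};
    $type = $type_map->{$type} if exists $type_map->{$type};
    die "type '$type' is not a known SO term\n" unless $IS_SO_TERM{$type};
    # Shallow copy with the corrected type; the input is left untouched.
    return { %$feature, primary_tag => $type };
}

my $old = {
    seq_id      => 'C08HBa0001K22.1',
    start       => 2095,
    end         => 2556,
    primary_tag => 'repeat_hit',
};
my $new = new_from_object_sketch($old,
    -type_map => { repeat_hit => 'nucleotide_motif' });
print "$new->{primary_tag}\n";
```

Failing loudly on an unmapped type matches the suggestion to throw an informative error rather than write out something that only looks like GFF3.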

Scott


On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> Rather than cobble together some ad-hoc solution, I would be interested 
> in working on a good solution to this problem, because it seems like 
> it's just going to get more common as more people start wanting to write 
> GFF3.  What about some code in whatever customarily makes these objects 
> (probably BSF::Annotated's new() method?) that could take another type 
> of Feature object and attempt to shoehorn its data into a new 
> BSF::Annotated?  If it failed (because the type isn't in SO or 
> whatever), it could throw() some informative error message.
> 
> Then, people could write straightforward code something like:
> 
> while(my $oldstylefeature = $features_in->next_feature) {
>     $oldstylefeature->primary_tag('something_that_is_in_so');
>     $oldstylefeature->something_else('some other something that needs to 
> be changed for compliance');
>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>     $gff3_out->write_feature($newfeature);
> }
> 
> Does that sound like a good idea?  I'd be more than willing to implement 
> this, since I'm going to need to do this sort of thing with many more 
> things than just RepeatMasker.
> 
> Rob
> 
> Scott Cain wrote:
> > Um, yeah, good question.  The reason I didn't answer you when you wrote
> > before is that I was hoping for divine inspiration for an answer (or for
> > somebody else to answer, which would have been really great :-)
> >
> > The short answer (and easy one for me to type) is that you will probably
> > need an ad hoc method to do it, which is the same thing I do when I need
> > to convert gff2 to gff3, to make sure the things I need mapped get
> > mapped the 'right' way (that is, the way I want them to go).  I don't
> > have any sample code that does this, but if you want to start working up
> > an ad hoc method, I will certainly try to help you as much as I can.
> >
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> >   
> >> So about that converting ye olde feature objects into 
> >> Bio::SeqFeature::Annotated objects.  How do I do it?
> >>
> >>
> >> Scott Cain wrote:
> >>     
> >>> That's OK--You added a few items that should be escaped that weren't, so
> >>> I added those too.
> >>>
> >>> Thanks,
> >>> Scott
> >>>
> >>>
> >>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >>>   
> >>>       
> >>>> Woops, I should have said something about that.  I submitted it before
> >>>> I saw that Scott had already done the escaping in CVS.
> >>>>
> >>>> Chris Fields wrote: 
> >>>>     
> >>>>         
> >>>>> Scott, 
> >>>>>
> >>>>> Looks like Robert also submitted a bug report related to this as well=
> >>>>> ------------------------------------------------------------------------
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Sat Jun 17 16:20:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 17 Jun 2006 12:20:08 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
Message-ID: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You don't need a new method for this. Instead, support a -feature  
argument.

	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);

This should work for any instance of Bio::SeqFeatureI. If it is a  
B::SF::Annotated already it is obviously just a deep copy (if copy is  
desired - could be another parameter). Otherwise more will be involved.

Alternatively, and possibly better, is to write a specialized  
SeqFeatureI factory (that would implement  
Bio::Factory::ObjectFactoryI) and then delegate this job to it:

	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
		-type_ontology => $sequence_ontology,
		-source_ontology => $feature_source_ontology,
		-unflatten => 1);
	my $bsfa = $feat_factory->create_object({-feature => $feature});

This is preferable because it separates business logic that isn't  
necessarily related into defined units. I.e., the logic necessary to  
convert an ordinary feature into a strongly typed one is different  
from how to represent a strongly typed feature. IMHO anyway ...
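
The separation argued for above can be sketched in plain Perl. Everything here is an assumption made for illustration: Bio::SeqFeature::TypedFeatureFactory is only a proposal in this thread, so the package below is a self-contained stand-in with a made-up name, made-up -allowed_types and -type_map arguments, and hash refs in place of real feature objects.

```perl
use strict;
use warnings;

package TypedFeatureFactorySketch;    # hypothetical, not a BioPerl class

sub new {
    my ($class, %args) = @_;
    my $self = {
        allowed  => { map { $_ => 1 } @{ $args{-allowed_types} || [] } },
        type_map => $args{-type_map} || {},
    };
    return bless $self, $class;
}

# All conversion rules live in the factory, so whatever represents the
# typed feature does not need to know how untyped features are converted.
sub create_object {
    my ($self, $params) = @_;
    my $feature = $params->{-feature} or die "need a -feature\n";
    my $type    = $feature->{primary_tag};
    $type = $self->{type_map}{$type} if exists $self->{type_map}{$type};
    die "cannot type feature: '$type' not allowed\n"
        unless $self->{allowed}{$type};
    return { %$feature, primary_tag => $type };
}

package main;

my $factory = TypedFeatureFactorySketch->new(
    -allowed_types => [qw(nucleotide_motif gene)],
    -type_map      => { repeat_hit => 'nucleotide_motif' },
);
my $typed = $factory->create_object(
    { -feature => { primary_tag => 'repeat_hit', start => 1, end => 10 } });
print "$typed->{primary_tag}\n";
```

Swapping in a different ontology or mapping policy then means constructing a different factory, with no change to the feature class.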

Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
started as the result of a discussion thread earlier this (or last?)  
year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
though not in concept.

Maybe we need to get together again and thrash out a strategy; or a  
BOF at the GMOD meeting? I feel this does need a core group of people  
who care, hash out a strategy that will also solve the backwards  
compatibility problem with the current Bio::SeqFeatureI state-of- 
limbo, and allow us to implement the decisions with a few people in a  
concentrated effort. This will then also remove the only real large  
stumbling block towards a 1.6 release.

Maybe we should think about a little pre-GMOD hackathon to clear up  
this mess? Scott, you'll be there a day early? I'll be already back  
and Jason I believe will still be in town, although he may have other  
commitments already. Nonetheless, it shouldn't really take that much,  
just dedicated time, a whiteboard, and a few people who care enough to  
thrash this out and then do it.

Thoughts?

	-hilmar

On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:

> Rob,
>
> I came to the same conclusion as well; I wrote my response as I was
> heading out the door and while I was running errands, I realized the
> right thing to do is to write a Bio::SeqFeature::Annotated method  
> called
> new_from_object, whose usage would be:
>
>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
> ($my_BSFI, %args);
>
> where you would give it a Bio::SeqFeatureI compliant object and try to
> create a BSFA like you suggested below.  You could allow passing in  
> args
> to control how different things are handled, like mapping non-SO types
> to SO types.  I'll think about this over the weekend and let you  
> know if
> brilliance strikes me.
>
> Scott
>
>
> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>> Rather than cobble together some ad-hoc solution, I would be  
>> interested
>> in working on a good solution to this problem, because it seems like
>> it's just going to get more common as more people start wanting to  
>> write
>> GFF3.  What about some code in whatever customarily makes these  
>> objects
>> (probably BSF::Annotated's new() method?) that could take another  
>> type
>> of Feature object and attempt to shoehorn its data into a new
>> BSF::Annotated?  If it failed (because the type isn't in SO or
>> whatever), it could throw() some informative error message.
>>
>> Then, people could write straightforward code something like:
>>
>> while(my $oldstylefeature = $features_in->next_feature) {
>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>     $oldstylefeature->something_else('some other something that  
>> needs to
>> be changed for compliance');
>>     my $newfeature = Bio::SeqFeature::Annotated->new 
>> ($oldstylefeature);
>>     $gff3_out->write_feature($newfeature);
>> }
>>
>> Does that sound like a good idea?  I'd be more than willing to  
>> implement
>> this, since I'm going to need to do this sort of thing with many more
>> things than just RepeatMasker.
>>
>> Rob
>>
>> Scott Cain wrote:
>>> Um, yeah, good question.  The reason I didn't answer you when you  
>>> wrote
>>> before is that I was hoping for divine inspiration for an answer  
>>> (or for
>>> somebody else to answer, which would have been really great :-)
>>>
>>> The short answer (and easy one for me to type) is that you will  
>>> probably
>>> need an ad hoc method to do it, which is the same thing I do when  
>>> I need
>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>> mapped the 'right' way (that is, the way I want them to go).  I  
>>> don't
>>> have any sample code that does this, but if you want to start  
>>> working up
>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>
>>>> So about that converting ye olde feature objects into
>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>>> That's OK--You added a few items that should be escaped that  
>>>>> weren't, so
>>>>> I added those too.
>>>>>
>>>>> Thanks,
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>
>>>>>
>>>>>> Woops, I should have said something about that.  I submitted  
>>>>>> it before
>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>> Scott,
>>>>>>>
>>>>>>> Looks like Robert also submitted a bug report related to this  
>>>>>>> as well=
>>>>>>> ---------------------------------------------------------------- 
>>>>>>> --------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
ImoAXD/jrbF0gXzSr2CY4tQ=
=XfDq
-----END PGP SIGNATURE-----


From rmb32 at cornell.edu  Sat Jun 17 18:36:28 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 17 Jun 2006 11:36:28 -0700
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <44944BAC.7000302@cornell.edu>

I'd love to help more with this, since with the new tomato genome coming 
in my job is going to be working more and more with annotations, but I'm 
not a core person and I can't go to the meeting in NC.  In the interests 
of getting my job done right now, I've implemented a -feature argument 
to Bio::SeqFeature::Annotated's constructor, which calls a method 
from_feature() I added.  If you guys want it, it's attached to bug 2026.

 From the perspective of a casual bioperl user, anything you guys can do 
to make the handling of features and annotations less fragmented and 
more robust would be wonderful.  I'd be happy to help with 
implementation if one of you grizzled veterans would give me marching 
orders. :-)

Rob

Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> You don't need a new method for this. Instead, support a -feature 
> argument.
>
>     my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>
> This should work for any instance of Bio::SeqFeatureI. If it is a 
> B::SF::Annotated already it is obviously just a deep copy (if copy is 
> desired - could be another parameter). Otherwise more will be involved.
>
> Alternatively, and possibly better, is to write a specialized 
> SeqFeatureI factory (that would implement 
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>
>     my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>         -type_ontology => $sequence_ontology,
>         -source_ontology => $feature_source_ontology,
>         -unflatten => 1);
>     my $bsfa = $feat_factory->create_object({-feature => $feature});
>
> This is preferable because it separates business logic that isn't 
> necessarily related into defined units. I.e., the logic necessary to 
> convert an ordinary feature into a strongly typed one is different 
> from how to represent a strongly typed feature. IMHO anyway ...
>
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan 
> started as the result of a discussion thread earlier this (or last?) 
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted, 
> though not in concept.
>
> Maybe we need to get together again and thrash out a strategy; or a 
> BOF at the GMOD meeting? I feel this does need a core group of people 
> who care, hash out a strategy that will also solve the backwards 
> compatibility problem with the current Bio::SeqFeatureI 
> state-of-limbo, and allow us to implement the decisions with a few 
> people in a concentrated effort. This will then also remove the only 
> real large stumbling block towards a 1.6 release.
>
> Maybe we should think about a little pre-GMOD hackathon to clear up 
> this mess? Scott, you'll be there a day early? I'll be already back 
> and Jason I believe will still be in town, although he may have other 
> commitments already. Nonetheless, it shouldn't really take that much 
> but rather dedicated time, a whiteboard, and a few people who care 
> thrashing this out and then do it.
>
> Thoughts?
>
>     -hilmar
>
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>
>> Rob,
>>
>> I came to the same conclusion as well; I wrote my response as I was
>> heading out the door and while I was running errands, I realized the
>> right thing to do is to write a Bio::SeqFeature::Annotated method called
>> new_from_object, whose usage would be:
>>
>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, 
>> %args);
>>
>> where you would give it a Bio::SeqFeatureI compliant object and try to
>> create a BSFA like you suggested below.  You could allow passing in args
>> to control how different things are handled, like mapping non-SO types
>> to SO types.  I'll think about this over the weekend and let you know if
>> brilliance strikes me.
>>
>> Scott
>>
>>
>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>> Rather than cobble together some ad-hoc solution, I would be interested
>>> in working on a good solution to this problem, because it seems like
>>> it's just going to get more common as more people start wanting to 
>>> write
>>> GFF3.  What about some code in whatever customarily makes these objects
>>> (probably BSF::Annotated's new() method?) that could take another type
>>> of Feature object and attempt to shoehorn its data into a new
>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>> whatever), it could throw() some informative error message.
>>>
>>> Then, people could write straightforward code something like:
>>>
>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>     $oldstylefeature->something_else('some other something that 
>>> needs to
>>> be changed for compliance');
>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>     $gff3_out->write_feature($newfeature);
>>> }
>>>
>>> Does that sound like a good idea?  I'd be more than willing to 
>>> implement
>>> this, since I'm going to need to do this sort of thing with many more
>>> things than just RepeatMasker.
>>>
>>> Rob
>>>
>>> Scott Cain wrote:
>>>> Um, yeah, good question.  The reason I didn't answer you when you 
>>>> wrote
>>>> before is that I was hoping for divine inspiration for an answer 
>>>> (or for
>>>> somebody else to answer, which would have been really great :-)
>>>>
>>>> The short answer (and easy one for me to type) is that you will 
>>>> probably
>>>> need an ad hoc method to do it, which is the same thing I do when I 
>>>> need
>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>> have any sample code that does this, but if you want to start 
>>>> working up
>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>
>>>>> So about that converting ye olde feature objects into
>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> That's OK--You added a few items that should be escaped that 
>>>>>> weren't, so
>>>>>> I added those too.
>>>>>>
>>>>>> Thanks,
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> Woops, I should have said something about that.  I submitted it 
>>>>>>> before
>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>
>>>>>>> Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Scott,
>>>>>>>>
>>>>>>>> Looks like Robert also submitted a bug report related to this 
>>>>>>>> as well
>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> -------------------------------------------------------------------------- 
>>
>> Scott Cain, Ph. D.                                         cain at cshl.edu
>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>> Cold Spring Harbor Laboratory
>
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
>
> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
> ImoAXD/jrbF0gXzSr2CY4tQ=
> =XfDq
> -----END PGP SIGNATURE-----

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Sat Jun 17 20:21:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 17 Jun 2006 15:21:37 -0500
Subject: [Bioperl-l] OT : Re:  reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1D0C8412-3705-47EF-9AAA-1DD0B09AD6B5@uiuc.edu>


On Jun 17, 2006, at 11:20 AM, Hilmar Lapp wrote:
>
> Maybe we need to get together again and thrash out a strategy; or a
> BOF at the GMOD meeting? I feel this does need a core group of people
> who care, hash out a strategy that will also solve the backwards
> compatibility problem with the current Bio::SeqFeatureI state-of-
> limbo, and allow us to implement the decisions with a few people in a
> concentrated effort. This will then also remove the only real large
> stumbling block towards a 1.6 release.

That would be fantastic!

A bit OT, but if plans are afoot for a 1.6 release maybe the 'core  
group' that meets at NC could start drawing up a list of ideas/plans  
towards that release, even if it is still a ways off.  A roadmap of  
sorts so the community knows where to put forth the majority of their  
effort and focus.

Chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Mon Jun 19 10:16:57 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 19 Jun 2006 12:16:57 +0200
Subject: [Bioperl-l] doc.bioperl
Message-ID: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>

Hi,

I just noted that it can happen that the pages at doc.bioperl.org
state "No synopsis" whereas there is one in the PM file (use perldoc
or the CVS).
An example:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Fasta.html
No synopsis, No description, but

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup

shows both.

So, if you're looking for documentation don't forget to do e.g.
"perldoc Bio::DB::Fasta"

regards,
bernd


From cjfields at uiuc.edu  Mon Jun 19 14:38:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 09:38:01 -0500
Subject: [Bioperl-l] doc.bioperl
In-Reply-To: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com>
Message-ID: <001501c693ad$f7689790$15327e82@pyrimidine>

This has been reported as a bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1926

Jason mentions in the bug report that the POD may contain something that
messes with the way PDOC deals with code so should be rewritten.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Monday, June 19, 2006 5:17 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] doc.bioperl
> 
> Hi,
> 
> I just noted that it can happen that the pages at doc.bioperl.org
> state "No synopsis" whereas there is one in the PM file (use perldoc
> or the CVS).
> An example:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-
> live/Bio/DB/Fasta.html
> No synopsis, No description, but
> 
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-
> live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup
> 
> shows both.
> 
> So, if you're looking for documentation don't forget to do e.g.
> "perldoc Bio::DB::Fasta"
> 
> regards,
> bernd


From dmessina at wustl.edu  Mon Jun 19 14:59:23 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 19 Jun 2006 09:59:23 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <83485BEB-2457-4FD6-90B8-353228868C3A@wustl.edu>

Hi,

Just curious if any other BioPerlers will be at the YAPC conference  
in Chicago next week (http://yapcchicago.org/). Some of us from the  
WashU GSC will be there, and it might be fun to meet some other  
BioPerl people over lunch or something. If there's enough interest, I  
will organize.

By the way, if you're unfamiliar with the conference and are  
interested in attending, I think registration is still open. The fee  
is low ($100).

Dave


-- 
Dave Messina
Informatics Analyst
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1415


From ClarkeW at AGR.GC.CA  Mon Jun 19 22:34:37 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 19 Jun 2006 18:34:37 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>

Hi,

I am getting the following warning and then exception 

 
-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ACTG*] which does not look
healthy

NOTE: ACTG* represents a sequence of those 4 characters (a valid DNA
sequence)

 
when extracting display name and sequence from a MYSQL database. My code
is as follows:

 
my $sql = "select Clone_Name,Sequence from tbl_bgene";
my $sth = $dbh->prepare($sql);
$sth->execute();
while (my $hash = $sth->fetchrow_hashref()) {
    # print("Name: ".$hash->{'Clone_Name'}."\n");
    my $seq = Bio::Seq->new(-display_id => $hash->{'Clone_Name'},
                            -seq        => $hash->{'Sequence'});
    $handle->write_seq($seq);
    # print("Sequence: ".$hash->{'Sequence'}."\n");
}

 
For some reason it is failing on a particular sequence, which is a valid
DNA sequence. If anyone has any ideas on why this is I would appreciate
it.

 
Thanks, Wayne


From torsten.seemann at infotech.monash.edu.au  Mon Jun 19 23:30:19 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 09:30:19 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <4497338B.3030609@infotech.monash.edu.au>

> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 1
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.

Usually a '*' indicates a STOP codon in a protein sequence.
I don't think it is valid in a DNA sequence?

So my guess is that BioPerl is auto-detecting it as Protein sequence,
as A,C,T,G are all valid amino acids, and * is a stop codon.

So I think BioPerl is doing the right thing.

If you want to force it, try adding a " -alphabet=>'protein' " parameter to the 
Bio::Seq constructor.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From taerwin at gmail.com  Tue Jun 20 01:38:14 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 11:38:14 +1000
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>

Hi all,

Does anyone have a runnable for cap3? There seems to be some discussion
about one in the mailing archives (
http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
find any code.


Regards,

Tim


From osborne1 at optonline.net  Tue Jun 20 02:23:43 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 19 Jun 2006 22:23:43 -0400
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
Message-ID: <C0BCD46F.8EA5%osborne1@optonline.net>

Tim,

The code seems to be here, not clear if there's an executable:

http://seq.cs.iastate.edu/download.html


Brian O.


On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:

> Hi all,
> 
> Does anyone have a runnable for cap3? There seems to be some discussion
> about one in the mailing archives (
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot
> find any code.
> 
> 
> 
> Regards,
> 
> Tim


From cjfields at uiuc.edu  Tue Jun 20 03:23:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Jun 2006 22:23:26 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
Message-ID: <000701c69418$e53b9110$15327e82@pyrimidine>

You really haven't given us much to work with more than "this doesn't work."
We need the following information, otherwise we can't do anything.  

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  
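
A short script along these lines would collect items 1 and 2 in one go. It is
only a sketch: it assumes Bio::Root::Version is the module that carries the
BioPerl version number (older installs may not have it), and it reports
'not found' when that module cannot be loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the basics asked for above: BioPerl version, OS, Perl version.
# Assumption: Bio::Root::Version holds the distribution version.
sub environment_report {
    my $bioperl = eval { require Bio::Root::Version; Bio::Root::Version->VERSION };
    $bioperl = 'not found' unless defined $bioperl;
    return (
        bioperl => $bioperl,
        os      => $^O,
        perl    => sprintf('%vd', $^V),
    );
}

my %report = environment_report();
print "$_: $report{$_}\n" for sort keys %report;
```

Pasting that output into the report, together with the full exception trace
and the script, covers the list above.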

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be: 

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};
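
To illustrate the context point, here is a small standalone sketch (it does
not use BioPerl). It assumes the validator accepts only letters plus '-', '.',
'*' and '?'; that character class is an assumption made for illustration and
should be checked against the validate_seq source for your version. The
trailing newline in the sample value is also made up. Concatenating a /g match
forces scalar context and yields 1, while list context returns the characters
that failed to match. Data::Dumper with Useqq makes hidden whitespace visible.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed set of acceptable sequence characters (illustrative only).
my $ALLOWED = 'A-Za-z\-\.\*\?';

# Return every run of characters that falls outside the assumed set.
sub offending_chars {
    my ($seq) = @_;
    my @bad = $seq =~ /([^$ALLOWED]+)/g;   # list context: the matches themselves
    return @bad;
}

# A value as it might come back from the database, with a stray newline.
my $hash = { Clone_Name => 'test', Sequence => "ACTG*\n" };

# Scalar context (forced by the concatenation) gives a true value, i.e. 1.
my $scalar_msg = "mismatch is " . ($hash->{Sequence} =~ /([^$ALLOWED]+)/g);
print "$scalar_msg\n";

# List context shows what actually failed to match.
local $Data::Dumper::Useqq = 1;   # print control characters as escapes
local $Data::Dumper::Terse = 1;   # no $VAR1 prefix
print Dumper([ offending_chars($hash->{Sequence}) ]);
```

If the real data shows something similar, stripping whitespace from the
database value before building the Bio::Seq would be the obvious thing to try.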

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MYSQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> 
>      my $sth = $dbh->prepare($sql);
> 
>      $sth->execute();
> 
>      while (my $hash = $sth->fetchrow_hashref()) {
> 
>           # print("Name: ".$hash->{'Clone_Name'}."\n");
> 
>           my $seq = new Bio::Seq(  -display_id     =>
> $hash->{'Clone_Name'},
> 
>                                    -seq      =>   $hash->{'Sequence'});
> 
>           $handle->write_seq($seq);
> 
>           # print("Sequence: ".$hash->{'Sequence'}."\n");
> 
>      }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 


From taerwin at gmail.com  Tue Jun 20 03:05:13 2006
From: taerwin at gmail.com (Tim Erwin)
Date: Tue, 20 Jun 2006 13:05:13 +1000
Subject: [Bioperl-l] cap3 runnable?
In-Reply-To: <C0BCD46F.8EA5%osborne1@optonline.net>
References: <c7d2b5330606191838q76e0b6d6ofbc195d227f1086b@mail.gmail.com>
	<C0BCD46F.8EA5%osborne1@optonline.net>
Message-ID: <c7d2b5330606192005o63ed5d6i608d6b2076399932@mail.gmail.com>

Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3

Regards,

Tim

On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
>
> Tim,
>
> The code seems to be here, not clear if there's an executable:
>
> http://seq.cs.iastate.edu/download.html
>
>
> Brian O.
>
>
> On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
>
> > Hi all,
> >
> > Does anyone have a runnable for cap3? There seems to be some discussion
> > about one in the mailing archives (
> > http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> cannot
> > find any code.
> >
> >
> >
> > Regards,
> >
> > Tim
>
>
>


From torsten.seemann at infotech.monash.edu.au  Tue Jun 20 03:07:12 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Jun 2006 13:07:12 +1000
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <4497338B.3030609@infotech.monash.edu.au>
References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca>
	<4497338B.3030609@infotech.monash.edu.au>
Message-ID: <44976660.7030107@infotech.monash.edu.au>

> If you want to force it, try adding a " -alphabet=>'protein' " parameter to the 
> Bio::Seq constructor.

That should be -alphabet => 'dna'.
D'oh!
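
For reference, the corrected call might look like the sketch below. The clone
name and sequence are made-up values standing in for one database row, and the
script is guarded so that it still runs, and just says so, where Bio::Seq is
not installed. Whether forcing the alphabet cures the original exception is
not established here.

```perl
use strict;
use warnings;

# Made-up values standing in for one database row.
my %row = ( Clone_Name => 'clone_1', Sequence => 'ACTGACTG' );

# Constructor arguments, with the alphabet stated explicitly
# instead of leaving it to auto-detection.
my @args = (
    -display_id => $row{Clone_Name},
    -seq        => $row{Sequence},
    -alphabet   => 'dna',
);

if ( eval { require Bio::Seq; 1 } ) {
    my $seq = Bio::Seq->new(@args);
    print $seq->display_id, " is ", $seq->alphabet, "\n";
}
else {
    print "Bio::Seq is not installed; nothing constructed\n";
}
```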

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From Marc.Logghe at DEVGEN.com  Tue Jun 20 07:13:22 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 20 Jun 2006 09:13:22 +0200
Subject: [Bioperl-l] cap3 runnable?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6D3D60B@ANTARESIA.be.devgen.com>

It is about 3 years old and I have not tested it with the current bioperl
release.
Feel free to play with it.
Cheers,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tim Erwin
> Sent: Tuesday, June 20, 2006 5:05 AM
> To: Brian Osborne
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] cap3 runnable?
> 
> Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3
> 
> Regards,
> 
> Tim
> 
> On 6/20/06, Brian Osborne <osborne1 at optonline.net> wrote:
> >
> > Tim,
> >
> > The code seems to be here, not clear if there's an executable:
> >
> > http://seq.cs.iastate.edu/download.html
> >
> >
> > Brian O.
> >
> >
> > On 6/19/06 9:38 PM, "Tim Erwin" <taerwin at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Does anyone have a runnable for cap3? There seems to be some 
> > > discussion about one in the mailing archives (
> > > 
> http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I
> > cannot
> > > find any code.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Tim
> >
> >
> >
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cap3.pm
Type: application/octet-stream
Size: 3374 bytes
Desc: Cap3.pm
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/0976a7d9/attachment-0004.obj>

From G.Tzotzos at unido.org  Tue Jun 20 09:18:48 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 11:18:48 +0200
Subject: [Bioperl-l] Error message
Message-ID: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and run the  
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ 
swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ 
WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D

Wagramerstrasse 5
A-1400 Vienna
Austria

Email: g.tzotzos at unido.org


From G.Tzotzos at unido.org  Tue Jun 20 11:36:18 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 13:36:18 +0200
Subject: [Bioperl-l] Error message
Message-ID: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>

I'm a BioPerl novice. I used CPAN to install BioPerl and run the  
following script to test the installation:

use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot', "P09651");

write_sequence(">roa1.fasta", 'fasta', $seq_object);

I used as argument both "ROA1_HUMAN" and "P09651". In both cases I  
get the message below.

Any help on the nature of the problem and how to overcome it would be  
greatly appreciated.

Thanks

George


------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ 
swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ 
WebDBSeqI.pm:153
STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
STACK toplevel tut2.pl:5


George T. Tzotzos Ph.D
Vienna, Austria


From s-merchant at northwestern.edu  Tue Jun 20 14:41:33 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 20 Jun 2006 09:41:33 -0500
Subject: [Bioperl-l] YAPC anyone?
Message-ID: <002701c69477$9ffa7c10$c2987ca5@pc13>

Hey Dave,
  I am doing a talk on dictyBase at the YAPC . I think it would be great to
meet for lunch. 

Cheers,
Sohel Merchant.

dictyBase
Northwestern University,
Chicago

> Just curious if any other BioPerlers will be at the YAPC conference in
> Chicago next week (http://yapcchicago.org/). Some of us from the WashU
> GSC will be there, and it might be fun to meet some other BioPerl
> people over lunch or something. If there's enough interest, I will
> organize.
>
> By the way, if you're unfamiliar with the conference and are interested
> in attending, I think registration is still open. The fee is low
> ($100).
>
> Dave

From cain at cshl.edu  Tue Jun 20 16:03:26 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 20 Jun 2006 12:03:26 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
	<449306CF.1030301@cornell.edu>
	<1150486453.4412.30.camel@localhost.localdomain>
	<449307B8.5040802@cornell.edu>
	<1150487731.4412.35.camel@localhost.localdomain>
	<4493150C.1080909@cornell.edu>
	<1150516605.2600.9.camel@localhost.localdomain>
	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
Message-ID: <1150819406.2585.27.camel@localhost.localdomain>

Hi Hilmar,

Of course you are right--I was under the influence of a perl module that
I work with that does something similar, but both of your solutions are
better.

I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
look this week.
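
Whichever class ends up owning the conversion, the core step discussed in this
thread is mapping an arbitrary primary tag onto a Sequence Ontology term and
failing loudly when no mapping exists. Here is a minimal sketch of that step,
using plain hashes rather than real feature objects. The mapping table, the
so_type_for and to_typed_feature helpers, and the choice of tags are all
hypothetical and picked only for illustration; none of this is existing
BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical mapping from old-style primary tags to SO term names.
my %SO_TYPE_FOR = (
    repeat_region => 'repeat_region',
    CDS           => 'CDS',
    misc_RNA      => 'transcript',
);

# Look up the SO type for a tag; caller-supplied overrides win.
# Dies with an informative message when the tag cannot be mapped.
sub so_type_for {
    my ($tag, %override) = @_;
    my $type = exists $override{$tag} ? $override{$tag} : $SO_TYPE_FOR{$tag};
    die "No Sequence Ontology mapping for primary tag '$tag'\n"
        unless defined $type;
    return $type;
}

# Convert a plain-hash "feature" into a typed copy, leaving the input untouched.
sub to_typed_feature {
    my ($feature, %override) = @_;
    return +{ %$feature, type => so_type_for($feature->{primary_tag}, %override) };
}

my $typed = to_typed_feature(
    { primary_tag => 'misc_RNA', start => 10, end => 250 },
);
print "$typed->{primary_tag} -> $typed->{type}\n";
```

Keeping the lookup separate from the feature representation is in the spirit
of the factory suggestion quoted below: the mapping policy can change without
touching the class that stores the typed feature.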

As for next week, I plan on spending the day at NESCent on Wednesday
(though I haven't told Todd or Jeff that I am arriving early yet) just
to make sure all the details are in place.  I imagine I'll have a fair
amount of free time to hash this stuff out.  Anyone else who is in town
(that is, in Durham, NC, USA) is welcome to come draw on a white board
too. :-)

Scott


On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> You don't need a new method for this. Instead, support a -feature  
> argument.
> 
> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
> 
> This should work for any instance of Bio::SeqFeatureI. If it is a  
> B::SF::Annotated already it is obviously just a deep copy (if copy is  
> desired - could be another parameter). Otherwise more will be involved.
> 
> Alternatively, and possibly better, is to write a specialized  
> SeqFeatureI factory (that would implement  
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
> 
> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
> 		-type_ontology => $sequence_ontology,
> 		-source_ontology => $feature_source_ontology,
> 		-unflatten => 1);
> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
> 
> This is preferable because it separates business logic that isn't  
> necessarily related into defined units. I.e., the logic necessary to  
> convert an ordinary feature into a strongly typed one is different  
> from how to represent a strongly typed feature. IMHO anyway ...
> 
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
> started as the result of a discussion thread earlier this (or last?)  
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
> though not in concept.
> 
> Maybe we need to get together again and thrash out a strategy; or a  
> BOF at the GMOD meeting? I feel this does need a core group of people  
> who care, hash out a strategy that will also solve the backwards  
> compatibility problem with the current Bio::SeqFeatureI state-of- 
> limbo, and allow us to implement the decisions with a few people in a  
> concentrated effort. This will then also remove the only real large  
> stumbling block towards a 1.6 release.
> 
> Maybe we should think about a little pre-GMOD hackathon to clear up  
> this mess? Scott, you'll be there a day early? I'll be already back  
> and Jason I believe will still be in town, although he may have other  
> commitments already. Nonetheless, it shouldn't really take that much  
> but rather dedicated time, a whiteboard, and a few people who care  
> thrashing this out and then do it.
> 
> Thoughts?
> 
> 	-hilmar
> 
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
> 
> > Rob,
> >
> > I came to the same conclusion as well; I wrote my response as I was
> > heading out the door and while I was running errands, I realized the
> > right thing to do is to write a Bio::SeqFeature::Annotated method  
> > called
> > new_from_object, whose usage would be:
> >
> >   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object 
> > ($my_BSFI, %args);
> >
> > where you would give it a Bio::SeqFeatureI compliant object and try to
> > create a BSFA like you suggested below.  You could allow passing in  
> > args
> > to control how different things are handled, like mapping non-SO types
> > to SO types.  I'll think about this over the weekend and let you  
> > know if
> > brilliance strikes me.
> >
> > Scott
> >
> >
> > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
> >> Rather than cobble together some ad-hoc solution, I would be  
> >> interested
> >> in working on a good solution to this problem, because it seems like
> >> it's just going to get more common as more people start wanting to  
> >> write
> >> GFF3.  What about some code in whatever customarily makes these  
> >> objects
> >> (probably BSF::Annotated's new() method?) that could take another  
> >> type
> >> of Feature object and attempt to shoehorn its data into a new
> >> BSF::Annotated?  If it failed (because the type isn't in SO or
> >> whatever), it could throw() some informative error message.
> >>
> >> Then, people could write straightforward code something like:
> >>
> >> while(my $oldstylefeature = $features_in->next_feature) {
> >>     $oldstylefeature->primary_tag('something_that_is_in_so');
> >>     $oldstylefeature->something_else('some other something that  
> >> needs to
> >> be changed for compliance');
> >>     my $newfeature = Bio::SeqFeature::Annotated->new 
> >> ($oldstylefeature);
> >>     $gff3_out->write_feature($newfeature);
> >> }
> >>
> >> Does that sound like a good idea?  I'd be more than willing to  
> >> implement
> >> this, since I'm going to need to do this sort of thing with many more
> >> things than just RepeatMasker.
> >>
> >> Rob
> >>
> >> Scott Cain wrote:
> >>> Um, yeah, good question.  The reason I didn't answer you when you  
> >>> wrote
> >>> before is that I was hoping for divine inspiration for an answer  
> >>> (or for
> >>> somebody else to answer, which would have been really great :-)
> >>>
> >>> The short answer (and easy one for me to type) is that you will  
> >>> probably
> >>> need an ad hoc method to do it, which is the same thing I do when  
> >>> I need
> >>> to convert gff2 to gff3, to make sure the things I need mapped get
> >>> mapped the 'right' way (that is, the way I want them to go).  I  
> >>> don't
> >>> have any sample code that does this, but if you want to start  
> >>> working up
> >>> an ad hoc method, I will certainly try to help you as much as I can.
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
> >>>
> >>>> So about that converting ye olde feature objects into
> >>>> Bio::SeqFeature::Annotated objects.  How do I do it?
> >>>>
> >>>>
> >>>> Scott Cain wrote:
> >>>>
> >>>>> That's OK--You added a few items that should be escaped that  
> >>>>> weren't, so
> >>>>> I added those too.
> >>>>>
> >>>>> Thanks,
> >>>>> Scott
> >>>>>
> >>>>>
> >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
> >>>>>
> >>>>>
> >>>>>> Woops, I should have said something about that.  I submitted  
> >>>>>> it before
> >>>>>> I saw that Scott had already done the escaping in CVS.
> >>>>>>
> >>>>>> Chris Fields wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Scott,
> >>>>>>>
> >>>>>>> Looks like Robert also submitted a bug report related to this  
> >>>>>>> as well
> >>>>>>> ---------------------------------------------------------------- 
> >>>>>>> --------
> >>>>>>>
> >>
> > -- 
> > ---------------------------------------------------------------------- 
> > --
> > Scott Cain, Ph. D.                                          
> > cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> 
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
> 
> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
> ImoAXD/jrbF0gXzSr2CY4tQ=
> =XfDq
> -----END PGP SIGNATURE-----
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/4b71554e/attachment.sig>

From osborne1 at optonline.net  Tue Jun 20 16:13:51 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 12:13:51 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org>
Message-ID: <C0BD96FF.8EC3%osborne1@optonline.net>

George,

The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's
some other problem that may be specific to SwissProt. Can you retrieve from
GenBank? E.g.:

my $seq_object = get_sequence('genbank', 2);

Brian O.


On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:

> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
> 
> use Bio::Perl;
> use strict;
> use warnings;
> 
> my $seq_object = get_sequence('swissprot', "P09651");
> 
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
> 
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
> 
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
> 
> Thanks
> 
> George
> 
> 
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
> 
> 
> 
> George T. Tzotzos Ph.D
> Vienna, Austria
> 
> 
> 
> 
> 
> 


From G.Tzotzos at unido.org  Tue Jun 20 16:21:32 2006
From: G.Tzotzos at unido.org (George Tzotzos)
Date: Tue, 20 Jun 2006 18:21:32 +0200
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
References: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <76750E11-3BD6-42EB-832D-3A12BC6B4BEE@unido.org>

Brian

Neither <swiss> nor <swissprot> works. However, your suggestion does
work fine. So does Chandan's.  Many thanks to both.

Cheers

George


On 20 Jun 2006, at 18:13, Brian Osborne wrote:

> George,
>
> The docs I'm reading say to use 'swiss', not 'swissprot' but I  
> think there's
> some other problem that may be specific to SwissProt. Can you  
> retrieve from
> GenBank? E.g.:
>
> my $seq_object = get_sequence('genbank', 2);
>
> Brian O.
>
>
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>
>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>> following script to test the installation:
>>
>> use Bio::Perl;
>> use strict;
>> use warnings;
>>
>> my $seq_object = get_sequence('swissprot', "P09651");
>>
>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>
>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>> get the message below.
>>
>> Any help on the nature of the problem and how to overcome it would be
>> greatly appreciated.
>>
>> Thanks
>>
>> George
>>
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
>> swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
>> WebDBSeqI.pm:153
>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>> STACK toplevel tut2.pl:5
>>
>>
>>
>> George T. Tzotzos Ph.D
>> Vienna, Austria
>>
>>
>>
>>
>>
>>
>
>


From ClarkeW at AGR.GC.CA  Tue Jun 20 16:57:34 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 12:57:34 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>


The Bioperl version is 1.4, the OS is Red Hat Fedora Core 4 Linux, and the
trace is:
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
STACK: /home/wayne/bin/mast_fasta.pl:59

And the full script is attached. 

However I would like to clarify that the actual sequence is not ACTG*,
this was a notation to represent that I had checked it to be sure that
it was a valid DNA sequence but due to confidentiality I cannot disclose
the actual sequence. I know this makes it more difficult and that I
perhaps should have been clearer about this originally. The $handle is a
Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
name  

'Clone_Name' => 'sJ1485'
        };
then the error message. I hope this is more helpful than my last
message.

Thanks, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with beyond "this doesn't work."
We need the following information; otherwise we can't do anything.

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};
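
A minimal sketch of one way a message like "mismatch is 1" can arise in
plain Perl, assuming the warning text is built by concatenating the result
of a regex match (an assumption; the 1.4 source was not checked here). The
character class below is illustrative only and is not the pattern BioPerl
uses:

```perl
use strict;
use warnings;

my $seq = "ACTG\n";    # example data: valid bases plus a stray newline

# List context: the capture group returns the offending text itself.
my ($bad) = $seq =~ /([^A-Za-z*]+)/;

# Scalar context (forced here by the '.' operator): the match only
# reports success, which stringifies as 1.
my $msg = "mismatch is " . ($seq =~ /([^A-Za-z*]+)/);

print "list context captured a newline\n" if $bad eq "\n";
print "$msg\n";
```

If that is what is happening, the "1" says only that something failed to
match, not which character was at fault.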

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MySQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> 
>      my $sth = $dbh->prepare($sql);
> 
>      $sth->execute();
> 
>      while (my $hash = $sth->fetchrow_hashref()) {
> 
>           # print("Name: ".$hash->{'Clone_Name'}."\n");
> 
>           my $seq = new Bio::Seq(-display_id => $hash->{'Clone_Name'},
> 
>                                  -seq        => $hash->{'Sequence'});
> 
>           $handle->write_seq($seq);
> 
>           # print("Sequence: ".$hash->{'Sequence'}."\n");
> 
>      }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mast_fasta.pl
Type: application/octet-stream
Size: 1998 bytes
Desc: mast_fasta.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/53770697/attachment-0004.obj>

From cjfields at uiuc.edu  Tue Jun 20 17:16:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:16:32 -0500
Subject: [Bioperl-l] Error message
In-Reply-To: <C0BD96FF.8EC3%osborne1@optonline.net>
Message-ID: <000c01c6948d$46e992d0$15327e82@pyrimidine>

Brian,

It looks like EBI changed the URL parameter for SwissProt from 'swall' to
'UniProtKB'.  I committed a fix to Bio::DB::SwissProt in CVS that resolves
the issue.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Tuesday, June 20, 2006 11:14 AM
> To: George Tzotzos; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Error message
> 
> George,
> 
> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
> there's
> some other problem that may be specific to SwissProt. Can you retrieve
> from
> GenBank? E.g.:
> 
> my $seq_object = get_sequence('genbank', 2);
> 
> Brian O.
> 
> 
> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
> 
> > I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> > following script to test the installation:
> >
> > use Bio::Perl;
> > use strict;
> > use warnings;
> >
> > my $seq_object = get_sequence('swissprot', "P09651");
> >
> > write_sequence(">roa1.fasta", 'fasta', $seq_object);
> >
> > I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> > get the message below.
> >
> > Any help on the nature of the problem and how to overcome it would be
> > greatly appreciated.
> >
> > Thanks
> >
> > George
> >
> >
> > ------------- EXCEPTION  -------------
> > MSG: swissprot stream with no ID. Not swissprot in my book
> > STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> > swiss.pm:179
> > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> > WebDBSeqI.pm:153
> > STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> > STACK toplevel tut2.pl:5
> >
> >
> >
> > George T. Tzotzos Ph.D
> > Vienna, Austria
> >
> >
> >
> >
> >
> >


From chandan.kr.singh at gmail.com  Tue Jun 20 14:46:01 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Tue, 20 Jun 2006 20:16:01 +0530
Subject: [Bioperl-l] Error message
In-Reply-To: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
References: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org>
Message-ID: <2d4f320606200746ja53cebs73923c510b535c44@mail.gmail.com>

Hi
It seems the 'swall' servertype on EBI no longer exists. Maybe this has
already been reported and debugged. I hope somebody can shed light on it.

George, if you are in a hurry you can use the Bio::DB::SwissProt module
directly. Here is some typical code to do this:

use strict ;
use warnings ;
use Bio::DB::SwissProt ;
use Bio::Perl ;
my $seq_obj = new Bio::DB::SwissProt('-servertype' => 'expasy' ,
'-hostlocation' => 'us') ;
my $seq = $seq_obj->get_Seq_by_id('ROA1_HUMAN') ;
write_sequence("> roa.sp" , 'fasta' , $seq) ;


See the module documentation for more help.

cheers
Chandan


On 6/20/06, George Tzotzos <G.Tzotzos at unido.org> wrote:
>
> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
>
> use Bio::Perl;
> use strict;
> use warnings;
>
> my $seq_object = get_sequence('swissprot', "P09651");
>
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
>
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
>
> Thanks
>
> George
>
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
> swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
> WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
>
>
>
>
>
> George T. Tzotzos Ph.D
>
> Wagramerstrasse 5
> A-1400 Vienna
> Austria
>
> Email: g.tzotzos at unido.org
>
>
>
>


From osborne1 at optonline.net  Tue Jun 20 17:33:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 13:33:07 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <000c01c6948d$46e992d0$15327e82@pyrimidine>
Message-ID: <C0BDA993.8ED3%osborne1@optonline.net>

Chris,

You beat me to it!

Brian O.


On 6/20/06 1:16 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Brian,
> 
> It looks like EBI changed the URL parameter for SwissProt from 'swall' to
> 'UniProtKB'.  I committed a fix to Bio::DB::SwissProt in CVS that resolves
> the issue.
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Tuesday, June 20, 2006 11:14 AM
>> To: George Tzotzos; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Error message
>> 
>> George,
>> 
>> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
>> there's
>> some other problem that may be specific to SwissProt. Can you retrieve
>> from
>> GenBank? E.g.:
>> 
>> my $seq_object = get_sequence('genbank', 2);
>> 
>> Brian O.
>> 
>> 
>> On 6/20/06 7:36 AM, "George Tzotzos" <G.Tzotzos at unido.org> wrote:
>> 
>>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
>>> following script to test the installation:
>>> 
>>> use Bio::Perl;
>>> use strict;
>>> use warnings;
>>> 
>>> my $seq_object = get_sequence('swissprot', "P09651");
>>> 
>>> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>>> 
>>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
>>> get the message below.
>>> 
>>> Any help on the nature of the problem and how to overcome it would be
>>> greatly appreciated.
>>> 
>>> Thanks
>>> 
>>> George
>>> 
>>> 
>>> ------------- EXCEPTION  -------------
>>> MSG: swissprot stream with no ID. Not swissprot in my book
>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/
>>> swiss.pm:179
>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/
>>> WebDBSeqI.pm:153
>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>>> STACK toplevel tut2.pl:5
>>> 
>>> 
>>> 
>>> George T. Tzotzos Ph.D
>>> Vienna, Austria
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 


From ClarkeW at AGR.GC.CA  Tue Jun 20 17:44:42 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 20 Jun 2006 13:44:42 -0400
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
Message-ID: <320530F83FA47047823E57F110DDEAADB15A66@onncrxms4.agr.gc.ca>

Hi all, 

It seems that a newline character in the sequence is causing the problem.
This wasn't obvious at first because of the size of my shell window, but it
is what triggers the mismatch error. Thanks to Chris and Torsten for the
help and for pointing me in the direction of validate_seq, which was
helpful in finding the problem.
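
For anyone hitting the same thing, a minimal sketch of the cleanup, assuming
the stray newline arrives as part of the Sequence column value. The helper
names clean_sequence and looks_like_sequence are made up for illustration,
and the character class is only an approximation of what a validator might
accept, not a copy of the BioPerl source:

```perl
use strict;
use warnings;

# Hypothetical helper: remove all whitespace (spaces, tabs, CR, LF)
# from a raw sequence string fetched from a database column.
sub clean_sequence {
    my ($raw) = @_;
    return undef unless defined $raw;
    (my $clean = $raw) =~ s/\s+//g;
    return $clean;
}

# Assumed approximation of allowed characters (letters, gap symbols,
# '*' and '?'); labeled as an assumption, not BioPerl's own pattern.
sub looks_like_sequence {
    my ($seq) = @_;
    return defined $seq && $seq =~ /\A[A-Za-z\-.*?]+\z/ ? 1 : 0;
}

my $raw   = "ACTGACTG\nACTG\n";    # example data with embedded newlines
my $clean = clean_sequence($raw);

printf "raw ok:   %s\n", looks_like_sequence($raw)   ? 'yes' : 'no';
printf "clean ok: %s\n", looks_like_sequence($clean) ? 'yes' : 'no';
```

In the database loop, the cleaned string rather than the raw column value
would then be passed as -seq when constructing the Bio::Seq object.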

Cheers, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Monday, June 19, 2006 9:23 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq

You really haven't given us much to work with beyond "this doesn't work."
We need the following information; otherwise we can't do anything.

1)  Bioperl version (1.4, 1.5.1, live)
2)  OS
3)  The exception trace (not just the chunk you've shown)
4)  The full script.  What is $handle?  A Bio::SeqIO object?  

At first glance I would say Torsten's right, that it could be the '*' in the
sequence.  The problem is, I don't think validate_seq (from PrimarySeq and
where the warning came from) distinguishes between nucleotides and amino
acids, and it allows for '*' and various gap symbols in sequences.  If this
caused the problem, the error would be:

MSG: seq doesn't validate, mismatch is *

The actual error is:

MSG: seq doesn't validate, mismatch is 1

It looks like something is being evaluated in the wrong context (scalar
context is expected, but looks like it's evaluating a list).  Maybe it
thinks $hash->{'Sequence'} is a complex data type such as an array; hence
the mismatch is 1.  What do you get printing $hash using Data::Dumper?  I
tried using this anon hash and it worked fine when a new Bio::Seq was
constructed.

my $hash = {'Clone_Name' => 'test',
            'Sequence'   => 'ACTG*'};

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Monday, June 19, 2006 5:35 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> Hi,
> 
> I am getting the following warning and then exception
> 
> 
> 
> -------------------- WARNING ---------------------
> 
> MSG: seq doesn't validate, mismatch is 1
> 
> ---------------------------------------------------
> 
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: Attempting to set the sequence to [ACTG*] which does not look
> healthy
> 
> 
> 
> NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA
> sequence)
> 
> 
> 
> when extracting display name and sequence from a MySQL database. My code
> is as follows:
> 
> 
> 
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> 
>      my $sth = $dbh->prepare($sql);
> 
>      $sth->execute();
> 
>      while (my $hash = $sth->fetchrow_hashref()) {
> 
>           # print("Name: ".$hash->{'Clone_Name'}."\n");
> 
>           my $seq = new Bio::Seq(-display_id => $hash->{'Clone_Name'},
> 
>                                  -seq        => $hash->{'Sequence'});
> 
>           $handle->write_seq($seq);
> 
>           # print("Sequence: ".$hash->{'Sequence'}."\n");
> 
>      }
> 
> 
> 
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
> 
> 
> 
> Thanks, Wayne
> 
> 


From cjfields at uiuc.edu  Tue Jun 20 17:55:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:55:28 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>
Message-ID: <000e01c69492$b74e0ec0$15327e82@pyrimidine>

> -----Original Message-----
> From: Clarke, Wayne [mailto:ClarkeW at AGR.GC.CA]
> Sent: Tuesday, June 20, 2006 11:58 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
> 
> 
> The Bioperl version is 1.4, the OS is Red Hat Fedora Core 4 Linux, and the
> trace is:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
> STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
> STACK: /home/wayne/bin/mast_fasta.pl:59
> 
> And the full script is attached.

Have you tried a newer version of Bioperl to see if it fixed the issue?  v.
1.5.1 has been out for a bit now and it's pretty stable.

> However I would like to clarify that the actual sequence is not ACTG*,
> this was a notation to represent that I had checked it to be sure that
> it was a valid DNA sequence but due to confidentiality I cannot disclose
> the actual sequence. I know this makes it more difficult and that I
> perhaps should have been clearer about this originally. 

That's not a problem.  We run into that here a bit.  Example data is fine.

> The $handle is a
> Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
> name
> 
> 'Clone_Name' => 'sJ1485'
>         };
> then the error message. I hope this is more helpful than my last
> message.
> 
> Thanks, Wayne

Make sure you aren't using bioperl-specific methods when you run
Data::Dumper on your hash or the script crashes.

Okay, I was able to reproduce your error using PrimarySeq from v. 1.4 (BTW,
the error message changes if you use a newer version of Bioperl but it is
still there).  See if you can follow me here...

I used this script:
-------------------------
use Bio::Seq;
use Bio::SeqIO;
use Data::Dumper;

my $hash = {'Clone'     => 'test',
            'Sequence'  => 'ACTG*'};

my $seqout = Bio::SeqIO->new (-format   => 'fasta',
                              -fh       => \*STDOUT);

print Dumper($hash);

my $seq = Bio::Seq->new(-seq            => $hash->{'Sequence'},
                        -display_id     => $hash->{'Clone'});

$seqout->write_seq($seq);
-------------------------

And everything works fine, with this output:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => 'ACTG*'
        };
>test
ACTG*

Changing the anonymous hash to this causes the crash and error.

my $hash = {'Clone'     => 'test',
            'Sequence'  => ['ACTG*']};

Gets this:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => [
                          'ACTG*'
                        ]
        };

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ARRAY(0x2354b0)] which does not look
healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\core/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:268
STACK: Bio::PrimarySeq::new C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new C:\Perl\src\bioperl\core/Bio/Seq.pm:497
STACK: C:\Perl\Scripts\seq-test\test.pl:17
-----------------------------------------------------------

It could be that the sequence data is stored in another complex data type
(object, hash) that's causing the problem.  Looks like you retrieve your
hash from another method ('my $hash = $sth->fetchrow_hashref()'); you might
want to check that method to make sure you're getting the right kind of data
into your hash.
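
A minimal sketch of such a check, using only core Perl. The helper name
describe_value and the sample rows are made up for illustration; with DBI
you would apply the same test to each row returned by fetchrow_hashref:

```perl
use strict;
use warnings;

# Hypothetical helper: describe what kind of value sits in a hash slot.
# ref() returns '' for a plain scalar and a type name such as 'ARRAY'
# or 'HASH' for a reference.
sub describe_value {
    my ($value) = @_;
    return 'undef'        unless defined $value;
    return 'plain scalar' unless ref $value;
    return ref($value) . ' reference';
}

my @rows = (
    { Clone => 'test1', Sequence => 'ACTG*'   },    # plain string, fine
    { Clone => 'test2', Sequence => ['ACTG*'] },    # array ref, as in the failing case
);

for my $row (@rows) {
    printf "%s: Sequence is a %s\n",
        $row->{Clone}, describe_value($row->{Sequence});
}
```

Anything other than "plain scalar" here would explain the exception above.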
 
Chris


From rmb32 at cornell.edu  Tue Jun 20 18:09:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 20 Jun 2006 12:09:38 -0600
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150819406.2585.27.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
Message-ID: <449839E2.5080402@cornell.edu>

Getting to know this code a little better, I notice a couple of little 
things: 

1.) my patch attached to bug 2026 draws unnecessary distinctions between 
feature types that use tags, and those that use annotations, since all 
features are now Bio::AnnotatableI's and the *_tags_* methods are 
implemented in AnnotatableI in terms of annotation objects now.  You 
guys should probably just ignore it, since from the sound of it you're 
going to be changing all of this around anyway.  Wish I could be there 
to help and learn more.

2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar 
accessors to use when translating Bio::Annotation::* objects to and from 
scalar tags.  Seems to me, this would be much better accomplished by 
using polymorphism of some sort, probably adding a multipurpose as_tag() 
accessor in Bio::AnnotationI and the objects that implement it, then 
using that in Bio::AnnotatableI instead of %tag2text.  Does this make 
sense, or am I misinterpreting something here?  Reason I've noticed this 
is because I've been wrestling with how to translate  
Bio::Annotation::Target objects to and from scalar tag values, since a 
Target is being represented as an ordered list of 3 or 4 scalar tags in 
old things that were designed to interoperate with gff2, and I can't 
figure out a nice way to do it using the rather inflexible %tag2text 
mechanism.
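
To make the idea concrete, a rough sketch of what I mean by a polymorphic
as_tag(). Every package and method name here is a hypothetical stand-in
(My::Annotation::*), not the real Bio::Annotation::* classes, and the
"id start end [strand]" layout for a target is only my assumption about how
it would flatten:

```perl
use strict;
use warnings;

# All package and method names below are hypothetical stand-ins,
# not the real Bio::Annotation::* classes.

package My::Annotation::SimpleValue;
sub new    { my ($class, %args) = @_; return bless { value => $args{value} }, $class }
sub as_tag { my ($self) = @_; return $self->{value} }

package My::Annotation::Target;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
# A target flattens to "id start end [strand]"; strand only if set.
sub as_tag {
    my ($self) = @_;
    return join ' ', grep { defined } @{$self}{qw(target_id start end strand)};
}

package main;

my @annotations = (
    My::Annotation::SimpleValue->new(value => 'repeat_region'),
    My::Annotation::Target->new(target_id => 'AluY', start => 1, end => 280),
    My::Annotation::Target->new(target_id => 'L1', start => 5, end => 90, strand => '+'),
);

# The caller asks each object for its tag text; no per-class lookup table.
print $_->as_tag, "\n" for @annotations;
```

Each class decides how it serializes, so a new annotation type would not
require touching a central table like %tag2text.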

Sorry to be a pain, just wanted to get that in there before you guys 
start your jam session in Durham.

Rob

Scott Cain wrote:
> Hi Hilmar,
>
> Of course you are right--I was under the influence of a perl module that
> I work with that does something similar, but both of your solutions are
> better.
>
> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
> look this week.
>
> As for next week, I plan on spending the day at NESCent on Wednesday
> (though I haven't told Todd or Jeff that I am arriving early yet) just
> to make sure all the details are in place.  I imagine I'll have a fair
> amount of free time to hash this stuff out.  Anyone else who is in town
> (that is, in Durham, NC, USA) is welcome to come draw on a white board
> too. :-)
>
> Scott
>
>
> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>   
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> You don't need a new method for this. Instead, support a -feature  
>> argument.
>>
>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>
>> This should work for any instance of Bio::SeqFeatureI. If it is a  
>> B::SF::Annotated already it is obviously just a deep copy (if copy is  
>> desired - could be another parameter). Otherwise more will be involved.
>>
>> Alternatively, and possibly better, is to write a specialized  
>> SeqFeatureI factory (that would implement  
>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>
>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>> 		-type_ontology => $sequence_ontology,
>> 		-source_ontology => $feature_source_ontology,
>> 		-unflatten => 1);
>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>
>> This is preferable because it separates business logic that isn't  
>> necessarily related into defined units. I.e., the logic necessary to  
>> convert an ordinary feature into a strongly typed one is different  
>> from how to represent a strongly typed feature. IMHO anyway ...
>>
>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan  
>> started as the result of a discussion thread earlier this (or last?)  
>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,  
>> though not in concept.
>>
>> Maybe we need to get together again and thrash out a strategy; or a  
>> BOF at the GMOD meeting? I feel this does need a core group of people  
>> who care, hash out a strategy that will also solve the backwards  
>> compatibility problem with the current Bio::SeqFeatureI state-of- 
>> limbo, and allow us to implement the decisions with a few people in a  
>> concentrated effort. This will then also remove the only real large  
>> stumbling block towards a 1.6 release.
>>
>> Maybe we should think about a little pre-GMOD hackathon to clear up  
>> this mess? Scott, you'll be there a day early? I'll be already back  
>> and Jason I believe will still be in town, although he may have other  
>> commitments already. Nonetheless, it shouldn't really take that much  
>> but rather dedicated time, a whiteboard, and a few people who care  
>> thrashing this out and then do it.
>>
>> Thoughts?
>>
>> 	-hilmar
>>
>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>
>>     
>>> Rob,
>>>
>>> I came to the same conclusion as well; I wrote my response as I was
>>> heading out the door and while I was running errands, I realized the
>>> right thing to do is to write a Bio::SeqFeature::Annotated method  
>>> called
>>> new_from_object, whose usage would be:
>>>
>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>>>
>>> where you would give it a Bio::SeqFeatureI compliant object and try to
>>> create a BSFA like you suggested below.  You could allow passing in args
>>> to control how different things are handled, like mapping non-SO types
>>> to SO types.  I'll think about this over the weekend and let you know if
>>> brilliance strikes me.
>>>
>>> Scott
>>>
>>>
>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>       
>>>> Rather than cobble together some ad-hoc solution, I would be  
>>>> interested
>>>> in working on a good solution to this problem, because it seems like
>>>> it's just going to get more common as more people start wanting to  
>>>> write
>>>> GFF3.  What about some code in whatever customarily makes these  
>>>> objects
>>>> (probably BSF::Annotated's new() method?) that could take another  
>>>> type
>>>> of Feature object and attempt to shoehorn its data into a new
>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>> whatever), it could throw() some informative error message.
>>>>
>>>> Then, people could write straightforward code something like:
>>>>
>>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>     $oldstylefeature->something_else('some other something that needs to be changed for compliance');
>>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>>     $gff3_out->write_feature($newfeature);
>>>> }
>>>>
>>>> Does that sound like a good idea?  I'd be more than willing to  
>>>> implement
>>>> this, since I'm going to need to do this sort of thing with many more
>>>> things than just RepeatMasker.
>>>>
>>>> Rob
>>>>
>>>> Scott Cain wrote:
>>>>         
>>>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>>>> before is that I was hoping for divine inspiration for an answer (or for
>>>>> somebody else to answer, which would have been really great :-)
>>>>>
>>>>> The short answer (and easy one for me to type) is that you will probably
>>>>> need an ad hoc method to do it, which is the same thing I do when I need
>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>>> have any sample code that does this, but if you want to start working up
>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>
>>>>>           
>>>>>> So about that converting ye olde feature objects into
>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>
>>>>>>
>>>>>> Scott Cain wrote:
>>>>>>
>>>>>>             
>>>>>>> That's OK--You added a few items that should be escaped that weren't, so
>>>>>>> I added those too.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Scott
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Woops, I should have said something about that.  I submitted it before
>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Scott,
>>>>>>>>>
>>>>>>>>> Looks like Robert also submitted a bug report related to this as well
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>                   
>>> -- 
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                          cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      216-392-3087
>>> Cold Spring Harbor Laboratory
>>>       
>> - --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (Darwin)
>>
>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>> ImoAXD/jrbF0gXzSr2CY4tQ=
>> =XfDq
>> -----END PGP SIGNATURE-----

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From hlapp at gmx.net  Tue Jun 20 18:24:45 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 20 Jun 2006 14:24:45 -0400
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <449839E2.5080402@cornell.edu>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>	<449306CF.1030301@cornell.edu>	<1150486453.4412.30.camel@localhost.localdomain>	<449307B8.5040802@cornell.edu>	<1150487731.4412.35.camel@localhost.localdomain>	<4493150C.1080909@cornell.edu>	<1150516605.2600.9.camel@localhost.localdomain>	<35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
	<1150819406.2585.27.camel@localhost.localdomain>
	<449839E2.5080402@cornell.edu>
Message-ID: <A3627468-CCA4-41FD-8C09-F5E1BFCE67D0@gmx.net>

Yes, this is the sore problem area. AnnotatableI used to have only a
single method (annotation()); the *_tag_* methods are new since 1.5
(and truly a developer release feature - don't rely on them staying).

Likewise, the %tag2text hash is an utterly ugly artifact (after all, this
is an interface) rooted in the above addition. If we can't manage to
remove it I'll remove my name from that module ;)
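
For anyone following along, the polymorphic alternative Rob describes in the quoted message below could look roughly like this. It is only a sketch under stated assumptions: the My::Annotation::* packages are stand-ins, not the real Bio::Annotation::* classes; as_tag() is just the method name Rob proposes; and the field names (target_id, start, end, strand) are guesses at what a Target would carry.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a simple scalar annotation (not a real BioPerl class).
package My::Annotation::SimpleValue;
sub new    { my ($class, %a) = @_; return bless { value => $a{value} }, $class }
# A simple value flattens to exactly one scalar tag value.
sub as_tag { my $self = shift; return ($self->{value}) }

# Hypothetical stand-in for a Target-like annotation.
package My::Annotation::Target;
sub new { my ($class, %a) = @_; return bless {%a}, $class }
# A Target flattens to an ordered list of 3 or 4 scalars (strand is optional).
sub as_tag {
    my $self = shift;
    my @vals = @{$self}{qw(target_id start end)};
    push @vals, $self->{strand} if defined $self->{strand};
    return @vals;
}

package main;

# The caller asks each annotation to flatten itself, so no central
# class-name-to-accessor lookup table is needed.
sub tag_values { my @annotations = @_; return map { $_->as_tag } @annotations }

my @vals = tag_values(
    My::Annotation::SimpleValue->new(value => 'repeat_region'),
    My::Annotation::Target->new(
        target_id => 'EST42', start => 10, end => 250, strand => '+'),
);
print join(' ', @vals), "\n";    # prints "repeat_region EST42 10 250 +"
```

The point of the sketch is that each class decides how many scalars it yields, which a fixed one-accessor-per-class table cannot express.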

	-hilmar

On Jun 20, 2006, at 2:09 PM, Robert Buels wrote:

> Getting to know this code a little better, I notice a couple of little
> things:
>
> 1.) my patch attached to bug 2026 draws unnecessary distinctions between
> feature types that use tags, and those that use annotations, since all
> features are now Bio::AnnotatableI's and the *_tags_* methods are
> implemented in AnnotatableI in terms of annotation objects now.  You
> guys should probably just ignore it, since from the sound of it you're
> going to be changing all of this around anyway.  Wish I could be there
> to help and learn more.
>
> 2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar
> accessors to use when translating Bio::Annotation::* objects to and from
> scalar tags.  Seems to me, this would be much better accomplished by
> using polymorphism of some sort, probably adding a multipurpose as_tag()
> accessor in Bio::AnnotationI and the objects that implement it, then
> using that in Bio::AnnotatableI instead of %tag2text.  Does this make
> sense, or am I misinterpreting something here?  Reason I've noticed this
> is because I've been wrestling with how to translate
> Bio::Annotation::Target objects to and from scalar tag values, since a
> Target is being represented as an ordered list of 3 or 4 scalar tags in
> old things that were designed to interoperate with gff2, and I can't
> figure out a nice way to do it using the rather inflexible %tag2text
> mechanism.
>
> Sorry to be a pain, just wanted to get that in there before you guys
> start your jam session in Durham.
>
> Rob
>
> Scott Cain wrote:
>> Hi Hilmar,
>>
>> Of course you are right--I was under the influence of a perl module that
>> I work with that does something similar, but both of your solutions are
>> better.
>>
>> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
>> look this week.
>>
>> As for next week, I plan on spending the day at NESCent on Wednesday
>> (though I haven't told Todd or Jeff that I am arriving early yet) just
>> to make sure all the details are in place.  I imagine I'll have a fair
>> amount of free time to hash this stuff out.  Anyone else who is in town
>> (that is, in Durham, NC, USA) is welcome to come draw on a white board
>> too. :-)
>>
>> Scott
>>
>>
>> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> You don't need a new method for this. Instead, support a -feature
>>> argument.
>>>
>>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>>
>>> This should work for any instance of Bio::SeqFeatureI. If it is a
>>> B::SF::Annotated already it is obviously just a deep copy (if copy is
>>> desired - could be another parameter). Otherwise more will be involved.
>>>
>>> Alternatively, and possibly better, is to write a specialized
>>> SeqFeatureI factory (that would implement
>>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>>
>>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>>> 		-type_ontology => $sequence_ontology,
>>> 		-source_ontology => $feature_source_ontology,
>>> 		-unflatten => 1);
>>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>>
>>> This is preferable because it separates business logic that isn't
>>> necessarily related into defined units. I.e., the logic necessary to
>>> convert an ordinary feature into a strongly typed one is different
>>> from how to represent a strongly typed feature. IMHO anyway ...
>>>
>>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan
>>> started as the result of a discussion thread earlier this (or last?)
>>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,
>>> though not in concept.
>>>
>>> Maybe we need to get together again and thrash out a strategy; or a
>>> BOF at the GMOD meeting? I feel this does need a core group of people
>>> who care, hash out a strategy that will also solve the backwards
>>> compatibility problem with the current Bio::SeqFeatureI state-of-limbo,
>>> and allow us to implement the decisions with a few people in a
>>> concentrated effort. This will then also remove the only real large
>>> stumbling block towards a 1.6 release.
>>>
>>> Maybe we should think about a little pre-GMOD hackathon to clear up
>>> this mess? Scott, you'll be there a day early? I'll be already back
>>> and Jason I believe will still be in town, although he may have other
>>> commitments already. Nonetheless, it shouldn't really take that much
>>> but rather dedicated time, a whiteboard, and a few people who care
>>> thrashing this out and then do it.
>>>
>>> Thoughts?
>>>
>>> 	-hilmar
>>>
>>> - --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (Darwin)
>>>
>>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>>> ImoAXD/jrbF0gXzSr2CY4tQ=
>>> =XfDq
>>> -----END PGP SIGNATURE-----
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 20 20:22:45 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Jun 2006 21:22:45 +0100
Subject: [Bioperl-l] Bio::Map changes
Message-ID: <44985915.8010607@sendu.me.uk>

Some initial changes have been made to some modules in Bio::Map to allow 
Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
(see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)

Further changes are needed in some remaining Bio::Map modules for this 
addition to be complete (a number of Bio::Map related tests in the test 
suite currently fail), notably Bio::Map::Cyto* since they had 
implemented their own Range-related features.

I propose bringing all Bio::Map into line so it behaves with and makes 
good use of the RangeI nature of Position. Beyond this initial change I 
want to add relative positioning and more, but I'll describe that in a 
future post to this thread.

Can anyone see any issues with ranged positions (it's done in a backward 
compatible way)? Do any developers want to maintain control of a 
Bio::Map module or shall I just dive in?


From cjfields at uiuc.edu  Wed Jun 21 03:50:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 22:50:55 -0500
Subject: [Bioperl-l] EUtilities interface
Message-ID: <002301c694e5$e5f3a750$15327e82@pyrimidine>

I'm working on a new eutilities interface which I hope to commit by late
summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
generic web database interface, which I call Bio::DB::WebDBI, and the
EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
NCBI for any information available via Entrez Utilities (i.e. taxonomy,
pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
info like Bio::DB::WebDBSeqI.  

My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
Does anyone think this will be an issue?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From bix at sendu.me.uk  Wed Jun 21 08:20:37 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 09:20:37 +0100
Subject: [Bioperl-l] Bio::RangeI intersection proposal
Message-ID: <44990155.6050501@sendu.me.uk>

Bio::Map::PositionI (in bioperl-live) needs intersections of a list of 
ranges. It inherits from Bio::RangeI but unlike RangeI's union, 
intersection does not take a list. PositionI currently calls 
intersection repeatedly to handle a list.

If there is no particular reason for this limitation, I propose making 
RangeI intersection handle lists natively. This won't do any harm to 
existing code at the time of the change, but it's possible that someone 
has written a module that implements RangeI but overrides intersection 
(without making it accept a list), so that future code that expects a 
RangeI to handle lists will break when getting a RangeI from that 
module.

So the question is, has anyone overridden intersection in RangeI? Is the 
small risk of possible breakage compensated by the benefit of 
intersections of a list of ranges (which is surely useful in lots of 
situations, not just for PositionI)?

I'm tempted to go ahead with this unless there are objections.
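
To make the proposal concrete, here is a minimal sketch of a list-aware intersection that still accepts a single range, so existing callers keep working. It makes no claim about the actual Bio::RangeI code: ranges are plain hashes with start and end keys, and intersect_pair() and the array-ref calling convention are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a RangeI-like object: a hash with start/end.

# Pairwise intersection; returns undef when the two ranges do not overlap.
sub intersect_pair {
    my ($r1, $r2) = @_;
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;
    return { start => $start, end => $end };
}

# List-aware intersection: the second argument may be one range or an
# array ref of ranges. The result is folded pairwise across the list.
sub intersection {
    my ($self, $given) = @_;
    my @others = ref $given eq 'ARRAY' ? @$given : ($given);
    my $result = $self;
    for my $other (@others) {
        $result = intersect_pair($result, $other);
        return unless defined $result;    # empty intersection: stop early
    }
    return $result;
}

my $common = intersection(
    { start => 1, end => 100 },
    [ { start => 50, end => 150 }, { start => 60, end => 80 } ],
);
print "$common->{start}..$common->{end}\n" if $common;    # prints "60..80"
```

A module that overrides intersection without the array-ref branch would fail on the list call above, which is exactly the breakage risk in question.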


From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 07:19:12 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 09:19:12 +0200
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <002701c69477$9ffa7c10$c2987ca5@pc13>
References: <002701c69477$9ffa7c10$c2987ca5@pc13>
Message-ID: <4498F2F0.7010203@biomax.com>

Sohel Merchant wrote:

>> Just curious if any other BioPerlers will be at the YAPC conference in
>> Chicago next week (http://yapcchicago.org/).

Not in Chicago, but yesterday I got the OK from Biomax management to go 
to YAPC::Europe, http://www.birmingham2006.com/. So at the end of 
August I'll be in Birmingham. Yeah!

Is anybody interested in writing parsers for Perl 6 there?

CU, Bernhard


-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************


From cjfields at uiuc.edu  Wed Jun 21 15:08:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:08:28 -0500
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <4498F2F0.7010203@biomax.com>
Message-ID: <000301c69544$8d537710$15327e82@pyrimidine>

Speaking of Perl6, there was interest here at one point in getting a
bioperl-experimental going, which at this point in the game should involve
Perl6.  If there were enough interest in it we could probably get it set up
via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
experimental modules in some way to prevent confusion (bioperl6-live???),
though I'm not up to speed Perl6-wise so I'm not sure about namespace
collisions and so on.

bioperl-experimental would be, as the name implies, a sort of testing
ground for ideas (good and bad).  It seemed like it was going to take off a
few years ago but it lost steam, I guess.

As for your parsers, would you build them from the ground up (i.e. from
Bio::Root::Root on up)?

Chris



From cjfields at uiuc.edu  Wed Jun 21 15:16:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:16:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <44990155.6050501@sendu.me.uk>
Message-ID: <000401c69545$a4a3ad30$15327e82@pyrimidine>

I personally have no objections as long as it doesn't break API.  Don't know
how the senior guys feel (Jason, Brian, Heikki, Hilmar...); I'm not a user
of Bio::Map modules myself.

Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Chris



From hlapp at gmx.net  Wed Jun 21 15:24:47 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:24:47 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000401c69545$a4a3ad30$15327e82@pyrimidine>
References: <000401c69545$a4a3ad30$15327e82@pyrimidine>
Message-ID: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>


On Jun 21, 2006, at 11:16 AM, Chris Fields wrote:

> Actually, sounds weird to have me say "senior guys"; I'm 35 years old!

Actually, it doesn't go by age but by the amount of hair you still  
have. ;)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 15:28:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:28:58 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net>
Message-ID: <000501c69547$6a9f28b0$15327e82@pyrimidine>

Then I'm really a senior guy...

; {

Chris



From hlapp at gmx.net  Wed Jun 21 15:53:08 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Jun 2006 11:53:08 -0400
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <000501c69547$6a9f28b0$15327e82@pyrimidine>
References: <000501c69547$6a9f28b0$15327e82@pyrimidine>
Message-ID: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>

We could run a Mr Seniority competition at BOSC with the attendees  
judging who got the weirdest looking hair loss. You'd take the  
challenge? The judging panel would need to be gender-mixed though.


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 21 16:08:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 11:08:17 -0500
Subject: [Bioperl-l] Bio::RangeI intersection proposal
In-Reply-To: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net>
Message-ID: <000301c6954c$e89c7a60$15327e82@pyrimidine>

I'd love to be at BOSC but I can't go (finishing up my postdoc this year,
which is probably the primary cause of my hair loss).  Would the judges
accept a recent picture?

Chris



From Bernhard.Schmalhofer at biomax.com  Wed Jun 21 16:25:50 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 18:25:50 +0200
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <000301c69544$8d537710$15327e82@pyrimidine>
References: <000301c69544$8d537710$15327e82@pyrimidine>
Message-ID: <4499730E.8090800@biomax.com>

Chris Fields wrote:
> Speaking of Perl6, there was interest here at one point in getting a
> bioperl-experimental going, which at this point in the game should involve
> Perl6.  If there were enough interest in it we could probably get it set up
> via CVS and moving along.  We might need to split the Perl6 stuff from Perl5
> experimental modules in some way to prevent confusion (bioperl6-live???),
> though I'm not up to speed Perl6-wise so I'm not sure about namespace
> collisions and so on.

As far as I understood it, the plan is to have a very smooth migration 
path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When 
new stuff is coming along, or when refactoring is done, you drop in

   use v6;

or

   use v6-pugs;

and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm 
or Audrey Tang's presentation at the Nordic Perl Workshop: 
http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
So I would argue against having a completely separate Perl6 experimental
repository.

> bioperl-experimental would be, like the name implies, a sort of testing
> ground for ideas (good and bad).  It seemed like it was going to take off a
> few years ago but it lost steam, I guess.
> 
> As for your parsers, would you build them from the ground up (i.e. from
> Bio::Root::Root on up)?

I'm just a casual Bio::Perl user and never hacked on any internals. So I 
don't know whether the current Bio::Perl framework is a good fit.

The idea that is floating in my mind is to make a showcase of Perl 6 
parsing, by tackling the various sequence and alignment formats.
So this would involve shopping around for the cleanest parser 
implementations and porting them to Perl6.

Which repository to use is more a question of social engineering.
Are there more Pugs/Perl6 hackers interested in cool biological hacking,
or biologists aching to try out Perl6?

Regards,
   Bernhard Schmalhofer

-- 
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************


From cjfields at uiuc.edu  Wed Jun 21 18:01:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 13:01:02 -0500
Subject: [Bioperl-l] Perl 6 hacking  was   YAPC anyone?
In-Reply-To: <4499730E.8090800@biomax.com>
Message-ID: <000b01c6955c$ad0e6750$15327e82@pyrimidine>

> Chris Fields wrote:
> > Speaking of Perl6, there was interest here at one point in getting a
> > bioperl-experimental going, which at this point in the game should
> involve
> > Perl6.  If there were enough interest in it we could probably get it set
> up
> > via CVS and moving along.  We might need to split the Perl6 stuff from
> Perl5
> > experimental modules in some way to prevent confusion (bioperl6-
> live???),
> > though I'm not up to speed Perl6-wise so I'm not sure about namespace
> > collisions and so on.
> 
> As far as I understood it, the plan is to have a very smooth migration
> path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When
> new stuff is coming along, or when refactoring is done, you drop in
> 
>    use v6;
> 
> or
> 
>    use v6-pugs;
> 
> and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm
> or Audrey Tang's presentation at the Nordic Perl Workshop:
> http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
> So I would argue against having a completely separate Perl6 experimental
> repository.

Makes sense.  I know Pugs is the Perl6 implementation in Haskell but I also
know eventually Parrot will be taking over as the compiler (hopefully).
Perl6 is pretty exciting since it's built to support OOP from the ground up,
unlike the bolted-on OOP for Perl5, and has several other features that make
it very useful (the new way regexes are handled).  I just haven't had time
to play around with it seriously enough.  I may try using Pugs a bit more,
though.

So, as long as Perl5-Perl6 work together a separate repository wouldn't be
necessary.  

> > bioperl-experimental would be, like the name implies, a sort of testing
> > ground for ideas (good and bad).  It seemed like it was going to take off
> > a few years ago but it lost steam, I guess.
> >
> > As for your parsers, would you build them from the ground up (i.e. from
> > Bio::Root::Root on up)?
>
> I'm just a casual Bio::Perl user and never hacked on any internals. So I
> don't know whether the current Bio::Perl framework is a good fit.
> 
> The idea that is floating in my mind is to make a showcase of Perl 6
> parsing, by tackling the various sequences and alignment formats.
> So this would involve shopping around for the cleanest parser
> implementations and porting that to Perl6.
> 
> Which repository to use is more a question of social engineering.
> Are there more Pugs/Perl6 hackers interested in cool biological hacking,
> or biologists aching to try out Perl6?

I suppose the best way is initially to use a non-bioperl approach using
Perl6, then try working the parsers in using 'use v6-pugs;'.  Bioperl is
heavily object-oriented so the code would probably need to be refactored
from the bottom up (or top down, depending on your view) to fit Perl6.
Having a perl5->perl6 translator helps, though.  And, again, having Perl5
and Perl6 work together helps as well.

Chris

> Regards,
>    Bernhard Schmalhofer
> 
> --
> **************************************************
> Dipl.-Physiker Bernhard Schmalhofer
> Senior Developer
> Biomax Informatics AG
> Lochhamer Str. 11
> 82152 Martinsried, Germany
> Tel: +49 89 895574-839
> Fax: +49 89 895574-825
> eMail: Bernhard.Schmalhofer at biomax.com
> Website: www.biomax.com
> **************************************************


From dwaner at scitegic.com  Wed Jun 21 18:14:00 2006
From: dwaner at scitegic.com (dwaner at scitegic.com)
Date: Wed, 21 Jun 2006 11:14:00 -0700
Subject: [Bioperl-l] EMBL release 87 format changes.
Message-ID: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>

With release 87 of EMBL (June 19th, 2006), there have been some minor 
changes to the flat file record format. In particular, the SV (sequence 
version) tag has been moved from its own line to a field in the ID line. 
See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.

Is someone already working on updating the SeqIO::embl parser, or should I 
volunteer?

- David


From bix at sendu.me.uk  Wed Jun 21 18:23:28 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 19:23:28 +0100
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <002301c694e5$e5f3a750$15327e82@pyrimidine>
References: <002301c694e5$e5f3a750$15327e82@pyrimidine>
Message-ID: <44998EA0.1010406@sendu.me.uk>

Chris Fields wrote:
> I'm working on a new eutilities interface which I hope to commit by late
> summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> generic web database interface, which I call Bio::DB::WebDBI, and the
> EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
> NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only
> info like Bio::DB::WebDBSeqI.  
> 
> My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
> Does anyone think this will be an issue?

Well, I don't. Sounds good to me. What's the intended relationship 
between WebDBI and EUtilitiesI? Would your work end up in the removal of 
direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just 
convert the code that gets the XML to a one line statement or so?


From cjfields at uiuc.edu  Wed Jun 21 19:00:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 14:00:02 -0500
Subject: [Bioperl-l] EMBL release 87 format changes.
In-Reply-To: <OFB4434EF4.EE311C58-ON88257194.0063AA8F-88257194.00642918@scitegic.com>
Message-ID: <000c01c69564$e68b39b0$15327e82@pyrimidine>

That would be great!  Post a patch/fix via bugzilla:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

and we can add it and test it out.  Or if you have CVS access you can do it
yourself.  Not sure who's taking care of SeqIO::embl at the moment....

Added bit: you'll need to update both next_seq and write_seq.  next_seq
should probably handle both the old and the new EMBL format, and write_seq
should only write the new format (unless someone else disagrees???)
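
A minimal sketch of how a reader could accept both ID line layouts. This is
not the SeqIO::embl code; parse_embl_id_line is a hypothetical helper, and the
field order is an assumption based on the EMBL layouts as I understand them
(old: entry name, data class, molecule, division, length; release 87:
accession, SV, topology, molecule, data class, division, length).

```perl
use strict;
use warnings;

# Hypothetical helper: parse an EMBL ID line in either layout.
# Returns a hash ref, or undef if the line matches neither layout.
# For old-style lines 'version' is undef, because the version is
# carried on a separate SV line in that format.
sub parse_embl_id_line {
    my ($line) = @_;

    # Assumed release-87 layout, e.g.
    # ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.
    if ($line =~ /^ID\s+(\S+);\s+SV\s+(\d+);\s+(\w+);\s+([^;]+);\s+(\w+);\s+(\w+);\s+(\d+)\s+BP\./) {
        return {
            format   => 'new',
            id       => $1,
            version  => $2,
            topology => $3,
            molecule => $4,
            class    => $5,
            division => $6,
            length   => $7,
        };
    }

    # Assumed pre-87 layout, e.g.
    # ID   X56734     standard; mRNA; PLN; 1859 BP.
    if ($line =~ /^ID\s+(\S+)\s+(\w+);\s+([^;]+);\s+(\w+);\s+(\d+)\s+BP\./) {
        return {
            format   => 'old',
            id       => $1,
            version  => undef,
            class    => $2,
            molecule => $3,
            division => $4,
            length   => $5,
        };
    }

    return;
}
```

Trying the new layout first keeps the two patterns from shadowing each other,
since only the new layout has a semicolon directly after the first field.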

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of dwaner at scitegic.com
> Sent: Wednesday, June 21, 2006 1:14 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] EMBL release 87 format changes.
> 
> With release 87 of EMBL (June 19th, 2006), there have been some minor
> changes to the flat file record format. In particular, the SV (sequence
> version) tag has been moved from its own line to a field in the ID line.
> See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.
> 
> Is someone already working on updating the SeqIO::embl parser, or should I
> volunteer?
> 
> - David
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed Jun 21 21:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 16:16:38 -0500
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <44998EA0.1010406@sendu.me.uk>
Message-ID: <001b01c69577$fc7068f0$15327e82@pyrimidine>

> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Wednesday, June 21, 2006 1:23 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] EUtilities interface
> 
> Chris Fields wrote:
> > I'm working on a new eutilities interface which I hope to commit by late
> > summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> > generic web database interface, which I call Bio::DB::WebDBI, and the
> > EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can
> query
> > NCBI for any information available via Entrez Utilities (i.e. taxonomy,
> > pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-
> only
> > info like Bio::DB::WebDBSeqI.
> >
> > My only concern is confusion over names, particularly WebDBI vs.
> WebDBSeqI.
> > Does anyone think this will be an issue?
> 
> Well, I don't. Sounds good to me. What's the intended relationship
> between WebDBI and EUtilitiesI? Would your work end up in the removal of
> direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just
> convert the code that gets the XML to a one line statement or so?

Well, right now all it does is use URI to build queries, submit them to
Entrez Utilities, then grab the response; I've been hacking at it on and off
for a few months now.  It needs some error handling and added methods
(mainly for proxies and handling WebEnv/query_key), though once I have it in
decent enough shape I'll go ahead and add it to CVS.  
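
A rough sketch of the query-building step only, assuming the current public
E-utilities base URL and the db/term parameters of esearch. It uses core Perl
so that it runs anywhere; the URI module mentioned above would do the same job
through its query_form method. build_eutil_url and escape_param are
hypothetical names, not part of the planned interface.

```perl
use strict;
use warnings;

# Assumed base URL for NCBI Entrez Utilities.
my $EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything except unreserved characters (ASCII input assumed).
sub escape_param {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build a URL such as .../esearch.fcgi?db=taxonomy&term=...
# Parameters are sorted by name so the result is reproducible.
sub build_eutil_url {
    my ($eutil, %params) = @_;
    my @pairs = map { escape_param($_) . '=' . escape_param($params{$_}) }
                sort keys %params;
    return "$EUTILS_BASE/$eutil.fcgi?" . join('&', @pairs);
}

print build_eutil_url('esearch', db => 'taxonomy', term => 'Homo sapiens'), "\n";
```

Submitting the URL and reading the response (for example with
LWP::UserAgent) is left out here so the sketch needs no network access.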

Theoretically once the response is returned it can be parsed like any stream
(see WebDBSeqI/NCBIHelper for an idea of how sequences are parsed and
returned using SeqIO).  This should work as long as there is an appropriate
class to handle the data stream and the appropriate 'plugin' to parse the
data into objects; i.e. dbSNP can be handled by ClusterIO::dbSNP, sequences
by SeqIO::genbank/fasta, pubmed by Bio::Biblio::IO::pubmedxml, and so on.
If you don't have an object or want the raw data stream, you could submit a
request using the various eutility (efetch, epost, esearch) and save as raw
format to an output file or STDOUT.  

Here's a rough diagram:

                      |------------------->Bio::DB::DBFetch (EBI interface)----->plugins for Bio* classes
Bio::Root::Root       |
LWP::UserAgent ------Bio::DB::WebDBI------>Bio::DB::EUtilitiesI (NCBI interface)----->plugins for Bio* classes
                      |
                      |------------------->others?

You probably don't need a Bio::*IO::plugin for each type; tax data in
Bioperl seems to primarily utilize the NCBI Tax database, so
Bio::DB::Taxonomy::entrez shouldn't be too hard to adapt to act as a plugin.
Bio::DB::Taxonomy::entrez uses XML::Twig to parse everything into
Bio::Taxonomy::Node objects and is able to retrieve single and multiple ID's
using the same method, though I would probably use XML::SAX instead.  If I
remember correctly there were issues with Bio::DB::Taxonomy that you brought
up...

Chris


From bix at sendu.me.uk  Thu Jun 22 13:28:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Jun 2006 14:28:25 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <44985915.8010607@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk>
Message-ID: <449A9AF9.2000305@sendu.me.uk>

Sendu Bala wrote:
> Some initial changes have been made to some modules in Bio::Map to allow 
> Positions to have a range (Bio::Map::PositionI implements Bio::RangeI).
> (see http://bugzilla.open-bio.org/show_bug.cgi?id=1998)
> 
> Further changes are needed in some remaining Bio::Map modules for this 
> addition to be complete

Range is now done.

The next step is to tidy up all of Bio::Map*, which involves a major 
reimplementation of the whole system (but with no significant API 
change). Basically, the current system is an awkward mix of older 'marker 
has a single position on a map' and new 'markers have multiple positions 
on multiple maps'. This gives us strange things like SimpleMap's 
add_element method which adds a reference to the element to the map 
without the element itself knowing it is now on the map (because it is 
Position that defines what maps an element is on).

The reimplementation will make Position central to the model, allowing 
for lots of other things to work properly without anything becoming 
inconsistent (as is currently the case).

The general tidy up will involve redoing and perhaps even removing 
things. For instance, OrderedPositionWithDistance has never worked so 
will be deleted (with OrderedPosition gaining the distance functionality 
its docs say it already has).

But now is the time to speak up and change my mind if necessary!


From golharam at umdnj.edu  Thu Jun 22 21:05:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 22 Jun 2006 17:05:00 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package)
Message-ID: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>

Hi all,

I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
baseml in the PAML package to measure the distances of some non-coding
regions.  

I started with the coding regions, and used the script
bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
something similar for non-coding regions.  However, when I call
Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
meaning matrix was never defined.  

I wanted to find out if anyone on here has done this before or knows a
way to measure substitution frequencies of non-coding regions with the
PAML package.  The documentation with PAML is sparse so I'm not sure how
to interpret its output directly - that's why I'm using Bioperl.  

Hopefully someone can help me before I start digging into the
code...Thanks.

Ryan


From n.haigh at sheffield.ac.uk  Fri Jun 23 06:43:48 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 23 Jun 2006 07:43:48 +0100
Subject: [Bioperl-l] CVS Export
Message-ID: <000001c69690$61afb540$b07f6f58@nathan243dd61f>

I may have asked this previously, but I can't find the answer to my question
anywhere so I'll have to ask it again - sorry.

Is it possible to export files/directories from cvs that have changed
between two tags/branches/head? Specifically, I'd like to export (as I don't
want the cvs administrative directories) files that have been added to
Bioperl since the 1.4 release.

Cheers
Nath

----------------------------------------------------------------------------
Dr. Nathan S. Haigh MPharmacol. Ph.D.
Bioinformatics PostDoctoral Research Associate

Room B2 211                              Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences  Mob: +44 (0)7742 533 569
University of Sheffield                  Fax: +44 (0)114 22 20002
Western Bank                             Web: www.bioinf.shef.ac.uk
Sheffield                                     www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------


From cjfields at uiuc.edu  Fri Jun 23 14:58:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Jun 2006 09:58:24 -0500
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c696d5$7da6c640$15327e82@pyrimidine>

***sounds of crickets***

Ryan,

It's a pretty good possibility that Jason and the rest are on the road to
conferences and such.  There's been mention of a Durham, NC meeting and, of
course, YAPC is happening soon as well.  I wish I could help but I know
diddly about PAML besides the HOWTO on the wiki (though I may be using it
myself soon).  Sorry, you may have to be a bit patient for a more productive
response.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Thursday, June 22, 2006 4:05 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
> package)
> 
> Hi all,
> 
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
> 
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
> 
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
> 
> Hopefully someone can help me before I start digging into the
> code...Thanks.
> 
> Ryan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From MEC at stowers-institute.org  Fri Jun 23 18:27:19 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Jun 2006 13:27:19 -0500
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Message-ID: <CED81D34E37D5043A1211565277A51E50563FC85@exchkc02.stowers-institute.org>

Guy,

I've just downloaded and installed your latest 1.1.0 version of
exonerate but unfortunately did not find any mention in the ChangeLog of
addressing this bug, though I still see in the TODO:

    o Should GFF show all coordinates on the +ve strand? (jason_p2g eg)

I was half expecting to see this fixed in this version based on this old
thread.  

Can you please confirm that it has not yet been addressed, and accept my
request that you continue to keep this change on your list for future
versions...
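
For anyone post-processing the GFF in the meantime, here is a small sketch of
the arithmetic involved. It assumes 1-based, inclusive coordinates given on
the reverse-complement copy (as described in the manpage excerpt quoted
below) and a known sequence length, which is why the length has to be
available, for example via --ryo. to_forward_strand is a hypothetical helper,
and exonerate's own numbering may differ from this assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based inclusive range given on the
# reverse-complement copy of a sequence into the equivalent range on the
# forward strand. Position p on the reverse copy corresponds to position
# (length - p + 1) on the forward strand, so start and end swap roles.
sub to_forward_strand {
    my ($start, $end, $seq_length) = @_;
    my $fwd_start = $seq_length - $end + 1;
    my $fwd_end   = $seq_length - $start + 1;
    return ($fwd_start, $fwd_end);
}

# A feature covering the first 10 bases of the reverse copy of a 100 bp
# sequence covers the last 10 bases of the forward strand.
my ($s, $e) = to_forward_strand(1, 10, 100);
print "$s..$e\n";
```

The strand column would still be written as '-', only the start and end
columns change.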

Also, might you elaborate on this entry from the ChangeLog.  I don't see
it mentioned in the manpage.

    o Added %tcs etc to --ryo for dumping coding sequences 

Thanks,

Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org 
>[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Guy Slater
>Sent: Friday, September 02, 2005 11:52 AM
>To: Cook, Malcolm
>Cc: bioperl-l
>Subject: RE: [Bioperl-l] methods, etc. for Bio::SearchIO on 
>exonerate output
>
>On Fri, 2 Sep 2005, Cook, Malcolm wrote:
>
>> Hmmmm - I'd better get some clarification from Guy too.  
>>  
>> Guy, if you don't mind reading the thread below and chiming in on our
>> discussion of interpreting the output of your excellent exonerate
>> program:
>>  
>> The sections of the manpage
>> (http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
>> relevant are these 2 excerpts:
>>  
>>  1) When an alignment is reported on the reverse complement of a
>> sequence, the coordinates are simply given on the reverse complement
>> copy of the sequence. Hence positions on the sequences are never
>> negative. Generally, the forward strand is indicated by '+', the
>> reverse strand by '-', and an unknown or not-applicable strand (as in
>> the case of a protein sequence) is indicated by '.' "
>>  
>> 2)  --forwardcoordinates <boolean> By default, all coordinates are
>> reported on the forward strand. Setting this option to false reverts
>> to the old behaviour (pre-0.8.3) whereby alignments on the reverse
>> complement of a sequence are reported using coordinates on the
>> reverse complement.
>>  
>> We see GFF DUMP coordinates still reported on the reverse strand
>> regardless of the setting of --forwardcoordinates.  So these two
>> excerpts from your manpage seem contradictory to me, unless I
>> understand `--forwardcoordinates FALSE` to only affect the coordinates
>> reported in the alignment section, not in the GFF DUMP section, which
>> is what it appears to do in practice.
>>  
>> Guy, can you confirm that the --forwardcoordinates option has no
>> effect on GFF output?
>>  
>
>Hi,
>
>Yes, it has no effect, and this is a bug
>(sorry - it was due to my misinterpretation of the GFF2 spec)
>- it's on the list of things to be fixed for exonerate 1.1 (soon)
>
>> Further, can you tell us if you plan to comport more closely to the
>> GFF spec, in particular in this case by reporting forwardcoordinates
>> in the GFF DUMP section too?   I see in your TODO list
>> "    o Should GFF show all coordinates on the +ve strand? (jason_p2g eg)".
>> Hear hear!  I second the motion.
>>  
>> And TODO item " GFF3 support ? http://song.sf.net/" gets my vote
>> too.... though this is more of a sticky wicket....
>>  
>
>Yup, GFF3 support is on the list,
>but probably it will not be done in time for exonerate 1.1
>Of course, I'd welcome a patch ...    ;)
>
>(I'm mainly working on getting the cdna2genome
> and genome2genome models working properly for 1.1)
>
>Cheers,
>
>Guy.
>
>> Cheers and Thanks!
>>  
>> Malcolm Cook
>>  
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Friday, September 02, 2005 9:46 AM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> I've already talked to Guy about some of this and I assume fixes will
>> be part of the next release, but it can't hurt to have more people
>> requesting.  The main problem right now is reverse strand hits in GFF
>> output are still screwed up even if you provide the
>> --forwardcoordinates option.
>> 
>> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
>> VULGAR to a list of Bio::Search::HSP::GenericHSP).  We can also
>> reconstruct everything needed from that, I gave a stab at it once, but
>> there was something missing (or maybe it was pre --forwardcoordinates
>> option).   
>> 
>> 
>> -jason 
>> 
>> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
>> 
>> 
>> Jason,
>>  
>> Thanks for the scripts and clues (esp re: using the --ryo option to
>> inject the needed length into the exonerate output to compensate).
>>  
>> I'm considering asking exonerate author to comport with GFF spec.  Do
>> you think this is a road to take?
>>  
>> Cheers,
>>  
>> Malcolm
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Wednesday, August 31, 2005 12:35 PM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> 
>> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl
>> 
>> You may still want to massage it some, but I use the script in this
>> basic form, maybe with a few tweaks:
>> 
>> Note that it requires you to run exonerate with specific --ryo options
>> so that it includes the length of the query and hit sequences in the
>> report output. This should be covered in the perldoc in the script.
>> 
>> Without the ryo options enabled, you'll need to modify the script more
>> to have access to the original sequence db, use Bio::DB::Fasta, and
>> put in some $dbh->length($seqid) calls instead.
>> 
>> I don't think the part which writes HSP/match lines is actually
>> correct - it is trying to roll gapped HSPs from the similarity features.
>> 
>> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
>> instance and/or grepping out the lines I really think I need.  
>> You may want to s/exon/CDS/ for the protein2genome output as well.
>> 
>> -jason
>> 
>> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>> 
>> 
>> Jason, 
>> 
>> This message is in regards to an old thread in which you offered to
>> share a 'script for munging over' exonerate output for loading in
>> DB::GFF (c.f.
>> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>> 
>> Would you be willing to still share that script, if you've got it
>> around? 
>> 
>> Thanks, and regards, 
>> 
>> Malcolm Cook -  <mailto:mec at stowers-institute.org>
>> mec at stowers-institute.org - 816-926-4449
>> Database Applications Manager - Bioinformatics
>> Stowers Institute for Medical Research - Kansas City, MO  USA
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>
>-- 
>%!PS % <------ Guy St.C. Slater ------> 
>http://www.ebi.ac.uk/~guy/  <------
>210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 
>12/d{exch moveto}
>a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 
>f}a/h{setlinewidth newpath dup
>g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 
>0 108 arc d e
>18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>


From oldham at ucla.edu  Fri Jun 23 16:18:39 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 23 Jun 2006 09:18:39 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single
	largefile
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563F9F5@exchkc02.stowers-institute.org>
Message-ID: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>

Hello again,

I finally got it to work, using the following script.  However, it takes
about 5 hours to run on a fast computer.  Using grep (in bash), on the
other hand, takes about 5 minutes (see below if you are interested).
Thanks to everyone for your help!

SLOW perl script:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_all_X';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
print @ID;
chomp @ID;

while (my $line = <PROBES>) {
	foreach my $identifier (@ID) {
		if($line=~/^>probe:\w+:$identifier:/) {
				print OUT $line;
				print OUT scalar(<PROBES>);
		}
	}
}
exit;


FAST bash script:

#!/usr/bin/bash
exec<"ID_all_X"
while read line; do
	echo $line
	grep -A 1 :$line: HG_U95Av2_probe_fasta >>myresults.txt
done
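
Much of the time in the Perl version probably goes into running the regex
once per ID for every line of the probe file. A minimal sketch of a
single-pass alternative, along the lines of Malcolm's hash suggestion quoted
below: load the IDs into a hash, then test each header with one lookup. It
assumes headers of the form ">probe:ARRAY:ID:..." as in the examples in this
thread; filter_probes is a hypothetical name.

```perl
use strict;
use warnings;

# Copy every FASTA record whose probe-set ID is listed in the ID file.
# All three arguments are open filehandles.
sub filter_probes {
    my ($ids_fh, $probes_fh, $out_fh) = @_;

    # Build the lookup table once: one key per wanted ID.
    my %wanted;
    while (my $id = <$ids_fh>) {
        $id =~ s/\s+\z//;    # strip newline and any trailing whitespace
        $wanted{$id} = 1 if length $id;
    }

    # $keep is declared outside the loop so that it persists from a
    # header line to the sequence line(s) that follow it.
    my $keep = 0;
    while (my $line = <$probes_fh>) {
        if ($line =~ /^>/) {
            $keep = ($line =~ /^>probe:\w+:(\w+):/ && exists $wanted{$1}) ? 1 : 0;
        }
        print {$out_fh} $line if $keep;
    }
    return;
}
```

To use it on the files from this thread, open 'ID_all_X' and
'HG_U95Av2_probe_fasta' for reading and 'probe_subset.txt' for writing, each
with "or die", and pass the three handles in that order.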


-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Wednesday, June 14, 2006 6:48 AM
To: Michael Oldham; Chris Fields
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
largefile


Did you try my one-liner?

Anyway, try this
	1) predeclare $idmatch before the while loop
	2) use ` select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
	print "Could not open file $IDs!\n";
	}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
	print "Could not open file $probes!\n";
	}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; # Note: This creates a hash with keys=PSIDs and all values=1.
my $idmatch;
	while (<PROBES>) {
		$idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
		if ($idmatch){
			print ;
		}
	}
exit;


>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu]
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.  When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank.  If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086;
>Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126;
>Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127;
>Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs
>(only the last
>ID is present in the output file).  At this point I have no idea why
>this is happening, and I am not sure how to interpret
>Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas?  Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>	print "Could not open file $IDs!\n";
>	}
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>	print "Could not open file $probes!\n";
>	}
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
>my %ID = map {($_, 1)} @ID; # Note: This creates a hash with keys=PSIDs and all values=1.
>
>	while (<PROBES>) {
>		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>		if ($idmatch){
>			print OUT;
>			print OUT scalar(<PROBES>);
>		}
>	}
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing.  This is probably your problem.
>>
>>If you call `select OUT` after opening it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines.  Read the
>code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)?  If there is nothing printed
>>then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.  The problem may be that the captured string must
>>match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything.  Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this won't it only print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match?  Or is this what you wanted?   If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice.  I think I am getting
>>>> closer, but no cigar quite yet.  The script below runs quickly with no
>>>> errors--but the output file is empty.  It seems that the problem must lie
>>>> somewhere in the 'while' loop, and I'm sure it's quite obvious to a more
>>>> experienced eye--but not to mine!  Any suggestions?  Thanks again for
>>>> your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>> 	print "Could not open file $IDs!\n";
>>>> 	}
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>> 	print "Could not open file $probes!\n";
>>>> 	}
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
>>>> # keys=PSIDs and all values=1.
>>>>
>>>> 	while (<PROBES>) {
>>>> 		my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>> 		if ($idmatch){
>>>> 			print OUT;
>>>> 		}
>>>> 	}
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large
>>>> file
>>>>
>>>>
>>>>
>>>> I wouldn't use bioperl for this, or create an index.  Perl would do
>>>> fine and probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
>>>> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;}  $inmatch =
>>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
>>>> mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a conceptually
>>>>> simple task.  I have a single large fasta file containing about 200,000
>>>>> probe sequences (from an Affymetrix microarray), each of which looks
>>>>> like this:
>>>>>
>>>>> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of ~130,800
>>>>> probes (both the header and the sequence) and output this subset into a
>>>>> new fasta file.  These 130,800 probes correspond to 8,175 probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I have these
>>>>> 8,175 IDs listed in a separate file.  I *think* that I managed to create
>>>>> an index of all 200,000 probes in the original fasta file using the
>>>>> following script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even
>>>>> if it is, I'm
>>>>> not sure what to do next.  Any help would be greatly appreciated!
>>>>>
>>>>> Many thanks,
>>>>> Mike O.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>


From pmiguel at purdue.edu  Sat Jun 24 14:17:46 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 10:17:46 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <449D498A.9020107@purdue.edu>

Brian Osborne wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>   
I would be very disappointed to lose one part of bptutorial.pl--this was 
described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
only purpose I've ever used bptutorial.pl for--to find all the methods 
available to any given object. Eg:

bptutorial.pl 100 Bio::PrimarySeq

 ***Methods for Object Bio::PrimarySeq ********


 Methods taken from package Bio::IdentifiableI
 lsid_string   namespace_string

 Methods taken from package Bio::PrimarySeq
 accession   accession_number   alphabet   authority   can_call_new   desc
 description   direct_seq_set   display_id   display_name   id   is_circular
 length   namespace   new   object_id   primary_id   seq
 subseq   validate_seq   version

 Methods taken from package Bio::PrimarySeqI
 moltype   revcom   translate   trunc

 Methods taken from package Bio::Root::Root
 DESTROY   confess   debug   throw   verbose

 Methods taken from package Bio::Root::RootI
 carp   deprecated   stack_trace   stack_trace_dump   throw_not_implemented   warn
 warn_not_implemented


Phillip SanMiguel


From sdavis2 at mail.nih.gov  Sat Jun 24 14:45:52 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 24 Jun 2006 10:45:52 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a singlelargefile
References: <KOEGJJHCCKFLICFGFLKDMEDFCKAA.oldham@ucla.edu>
Message-ID: <001a01c6979c$ff576dd0$6501a8c0@WATSON>


----- Original Message ----- 
From: "Michael Oldham" <oldham at ucla.edu>
To: "Cook, Malcolm" <MEC at stowers-institute.org>; "Chris Fields" 
<cjfields at uiuc.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Friday, June 23, 2006 12:18 PM
Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
singlelargefile


> Hello again,
>
> I finally got it to work, using the following script.  However, it takes
> about 5 hours to run on a fast computer.  Using grep (in bash), on the
> other hand, takes about 5 minutes (see below if you are interested).
> Thanks to everyone for your help!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
> print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
> print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
> foreach my $identifier (@ID) {
> if($line=~/^>probe:\w+:$identifier:/) {
> print OUT $line;
> print OUT scalar(<PROBES>);
> }
> }
> }

This could probably be done MUCH faster using a hash on the sequence 
identifier.  (I have to admit that I didn't follow the first part of this 
conversation, so I could be misunderstanding some part of what you are 
trying to do.)  If you have a couple hundred-thousand sequences, my guess is 
that it could be done in under 30 seconds, but I could be wrong about the 
exact time.  The important part is to make a hash of your sequences with the 
key being the $identifier.  Then, loop through your @ID array doing 
something like (untested):

#open files as before and read in @ID as before

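my %wanted = map { ($_, 1) } @ID;   # hash lookup replaces the loop over @ID
my %seq_hash;

while (my $line = <PROBES>) {
    # capture the probe set ID from the header line itself
    if ($line =~ /^>probe:\w+:(\w+):/ and $wanted{$1}) {
        my $id = $1;
        # several probes share one probe set ID, so append rather than assign
        $seq_hash{$id} .= $line . scalar(<PROBES>);
    }
}

foreach my $id (@ID) {
    print OUT $seq_hash{$id} if exists $seq_hash{$id};
}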
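
A streaming variant may also be worth trying. The sketch below has not been run against the real probe file and makes a few assumptions: the sub name filter_probes is made up for illustration, the header pattern is the one used earlier in this thread, and the IDs are one per line. It decides once per header line whether to keep the record, so only the ID hash is held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every FASTA record whose probe set ID is listed in $id_fh from
# $fasta_fh to $out_fh. Returns the number of records written.
# (filter_probes is a hypothetical name, not part of BioPerl.)
sub filter_probes {
    my ($id_fh, $fasta_fh, $out_fh) = @_;

    # Build the lookup hash once; strip line endings, including a stray \r.
    my %wanted;
    while (my $id = <$id_fh>) {
        $id =~ s/\s+\z//;
        $wanted{$id} = 1 if length $id;
    }

    my ($keep, $count) = (0, 0);
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>/) {
            # Header line: decide once per record whether to keep it.
            $keep = ($line =~ /^>probe:\w+:(\w+):/ && $wanted{$1}) ? 1 : 0;
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;
}

# Possible use with the file names from the script above:
# open my $ids,    '<', 'ID_all_X'              or die "ID_all_X: $!";
# open my $probes, '<', 'HG_U95Av2_probe_fasta' or die "HG_U95Av2_probe_fasta: $!";
# open my $out,    '>', 'probe_subset.txt'      or die "probe_subset.txt: $!";
# filter_probes($ids, $probes, $out);
```

Because there is one hash lookup per header, the run time should grow with the size of the FASTA file rather than with the number of IDs times the number of lines.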


From arareko at campus.iztacala.unam.mx  Sat Jun 24 15:27:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 10:27:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D498A.9020107@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>
	<449D498A.9020107@purdue.edu>
Message-ID: <449D59C7.4030008@campus.iztacala.unam.mx>

Hi Phillip,

Have you tried the Deobfuscator interface? It's a newer and better way 
to browse all the methods available in BioPerl:

http://bioperl.org/wiki/Deobfuscator
http://bioperl.org/cgi-bin/deob_interface.cgi

Regards,
Mauricio.

Phillip SanMiguel wrote:
> Brian Osborne wrote:
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
>> don't want to have to maintain two bptutorials.
>>   
> I would be very disappointed to lose one part of bptutorial.pl--this was 
> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the 
> only purpose I've ever used bptutorial.pl for--to find all the methods 
> available to any given object. Eg:
> 
> bptutorial.pl 100 Bio::PrimarySeq
> 
>  ***Methods for Object Bio::PrimarySeq ********
> 
> 
>  Methods taken from package Bio::IdentifiableI
>  lsid_string   namespace_string
> 
>  Methods taken from package Bio::PrimarySeq
>  accession   accession_number   alphabet   authority   can_call_new   desc
>  description   direct_seq_set   display_id   display_name   id   is_circular
>  length   namespace   new   object_id   primary_id   seq
>  subseq   validate_seq   version
> 
>  Methods taken from package Bio::PrimarySeqI
>  moltype   revcom   translate   trunc
> 
>  Methods taken from package Bio::Root::Root
>  DESTROY   confess   debug   throw   verbose
> 
>  Methods taken from package Bio::Root::RootI
>  carp   deprecated   stack_trace   stack_trace_dump   
> throw_not_implemented   warn
>  warn_not_implemented
> 
> 
> Phillip SanMiguel

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From golharam at umdnj.edu  Sat Jun 24 14:43:29 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 10:43:29 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>
Message-ID: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>

I've managed to code three methods to calculate K into a perl script
using the algorithms as described in "Molecular Evolution" by Wen-Hsiung
Li.   I'd be happy to contribute it as a script...


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 9:40 AM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


baseml is not well-supported to my knowledge - I think I started with
an attempt to capture a small amount of the data in the file.  There are
some people who have made modifications in-house to possibly parse it,
but I know of no submitted patches.  Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.  It depends on how much
of that information you really want to capture, or whether you want
just the substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the
> code...Thanks.
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 16:59:21 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 12:59:21 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D59C7.4030008@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>
	<449D59C7.4030008@campus.iztacala.unam.mx>
Message-ID: <449D6F69.1090104@purdue.edu>

Yes I have. It is very useful.
But what about situations where I don't have web access, or where I am 
working with Bioperl 1.5?

Mauricio Herrera Cuadra wrote:
> Hi Phillip,
>
> Have you tried the Deobfuscator interface? It's a newer and better way 
> to browse all the methods available in BioPerl:
>
> http://bioperl.org/wiki/Deobfuscator
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Regards,
> Mauricio.


From arareko at campus.iztacala.unam.mx  Sat Jun 24 17:35:54 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 24 Jun 2006 12:35:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D6F69.1090104@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
Message-ID: <449D77FA.70103@campus.iztacala.unam.mx>

Currently I'm modifying the Deobfuscator so it'd be capable of browsing 
the different BioPerl packages as well as their respective releases, but 
I haven't had much spare time to finish it :(

Dave and I committed the Deobfuscator into the bioperl-live source tree 
(in the /doc directory), so it'd be included in future releases of BioPerl. 
I'm also working on a command line version which won't need a CGI 
environment to have the same functionality; this would address the web 
access situation that you mention.

Phillip SanMiguel wrote:
> Yes I have. It is very useful.
> But in situations where I don't have web access? Or I am working with 
> Bioperl 1.5?

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Sat Jun 24 13:39:56 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 09:39:56 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
References: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1>
Message-ID: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org>

baseml is not well-supported to my knowledge - I think I started with
an attempt to capture a small amount of the data in the file.  There are
some people who have made modifications in-house to possibly parse it,
but I know of no submitted patches.  Many of the knowledgeable
people are probably at the evolution meetings this week.

I have no idea about the full set of information in the report files
without going back to the Yang papers first.  It depends on how much
of that information you really want to capture, or whether you want
just the substitution rates.

I'm Ccing Alisha in case she has ideas/solutions from her drosophila  
work+PAML.

-jason
On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
> baseml in the PAML package to measure the distances of some non-coding
> regions.
>
> I started with the coding regions, and used the script
> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
> something similar for non-coding regions.  However, when I call
> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
> meaning matrix was never defined.
>
> I wanted to find out if anyone on here has done this before or knows a
> way to measure substitution frequencies of non-coding regions with the
> PAML package.  The documentation with PAML is sparse so I'm not sure how
> to interpret its output directly - that's why I'm using Bioperl.
>
> Hopefully someone can help me before I start digging into the
> code...Thanks.
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From pmiguel at purdue.edu  Sat Jun 24 17:48:15 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 13:48:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D77FA.70103@campus.iztacala.unam.mx>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
Message-ID: <449D7ADF.3030604@purdue.edu>


Yes, that would be better than bptutorial.pl 100 then. For some modules 
bptutorial.pl 100 doesn't seem to give any of the methods they have 
access to, whereas the Deobfuscator does.

Mauricio Herrera Cuadra wrote:
> Currently I'm modifying the Deobfuscator so it'd be capable of 
> browsing the different BioPerl packages as well as their respective 
> releases, but haven't got many spare time to finish it :(
>
> Dave and I committed the Deobfuscator into the bioperl-live source 
> tree (in /doc directory), so it'd be included in future releases of 
> BioPerl. I'm also working on a command line version which won't need a 
> CGI environment to have the same functionality, this would address the 
> web access situation that you mention.


From jason at bioperl.org  Sat Jun 24 18:42:57 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 14:42:57 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
References: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>

You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?
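
For reference, two of the simplest pairwise distances can be written in a few lines of plain Perl. This is only a sketch of the textbook Jukes-Cantor and Kimura two-parameter formulas, not the code in Align::DNAStatistics, and the sub names (count_differences, jukes_cantor, kimura_2p) are made up for illustration. It assumes the two sequences are already aligned to the same length and skips any column containing a gap or ambiguity code.

```perl
use strict;
use warnings;

# Count compared sites, transitions and transversions between two aligned
# sequences; columns with anything other than A, C, G or T are skipped.
sub count_differences {
    my ($seq1, $seq2) = map { uc } @_;
    die "sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);

    my %purine = (A => 1, G => 1);
    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my ($x, $y) = (substr($seq1, $i, 1), substr($seq2, $i, 1));
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # same chemical class (both purines or both pyrimidines) = transition
        if (($purine{$x} || 0) == ($purine{$y} || 0)) { $ts++ } else { $tv++ }
    }
    die "no comparable sites\n" unless $sites;
    return ($sites, $ts, $tv);
}

# Jukes-Cantor: d = -(3/4) ln(1 - (4/3) p), p = proportion of differing sites.
# log() dies if the sequences are too divergent for the formula to apply.
sub jukes_cantor {
    my ($sites, $ts, $tv) = count_differences(@_);
    my $p = ($ts + $tv) / $sites;
    return -0.75 * log(1 - 4 * $p / 3);
}

# Kimura 2-parameter: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)),
# P = transition proportion, Q = transversion proportion.
sub kimura_2p {
    my ($sites, $ts, $tv) = count_differences(@_);
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    return -0.5 * log((1 - 2 * $P - $Q) * sqrt(1 - 2 * $Q));
}

# Example call on two short made-up sequences:
# printf "JC = %.4f  K2P = %.4f\n",
#     jukes_cantor('ACGTACGTAC', 'ACGTACGTAT'),
#     kimura_2p('ACGTACGTAC', 'ACGTACGTAT');
```

Both are corrections of the raw proportion of differing sites for multiple hits; the ML methods in PAML fit an explicit substitution model instead.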

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a perl script
> using the algorithms as described in "Molecular Evolution" by Wen-Hsiung
> Li.   I'd be happy to contribute it as a script...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sat Jun 24 19:07:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 14:07:06 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <449D7ADF.3030604@purdue.edu>
References: <C0A31929.89F9%osborne1@optonline.net>	<449D498A.9020107@purdue.edu>	<449D59C7.4030008@campus.iztacala.unam.mx>
	<449D6F69.1090104@purdue.edu>
	<449D77FA.70103@campus.iztacala.unam.mx>
	<449D7ADF.3030604@purdue.edu>
Message-ID: <EF5998FD-BA4F-439C-873E-71E55DBA0F4D@uiuc.edu>

As a quickie method I use the script from the FAQ; you have to  
install Class::Inspector:

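#!/usr/bin/perl -w
use strict;
use Class::Inspector;
# class name to inspect, e.g. Bio::SeqIO
my $class = shift || die "Usage: methods perl_class_name\n";
# load the class so that its methods can be found
eval "require $class" or die "Could not load $class: $@";
# 'full' gives fully qualified names, 'public' skips _private methods
print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}), "\n";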

Works well, though doesn't have the links and so on like  
Deobfuscator; I use HTML-generated ActiveState docs:

glaciers-115 chris$ methods.pl Bio::SeqIO
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::debug
Bio::Root::Root::except
Bio::Root::Root::finally
Bio::Root::Root::otherwise
Bio::Root::Root::throw
Bio::Root::Root::try
Bio::Root::Root::verbose
Bio::Root::Root::with
Bio::Root::RootI::carp
Bio::Root::RootI::confess
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::SeqIO::DESTROY
Bio::SeqIO::PRINT
Bio::SeqIO::READLINE
Bio::SeqIO::TIEHANDLE
Bio::SeqIO::alphabet
Bio::SeqIO::fh
Bio::SeqIO::location_factory
Bio::SeqIO::new
Bio::SeqIO::newFh
Bio::SeqIO::next_seq
Bio::SeqIO::object_factory
Bio::SeqIO::sequence_builder
Bio::SeqIO::sequence_factory
Bio::SeqIO::write_seq
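
If Class::Inspector isn't installed, something similar can be sketched with core Perl alone by walking @ISA and looking in each package's symbol table. This is a rough sketch, not how Class::Inspector or bptutorial.pl do it; the sub names (class_lineage, own_methods, methods_by_package) are my own invention, and subs imported into a package will show up alongside its real methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the class followed by all of its ancestors, depth-first via @ISA.
sub class_lineage {
    my ($class, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$class}++;      # guard against diamond inheritance
    no strict 'refs';
    my @parents = @{"${class}::ISA"};
    return ($class, map { class_lineage($_, $seen) } @parents);
}

# Return the sorted names of subs defined directly in one package.
sub own_methods {
    my ($class) = @_;
    no strict 'refs';
    my $stash = \%{"${class}::"};
    # skip nested package entries (names ending in ::) and non-sub symbols
    return sort grep { !/::\z/ && defined &{"${class}::$_"} } keys %$stash;
}

# Map each package in the lineage to the methods it defines itself.
sub methods_by_package {
    my ($class) = @_;
    return map { ($_ => [ own_methods($_) ]) } class_lineage($class);
}

# Command-line use, e.g.: perl methods_core.pl Bio::PrimarySeq
if (@ARGV) {
    my $class = shift;
    eval "require $class; 1" or die "Could not load $class: $@";
    my %by_pkg = methods_by_package($class);
    for my $pkg (class_lineage($class)) {
        print "Methods taken from package $pkg\n  @{ $by_pkg{$pkg} }\n\n";
    }
}
```

Grouping by package gives output closer to the bptutorial.pl 100 listing than the flat list above, and it needs neither web access nor extra modules.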


Chris

On Jun 24, 2006, at 12:48 PM, Phillip SanMiguel wrote:

>
> Yes, that would be better than bptutorial.pl 100 then. For some  
> modules
> bptutorial.pl 100 doesn't seem to give any of the methods they have
> access to. Whereas the deobfuscator does.
>
> Mauricio Herrera Cuadra wrote:
>> Currently I'm modifying the Deobfuscator so it'd be capable of
>> browsing the different BioPerl packages as well as their respective
>> releases, but haven't got many spare time to finish it :(
>>
>> Dave and I committed the Deobfuscator into the bioperl-live source
>> tree (in /doc directory), so it'd be included in future releases of
>> BioPerl. I'm also working on a command line version which won't  
>> need a
>> CGI environment to have the same functionality, this would address  
>> the
>> web access situation that you mention.
>>
>> Phillip SanMiguel wrote:
>>> Yes I have. It is very useful.
>>> But in situations where I don't have web access? Or I am working  
>>> with
>>> Bioperl 1.5?
>>>
>>> Mauricio Herrera Cuadra wrote:
>>>> Hi Phillip,
>>>>
>>>> Have you tried the Deobfuscator interface? It's a newer and better
>>>> way to browse all the methods available in BioPerl:
>>>>
>>>> http://bioperl.org/wiki/Deobfuscator
>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>>
>>>> Regards,
>>>> Mauricio.
>>>>
>>>> Phillip SanMiguel wrote:
>>>>
>>>>> Brian Osborne wrote:
>>>>>
>>>>>> Jay,
>>>>>>
>>>>>> Excellent! Now we need to answer a few more questions for  
>>>>>> ourselves:
>>>>>>
>>>>>> - Do we remove the file bptutorial.pl from the package now? I'd
>>>>>> say yes, we
>>>>>> don't want to have to maintain two bptutorials.
>>>>>>
>>>>> I would be very disappointed to lose one part of
>>>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for
>>>>> Bioinformatics_. It is the only purpose I've ever used
>>>>> bptutorial.pl for--to find all the methods available to any given
>>>>> object. Eg:
>>>>>
>>>>> bptutorial.pl 100 Bio::PrimarySeq
>>>>>
>>>>>  ***Methods for Object Bio::PrimarySeq ********
>>>>>
>>>>>
>>>>>  Methods taken from package Bio::IdentifiableI
>>>>>  lsid_string   namespace_string
>>>>>
>>>>>  Methods taken from package Bio::PrimarySeq
>>>>>  accession   accession_number   alphabet   authority
>>>>> can_call_new   desc
>>>>>  description   direct_seq_set   display_id   display_name   id
>>>>> is_circular
>>>>>  length   namespace   new   object_id   primary_id   seq
>>>>>  subseq   validate_seq   version
>>>>>
>>>>>  Methods taken from package Bio::PrimarySeqI
>>>>>  moltype   revcom   translate   trunc
>>>>>
>>>>>  Methods taken from package Bio::Root::Root
>>>>>  DESTROY   confess   debug   throw   verbose
>>>>>
>>>>>  Methods taken from package Bio::Root::RootI
>>>>>  carp   deprecated   stack_trace   stack_trace_dump
>>>>> throw_not_implemented   warn
>>>>>  warn_not_implemented
>>>>>
>>>>>
>>>>> Phillip SanMiguel
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pmiguel at purdue.edu  Sat Jun 24 19:37:08 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 24 Jun 2006 15:37:08 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <449D9464.6030508@purdue.edu>

Here is an example bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1682

It was a bug fixed in a module in BioPerl 1.4  back in October of 2004. 
The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the 
module. However the version of the module currently available from CPAN 
is 1.6. (That is the current "stable" release, BioPerl 1.4.0)

I've written a script that relies on that bug being fixed. How should I 
deal with this when I want to give the script to others to use? Just
tell them "You must have BioPerl 1.5 installed"? Or give them instructions
for patching the module code?

How long before the next "stable" release? Maybe a year? Should not a 
BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or 
would that be very difficult?

By the way, I think the revision graph viewer is great for someone, at 
best, peripherally involved in BioPerl to figure out which module 
version is associated with which BioPerl version, for example:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
Phillip SanMiguel


From golharam at umdnj.edu  Sat Jun 24 18:57:52 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 14:57:52 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <C27D6CF7-4CD0-4A31-B23B-0EAE425EEF01@bioperl.org>
Message-ID: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>

Hi Jason,

It looks like DNAStatistics is only for coding sequences.  I'm trying to
calculate the Ks of exons and the K (or Ki) of introns.  All the methods
in bioperl are based on coding sequences.  Only the  PAUP package (that
I've found) does non-coding sequences.   I would have used it but you
need to pay for it and we don't have the funding to purchase much at the
moment.

I briefly looked at PHYLIP and EMBOSS but they didn't look as
straightforward as I was hoping they would be.  Either that, or I was
getting frustrated looking for a simple solution.

In the end, I found a molecular evolution book that talks about several
methods used for non-coding sequences so I went ahead and implemented
them.  They seem to work well.  

Ryan


-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 2:43 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


You should look at the Align::DNAStatistics module if you just want  
pairwise DNA distance.  I put in several different distance methods.   
Or you can use the distance methods implemented in PHYLIP or EMBOSS  
programs -- I thought you wanted the somewhat more sophisticated ML  
approaches that are implemented in PAML?

--jason

On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:

> I've managed to code three methods to calculate K into a perl script
> using the algorithms as described in "Molecular Evolution" by
> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 9:40 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> baseml is not well-supported to my knowledge - I think I started with
> an attempt to capture a small amount of the data in the file.  There are
> some people who have made modifications to possibly parse it in-house
> but I know of no submitted patches.   Many of the knowledgeable
> people are probably at the evolution meetings  this week.
>
> I have no idea about the full set of information in the report files
> without going back to the Yang papers first.   It depends on how much
> of that information you really want to capture or just the
> substitution rates.
>
> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
> work+PAML.
>
> -jason
> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>> baseml in the PAML package to measure the distances of some
>> non-coding regions.
>>
>> I started with the coding regions, and used the script
>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>> something similar for non-coding regions.  However, when I call
>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>> meaning matrix was never defined.
>>
>> I wanted to find out if anyone on here has done this before or  
>> knows a
>
>> way to measure substitution frequencies of non-coding regions with  
>> the
>
>> PAML package.  The documentation with PAML is sparse so I'm not
>> sure how
>> to interpret its output directly - that's why I'm using Bioperl.
>>
>> Hopefully someone can help me before I start digging into the
>> code...Thanks.
>>
>> Ryan
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From golharam at umdnj.edu  Sat Jun 24 22:37:15 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 18:37:15 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
Message-ID: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>

I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
alignments and parsing the resulting alignments.

The ClustalW output is being sent to STDOUT.  Is there a way I can
redirect the output to STDERR instead?

Here's how I'm using it:

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

(Forgive me if it's in the docs - I've been coding for a week straight now,
including Saturday)

Thanks, Ryan


From cjfields at uiuc.edu  Sun Jun 25 00:16:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 19:16:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>

On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4  back in October of  
> 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from  
> CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How  
> should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed". Give them  
> instructions
> for patching the module code?

A BioPerl module version is not the same as the distribution  
version.  All the modules have different version numbers  
corresponding to CVS commits for various code changes.  If you want  
to see the version for the distribution, read this:

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

Many 'bug fixes', you'll find, have less to do with problems/bugs in  
BioPerl code than they do with outside code changes beyond our  
control.  By that I mean changes to other programs modify output so  
parsers break (BLAST, PAML, etc), or changes to API for remote  
databases that break queries (recent changes in EBI database  
concerning Swissprot, for example).  So, the code is considered  
'stable' at the time of release, but past that point issues beyond  
our control may break certain modules parsing output, accessing  
remote databases, and so on, at any time. This link:

http://www.bioperl.org/wiki/FAQ#BioPerl_in_General

should answer a few more questions you may have.  The FAQ is very  
helpful...

In general, if there are problems with code you could look at the  
latest developer's release (1.5.1, released in Oct 2005) to see if  
any bugs have been fixed.  They may be fixed post-1.5.1 and will be  
in CVS; you can always suggest using 1.5.1 (it's pretty stable) and  
updating only the fixed modules from CVS if needed.
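
For a script handed to other people, one option is to check at startup
that a new enough version is installed and stop with a clear message
otherwise. The following is a rough sketch using only core Perl; the
helper name have_min_version is invented for this example, and which
module carries the distribution version (and how it is numbered) should
be taken from the FAQ entry linked above rather than from this code.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module can be loaded and reports a
# $VERSION of at least $min, false otherwise.
sub have_min_version {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 0 unless eval { require $file; 1 };
    # UNIVERSAL::VERSION dies when the installed version is too old
    # or when the module defines no $VERSION at all.
    return eval { $module->VERSION($min); 1 } ? 1 : 0;
}

# Example use at the top of a script. The module name and threshold
# below are placeholders, not values taken from BioPerl:
#   die "This script needs a newer release of Some::Module\n"
#       unless have_min_version('Some::Module', '1.5');
```

Note that this checks the $VERSION a module reports, which, as said
above, is not the same thing as the CVS revision of an individual file.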

> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
> one? Or
> would that be very difficult?

No, it's not that easy.  BioPerl isn't like most CPAN modules with  
one or two developers.  See the wiki page for details on planning  
releases to see why:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

It takes a lot of effort and coordination, much more so than the  
average CPAN module.  I believe some of the core developers are  
meeting this weekend; maybe something will come of that and we'll get  
an idea of a next release.

Chris

> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sun Jun 25 01:02:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 24 Jun 2006 21:02:36 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449D9464.6030508@purdue.edu>
References: <449D9464.6030508@purdue.edu>
Message-ID: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>


On Jun 24, 2006, at 3:37 PM, Phillip SanMiguel wrote:

> Here is an example bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1682
>
> It was a bug fixed in a module in BioPerl 1.4  back in October of  
> 2004.
> The module was Bio::Seq::QualI.  The patch resulted in v. 1.7 of the
> module. However the version of the module currently available from  
> CPAN
> is 1.6. (That is the current "stable" release, BioPerl 1.4.0)
>
> I've written a script that relies on that bug being fixed. How  
> should I
> deal with this when I want to give the script to others to use? Just
> tell them "You must have BioPerl 1.5 installed". Give them  
> instructions
> for patching the module code?

Either way. If the patch is trivial you could also provide the patch
as an option. Generally we don't support that, though. (Not everything
we decline to support is unsupported because it doesn't work; sometimes
it's just a statement along the lines of
'it-probably-works-but-don't-bug-us-if-it-doesn't'.)

>
> How long before the next "stable" release? Maybe a year? Should not a
> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
> one? Or
> would that be very difficult?

1.5.1 fixes a number of other problems too, so even if there were a
1.4.1 there wouldn't be much reason to upgrade to it instead of to
1.5.1. We therefore don't think creating 1.4.1 is the best investment
of our time.

Our current goal is to release 1.5.2 and possibly more development
versions, all leading on a steady path to 1.6.0. There are very few (but
significant) stumbling blocks on this path; I believe they will require
some dedicated time from a couple of people, and after that there
shouldn't be any real obstacles. It's quite possible that at BOSC, or as
early as next week at the GMOD meeting, we could see a leap forward.
Short of an actual hackathon, it's typically those meetings that pull
the respective people away from their daily obligations.

Some time back in spring, the 1.6 release was targeted for around BOSC.
That's probably not going to happen, but the release will quite possibly
come not long afterwards.

	-hilmar

>
> By the way, I think the revision graph viewer is great for someone, at
> best, peripherally involved in BioPerl to figure out which module
> version is associated with which BioPerl version, for example:
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1
> Phillip SanMiguel
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Jun 25 01:21:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:21:56 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <000301c697f5$c08d1150$15327e82@pyrimidine>

According to the docs ( ;> ) the default behaviour is to return "a BioPerl
Bio::SimpleAlign object which can then be printed and/or saved in multiple
formats using the AlignIO.pm module"; you should be able to do something
like:

use Bio::AlignIO;
...
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

my $al_out = Bio::AlignIO->new(-format => 'clustalw',
                               -fh     => \*STDERR);

$al_out->write_aln($aa_aln);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Saturday, June 24, 2006 5:37 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect?
> 
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);
> 
> (Forgive me if it's in the docs - I've been coding for a week straight now,
> including Saturday)
> 
> Thanks, Ryan
> 


From jason at bioperl.org  Sun Jun 25 01:38:06 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 24 Jun 2006 21:38:06 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
References: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>

they make no assumption about coding sequence, where do you get that
impression?  the ka,ks are for coding but the tamura/nei, kimura,
jukes-cantor are all for any type of sequence.

the phylip and emboss are pretty straightforward IMHO - you give it
an alignment and you get out a matrix of pairwise numbers....

but whatever makes sense to you - we are using the same methods as
are in Li's book (that is where I took the equations from).
-j
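
To illustrate that these distances need nothing but an alignment, here
is a small stand-alone sketch of the Jukes-Cantor and Kimura 2-parameter
formulas for one aligned pair. It is not the DNAStatistics code; the sub
names are invented for this example, and columns containing gaps or
ambiguity codes are simply skipped, which is an assumption of the sketch.

```perl
use strict;
use warnings;

my %IS_PURINE = (A => 1, G => 1);

# Count compared sites, transitions and transversions between two
# aligned sequences, skipping columns with gaps or ambiguity codes.
sub pairwise_counts {
    my ($seq1, $seq2) = @_;
    die "Sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);
    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc substr($seq1, $i, 1);
        my $y = uc substr($seq2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        # Purine<->purine or pyrimidine<->pyrimidine is a transition;
        # a change of class is a transversion.
        if (($IS_PURINE{$x} ? 1 : 0) == ($IS_PURINE{$y} ? 1 : 0)) { $ts++ }
        else                                                      { $tv++ }
    }
    return ($sites, $ts, $tv);
}

# Jukes-Cantor: d = -3/4 * ln(1 - 4p/3), p = proportion of differing sites.
sub jukes_cantor {
    my ($sites, $ts, $tv) = pairwise_counts(@_);
    return unless $sites;
    my $p = ($ts + $tv) / $sites;
    return if $p >= 0.75;              # distance undefined (saturated)
    return 0 - 0.75 * log(1 - 4 * $p / 3);
}

# Kimura 2-parameter: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
# P = proportion of transitions, Q = proportion of transversions.
sub kimura_2p {
    my ($sites, $ts, $tv) = pairwise_counts(@_);
    return unless $sites;
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    my ($a1, $a2) = (1 - 2 * $P - $Q, 1 - 2 * $Q);
    return if $a1 <= 0 || $a2 <= 0;    # distance undefined (saturated)
    return 0 - 0.5 * log($a1 * sqrt($a2));
}
```

Neither function looks at reading frame or codons, so intron or
intergenic alignments can be passed in the same way as coding ones.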
On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:

> Hi Jason,
>
> It looks like DNAStatistics is only for coding sequences.  I'm  
> trying to
> calculate the Ks of exons and the K (or Ki) of introns.  All the  
> methods
> in bioperl are based on coding sequences.  Only the  PAUP package  
> (that
> I've found) does non-coding sequences.   I would have used it but you
> need to pay for it and we don't have the funding to purchase much  
> at the
> moment.
>
> I briefly looked at PHYLIP and EMBOSS but they didn't look as
> straightforward as I was hoping they would be.  Either that, or I was
> getting frustrated looking for a simple solution.
>
> In the end, I found a molecular evolution book that talks about  
> several
> methods used for non-coding sequences so I went ahead and implemented
> them.  They seem to work well.
>
> Ryan
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 2:43 PM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> You should look at the Align::DNAStatistics module if you just want
> pairwise DNA distance.  I put in several different distance methods.
> Or you can use the distance methods implemented in PHYLIP or EMBOSS
> programs -- I thought you wanted the somewhat more sophisticated ML
> approaches that are implemented in PAML?
>
> --jason
>
> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>
>> I've managed to code three methods to calculate K into a perl script
>> using the algorithms as described in "Molecular Evolution" by
>> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>> Jason
>> Stajich
>> Sent: Saturday, June 24, 2006 9:40 AM
>> To: golharam at umdnj.edu
>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>> PAML
>> package)
>>
>>
>> baseml is not well-supported to my knowledge - I think I started with
>> an attempt to capture a small amount of the data in the file.  There are
>> some people who have made modifications to possibly parse it in-house
>> but I know of no submitted patches.   Many of the knowledgeable
>> people are probably at the evolution meetings  this week.
>>
>> I have no idea about the full set of information in the report files
>> without going back to the Yang papers first.   It depends on how much
>> of that information you really want to capture or just the
>> substitution rates.
>>
>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>> work+PAML.
>>
>> -jason
>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>> baseml in the PAML package to measure the distances of some
>>> non-coding regions.
>>>
>>> I started with the coding regions, and used the script
>>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>> something similar for non-coding regions.  However, when I call
>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>> meaning matrix was never defined.
>>>
>>> I wanted to find out if anyone on here has done this before or
>>> knows a
>>
>>> way to measure substitution frequencies of non-coding regions with
>>> the
>>
>>> PAML package.  The documentation with PAML is sparse so I'm not
>>> sure how
>>> to interpret its output directly - that's why I'm using Bioperl.
>>>
>>> Hopefully someone can help me before I start digging into the
>>> code...Thanks.
>>>
>>> Ryan
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Sun Jun 25 01:40:49 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 24 Jun 2006 20:40:49 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <A9574964-E315-4BC6-9485-AED8A79B71D5@gmx.net>
Message-ID: <000401c697f8$62d41e70$15327e82@pyrimidine>

...
> > I've written a script that relies on that bug being fixed. How
> > should I
> > deal with this when I want to give the script to others to use? Just
> > tell them "You must have BioPerl 1.5 installed". Give them
> > instructions
> > for patching the module code?
> 
> Either way. If the patch is trivial you could also provide the patch
> as an option. Generally we don't support that though. (Not everything
> that we don't support we don't support because it doesn't work.
> Sometimes it's just a statement along 'it-probably-works-but-don't-
> bug-us-if-it-doesn't'.)

The bug was fixed post-1.4 release according to the link, so Phillip should
use v1.5.1 or newer.

Hilmar's right.  It's hard to address every single complaint about code not
working or method not implemented w/o having patches or fixes submitted.
It's not my top priority to fix bugs in modules submitted by other authors
when I don't know the code.  I'll try if I have the free time, but that's
getting to be a precious commodity lately...

> > How long before the next "stable" release? Maybe a year? Should not a
> > BioPerl 1.4.1 be released so CPAN would get bug fixes like this
> > one? Or
> > would that be very difficult?
> 
> 1.5.1 fixes a number of other problems too, so there isn't really
> much reason to upgrade to 1.4.1 if there was one instead of to 1.5.1,
> so investing time into creating 1.4.1 we think is not the best
> investment we can make.
> 
> Our current goal is to release 1.5.2 and possibly more development
> versions all leading on a steady path to 1.6.0. There's very few (but
> significant) stumbling blocks on this path that will require I
> believe some dedicated time from a couple  of people and after that
> there shouldn't be any real obstacles. It's quite possible that at
> BOSC or as early as next week at the GMOD meeting we could see a leap
> forward, typically it's those meetings that pull the respective
> people away from their daily obligations (short of an actual
> hackathon).
> 
> Some time back in spring 1.6 was put in proximity to BOSC, but that's
> probably not going to happen, but quite possibly not that much
> afterwards.
> 
> 	-hilmar
...

Nice to know.  I guess a Release Pumpkin will be picked as well.  BOSC is
right around the corner so I guess we can expect something announced soon as
to a possible roadmap (we can't talk about 'timelines' in the States, it's
not patriotic).  

Chris


From golharam at umdnj.edu  Sun Jun 25 03:03:01 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:03:01 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000301c697f5$c08d1150$15327e82@pyrimidine>
Message-ID: <000301c69803$df899f20$2f01a8c0@GOLHARMOBILE1>

Thanks Chris.  It is in fact when you call align() that clustalw
generates the output that you see on the console.  The alignment it
generates I'm parsing right away.  Here's an example of the output
I'm referring to:

-- BEGIN --
 CLUSTAL W (1.83) Multiple Sequence Alignments


Sequence format is Pearson
Sequence 1: human           271 aa
Sequence 2: mouse           264 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  90
Guide tree        file created:   [/tmp/TX4yxP9uKQ/80W87TkT5Z.dnd]
Start of Multiple Alignment
There are 1 groups
Aligning...
Group 1: Sequences:   2      Score:5469
Alignment Score 1480
GCG-Alignment file created      [/tmp/TX4yxP9uKQ/xE4GNyY7Rc]
-- END --

How do I get this to go to stderr instead of stdout?

Ryan
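
One way to get that effect, sketched under the assumption that the
chatter comes from the clustalw child process writing to the file
descriptor it inherits as standard output: point descriptor 1 at STDERR
for the duration of the align() call and restore it afterwards. The
helper name with_stdout_to_stderr is invented for this example and is
not part of BioPerl.

```perl
use strict;
use warnings;
use IO::Handle;    # for STDOUT->flush

# Hypothetical helper: run $code with STDOUT (file descriptor 1)
# duplicated from STDERR, then put the original STDOUT back.
# Child processes started inside $code inherit the redirection.
sub with_stdout_to_stderr {
    my ($code) = @_;

    STDOUT->flush;    # do not carry buffered output across the switch
    open(my $saved, '>&', \*STDOUT) or die "Cannot save STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)    or die "Cannot redirect STDOUT: $!";

    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;

    STDOUT->flush;    # push anything printed meanwhile to STDERR
    open(STDOUT, '>&', $saved) or die "Cannot restore STDOUT: $!";
    close $saved;

    die $err unless $ok;
    return wantarray ? @result : $result[0];
}
```

With the code from this thread it would be called as
my $aa_aln = with_stdout_to_stderr(sub { $aln_factory->align(\@aa_seq) });
The eval makes sure STDOUT is restored even if align() throws.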


-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Saturday, June 24, 2006 9:22 PM
To: golharam at umdnj.edu; bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
redirect?


According to the docs ( ;> ) the default behaviour is to return "a
BioPerl Bio::SimpleAlign object which can then be printed and/or saved
in multiple formats using the AlignIO.pm module"; you should be able to
do something
like:

use Bio::AlignIO;
...
my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aa_aln = $aln_factory->align(\@aa_seq);

my $al_out = Bio::AlignIO->new(-format => 'clustalw',
                               -fh     => \*STDERR);

$al_out->write_aln($aa_aln);

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Saturday, June 24, 2006 5:37 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output 
> redirect?
> 
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some 
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can 
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);
> 
> (Forgive me if it's in the docs - I've been coding for a week straight
> now, including Saturday)
> 
> Thanks, Ryan
> 


From golharam at umdnj.edu  Sun Jun 25 03:05:41 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sat, 24 Jun 2006 23:05:41 -0400
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
	package)
In-Reply-To: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org>
Message-ID: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>

>>they make no assumption about coding sequence,
>>where do you get that impression

I get that information from the 1.5 api docs:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/

It's documented under the description section.

Oh well, I have it coded and working...might as well use it.

Ryan
-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Saturday, June 24, 2006 9:38 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
package)


they make no assumption about coding sequence, where do you get that
impression?  the ka,ks are for coding but the tamura/nei, kimura,
jukes-cantor are all for any type of sequence.

the phylip and emboss are pretty straightforward IMHO - you give it
an alignment and you get out a matrix of pairwise numbers....

but whatever makes sense to you - we are using the same methods as
are in Li's book (that is where I took the equations from).
-j
On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:

> Hi Jason,
>
> It looks like DNAStatistics is only for coding sequences.  I'm
> trying to
> calculate the Ks of exons and the K (or Ki) of introns.  All the  
> methods
> in bioperl are based on coding sequences.  Only the  PAUP package  
> (that
> I've found) does non-coding sequences.   I would have used it but you
> need to pay for it and we don't have the funding to purchase much  
> at the
> moment.
>
> I briefly looked at PHYLIP and EMBOSS but they didn't look as
> straightforward as I was hoping they would be.  Either that, or I was
> getting frustrated looking for a simple solution.
>
> In the end, I found a molecular evolution book that talks about
> several
> methods used for non-coding sequences so I went ahead and implemented
> them.  They seem to work well.
>
> Ryan
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
> Jason
> Stajich
> Sent: Saturday, June 24, 2006 2:43 PM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from  
> PAML
> package)
>
>
> You should look at the Align::DNAStatistics module if you just want
> pairwise DNA distance.  I put in several different distance methods.
> Or you can use the distance methods implemented in PHYLIP or EMBOSS
> programs -- I thought you wanted the somewhat more sophisticated ML
> approaches that are implemented in PAML?
>
> --jason
>
> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>
>> I've managed to code three methods to calculate K into a perl script
>> using the algorithms as described in "Molecular Evolution" by
>> Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>> Jason
>> Stajich
>> Sent: Saturday, June 24, 2006 9:40 AM
>> To: golharam at umdnj.edu
>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>> PAML
>> package)
>>
>>
>> baseml is not well-supported to my knowledge - I think I started with
>> an attempt to capture a small amount of the data in the file.  There are
>> some people who have made modifications to possibly parse it in-house
>> but I know of no submitted patches.   Many of the knowledgeable
>> people are probably at the evolution meetings  this week.
>>
>> I have no idea about the full set of information in the report files
>> without going back to the Yang papers first.   It depends on whether
>> you really want to capture all of that information or just the
>> substitution rates.
>>
>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>> work+PAML.
>>
>> -jason
>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>> baseml in the PAML package to measure the distances of some
>>> non-coding regions.
>>>
>>> I started with the coding regions, and used the script
>>> bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>> something similar for non-coding regions.  However, when I call
>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>> meaning matrix was never defined.
>>>
>>> I wanted to find out if anyone on here has done this before or
>>> knows a way to measure substitution frequencies of non-coding
>>> regions with the PAML package.  The documentation with PAML is
>>> sparse so I'm not sure how to interpret its output directly -
>>> that's why I'm using Bioperl.
>>>
>>> Hopefully someone can help me before I start digging into the
>>> code...Thanks.
>>>
>>> Ryan
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From bix at sendu.me.uk  Sun Jun 25 11:33:58 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Jun 2006 12:33:58 +0100
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output
	redirect?
In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
References: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <449E74A6.3020709@sendu.me.uk>

Ryan Golhar wrote:
> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some
> alignments and parsing the resulting alignments.
> 
> The ClustalW output is being sent to STDOUT.  Is there a way I can
> redirect the output to STDERR instead?
> 
> Here's how I'm using it:
> 
> my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my $aa_aln = $aln_factory->align(\@aa_seq);

You can suppress the output completely using
$aln_factory->quiet(1);

(supplying quiet => 1 to new() should also work according to the docs, 
but doesn't seem to be implemented, though I could be wrong)

If you really want the messages on STDERR you could try redirecting 
STDOUT to STDERR before calling align():
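open(OLDOUT, ">&STDOUT") or die "Can't dup STDOUT: $!";       # save STDOUT
open(STDOUT, ">&STDERR") or die "Can't redirect STDOUT: $!";  # send it to STDERR
my $aa_aln = $aln_factory->align(\@aa_seq);
open(STDOUT, ">&OLDOUT") or die "Can't restore STDOUT: $!";   # put STDOUT back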

I haven't tested either of these ideas, but I think they should both 
work - try them out and let us know.

Ideally there would be a saner way of doing this, but it isn't readily 
apparent to me.
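
For what it's worth, the redirect idea can be wrapped up so that STDOUT is restored even if the wrapped call dies. This is only a minimal, untested-against-ClustalW sketch in core Perl; run_with_stdout_on_stderr is a made-up helper name, not something BioPerl provides, and it assumes the external program simply inherits the process's STDOUT:

```perl
use strict;
use warnings;

# Run $code with STDOUT temporarily pointed at STDERR, then restore it.
# NOTE: run_with_stdout_on_stderr is a hypothetical helper, not a BioPerl method.
sub run_with_stdout_on_stderr {
    my ($code) = @_;

    # Duplicate the current STDOUT so it can be restored afterwards.
    open(my $saved_out, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(STDOUT, '>&', \*STDERR)        or die "Can't redirect STDOUT: $!";

    # Use eval so that STDOUT is restored even if $code dies.
    my @result = eval { $code->() };
    my $err    = $@;

    open(STDOUT, '>&', $saved_out) or die "Can't restore STDOUT: $!";
    close($saved_out);

    die $err if $err;                        # re-throw the original error
    return wantarray ? @result : $result[0];
}
```

With the factory from the original question, the call would then look something like
my $aa_aln = run_with_stdout_on_stderr(sub { $aln_factory->align(\@aa_seq) });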


From jason at bioperl.org  Sun Jun 25 12:37:11 2006
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 25 Jun 2006 08:37:11 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with
	(baseml from PAML package)]
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>


On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:

>>> they make no assumption about coding sequence,
>>> where do you get that impression
>
> I get that information from the 1.5 api docs:
>
> http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
Great - I would always point people to the LIVE code
documentation rather than the 1.5.0-RC1 docs, which are over a year
old, but nothing in particular has changed in this module since 1.5.0
that I know of.
Someday someone will put a new ball of docs up on the site, but I  
hope that will come with the next development or stable release.

> Its documented under the description section.
>
I don't really see what you are referring to, since there is a lot of
documentation, but perhaps it should be clarified - I had hoped this
was a sufficient description:
"This object contains routines for calculating various statistics and  
distances for DNA alignments."

> Oh well, I have it coded and working...might as well use it.
>
Sounds like your best bet for your situation.

For the record and in the mailing list archives - as long as you  
don't call a method that contains "KaKs" it will work fine.  You can  
calculate distances using the currently implemented distance methods:

    JukesCantor
    Uncorrected
    F81
    Kimura
    Tamura
    F84 (Felsenstein 84)
    TajimaNei
    JinNei
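
To make the point concrete that nothing here depends on the sequence being coding, here is a rough sketch of the simplest of these, the Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. It is plain Perl and is NOT the Bio::Align::DNAStatistics code; the sub names and the choice to skip any column with a gap or ambiguity code are assumptions made for illustration:

```perl
use strict;
use warnings;

# Proportion of differing sites between two aligned sequences of equal
# length, skipping columns where either sequence is not A, C, G or T.
# (Hypothetical helper; gap handling here is an illustrative assumption.)
sub p_distance {
    my ($seq1, $seq2) = @_;
    die "sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);
    my ($diff, $sites) = (0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc(substr($seq1, $i, 1));
        my $y = uc(substr($seq2, $i, 1));
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        $diff++ if $x ne $y;
    }
    die "no comparable sites\n" unless $sites;
    return $diff / $sites;
}

# Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)
sub jukes_cantor {
    my ($seq1, $seq2) = @_;
    my $p = p_distance($seq1, $seq2);
    die "sequences too divergent for Jukes-Cantor (p >= 0.75)\n" if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

printf "p = %.4f, JC = %.4f\n",
    p_distance('ACGTACGTAC', 'ACGTACGTAT'),
    jukes_cantor('ACGTACGTAC', 'ACGTACGTAT');
```

The input is just two aligned strings, intron or exon, which is all the non-KaKs methods need.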


It will be more productive to just drop the discussion since you
seem to be fine without all of this anyway - if you decide you
would like to use it and contribute new distance methods or doc
fixes I am sure we'll enjoy your contributions.


-jason
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Sun Jun 25 17:05:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 12:05:34 -0500
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
	with(baseml from PAML package)]
In-Reply-To: <D245656C-9366-457A-80A8-5E949F832923@bioperl.org>
Message-ID: <000901c69879$97b7d5b0$15327e82@pyrimidine>

> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing
> particular has changed in this module since 1.5.0 that I know of.
> Someday someone will put a new ball of docs up on the site, but I
> hope that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as it's
the most up-to-date, that fact isn't really emphasized much on the docs
page; the link is along with the other toolkits at the bottom of the page
and is listed as Bioperl Core Code (some users don't seem to get that, in
general, bioperl=bioperl core).  Could be this is causing a bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or that I
should be able to!).  Maybe the best way to go is to have a wiki page for
this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or at
least link to the Release Pumpkin page which indicates the code release date
for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.  Work
is underway to correct them, but do not expect this code to give you the
right answer currently!  Use dnadist/distmat in the PHYLIP or EMBOSS
packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some point
relatively soon myself so I may be up to some testing on one/both of these
modules if needed.

Chris

....


From golharam at umdnj.edu  Sun Jun 25 17:20:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Sun, 25 Jun 2006 13:20:12 -0400
Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML
 with(baseml from PAML package)]
In-Reply-To: <000901c69879$97b7d5b0$15327e82@pyrimidine>
Message-ID: <000801c6987b$9e65f840$2f01a8c0@GOLHARMOBILE1>

Exactly.  Also on the page it says (in the description for
Bio::Align::DNAStatistics):

In order to use these methods there are
several pre-requisites for the alignment.

   1. DNA alignment must be based on protein alignment. Use the subroutine
      aa_to_dna_aln in Bio::Align::Utilities to achieve this.

 Etc etc etc


The rest of the pre-reqs also mention that the sequences should be
coding sequences.  Because of this, I thought DNAStatistics was only for
coding sequences and could not be used for non-coding sequences...

Anyway, I've gotten past my troubles and am on to finishing this project.
I think others might run into the issues I ran into as well.  I'd be
happy to contribute what I can but need to finish this stuff first...
Thanks for all your help Jason, Chris, Sendu!

Ryan


-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Sunday, June 25, 2006 1:06 PM
To: 'Jason Stajich'; golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] DNA distance methods [was
Bio::Tools::Phylo::PAML with(baseml from PAML package)]


> On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote:
> 
> >>> they make no assumption about coding sequence,
> >>> where do you get that impression
> >
> > I get that information from the 1.5 api docs:
> >
> > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
> >
> great - I would also always point people to the LIVE code 
> documentation not the 1.5.0-RC1 which is +1 years old, but nothing 
> particular has changed in this module since 1.5.0 that I know of. 
> Someday someone will put a new ball of docs up on the site, but I hope

> that will come with the next development or stable release.

Though I agree that the bioperl-live code is the best place for docs as
it's the most up-to-date, that fact isn't really emphasized much on the
docs page; the link is along with the other toolkits at the bottom of
the page and is listed as Bioperl Core Code (some users don't seem to
get that, in general, bioperl=bioperl core).  Could be this is causing a
bit of confusion
for some.   The page hasn't really been updated in a while so that could
explain a lot; I'm not sure I can actually do any updating myself (or
that I should be able to!).  Maybe the best way to go is to have a wiki
page for this instead...(thinking aloud, sorry).

I was also thinking we should add something to http://doc.bioperl.org
indicating the age of the various docs as most are over 2 years old, or
at least link to the Release Pumpkin page which indicates the code
release date for the various releases:

http://www.bioperl.org/wiki/Release_pumpkin

Besides that, I agree with pretty much everything that you said; the
Bio::Align::DNAStatistics docs seem self-explanatory.

BTW, is the following still true (from Bio::Align::DNAStatistics):

"The routines are not well tested and do contain errors at this point.
Work is underway to correct them, but do not expect this code to give
you the right answer currently!  Use dnadist/distmat in the PHYLIP or
EMBOSS packages to calculate the distances."

I'll likely be using this and Bio::Align::ProteinStatistics at some
point relatively soon myself so I may be up to some testing on one/both
of these modules if needed.

Chris

....


From pmiguel at purdue.edu  Sun Jun 25 19:02:14 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 25 Jun 2006 15:02:14 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
Message-ID: <449EDDB6.8020401@purdue.edu>

Chris Fields wrote:
> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>
> [...]
>> How long before the next "stable" release? Maybe a year? Should not a
>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or
>> would that be very difficult?
>
> No, it's not that easy.  BioPerl isn't like most CPAN modules with one 
> or two developers.  See the wiki page for details on planning releases 
> to see why:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> It takes a lot of effort and coordination, much more so than the 
> average CPAN module.  I believe some of the core developers are 
> meeting this weekend; maybe something will come of that and we'll get 
> an idea of a next release.
>
> Chris
Hi Chris,
   Thanks for the information--the key part being that a bug fix from a 
couple of years ago has not propagated into the current stable release. 
Below I'll try to convince you that this is a serious problem. (Not 
because it is your fault, of course. I'm just trying to deliver my take 
on the situation to the bioperl-programmer-warriors who happen to be 
listening...)
   It isn't a problem for me to edit the offending statement in the 
QualI.pm module on systems I generally use. Or even install a 
developer's release of bioperl. My problem is one of advocacy. Maybe I 
have a warped view of the world, but it seems that except for those 
directly involved in the bioperl or GMOD projects, everyone looks to 
CPAN when they install bioperl.
    I write scripts that I sometimes want to send to biologists even 
less programming-capable than I am. I can just barely envision those 
biologists pestering their sysadmin to do a CPAN install of bioperl 
modules so that my script will work. But installing a non-CPAN set of 
modules probably isn't going to happen.
    So, this being the case, how can I, with a clear conscience, advocate 
bioperl to the junior bioinformaticians with whom I happen to interact?
    My take, for what it is worth, is that 1.5 has become an unratified 
stable release. How hard would it be to take 1.5.1--as is--and deposit 
that in CPAN? What would be the downside?

Phillip SanMiguel
   

From hlapp at gmx.net  Sun Jun 25 19:42:20 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 25 Jun 2006 15:42:20 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>

We did not and will not deposit 1.5.1 into CPAN due to the API issues  
in some (rather central) interfaces. These issues are changes over  
the 1.4 API and some of those changes are going to go away. Once we  
deposit it into CPAN we would sanction the changed API as the new  
'official' API and would open a huge can of backward liability worms.  
If you just continue to use the 1.4 API on the 1.5.1 release you  
don't need to be concerned about an API method you're using going away.

As I said, the people from the core group of developers who have  
traditionally shepherded releases all think that doing a 1.4.1  
release wouldn't be the best investment of their time. You are most  
welcome to disagree and volunteer your time to coordinate the 1.4.1  
release, and a lot of people will appreciate your efforts - including  
the bioperl developers and 'core'. It shouldn't be much work  
theoretically.

	-hilmar

On Jun 25, 2006, at 3:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one
>> or two developers.  See the wiki page for details on planning  
>> releases
>> to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the
>> average CPAN module.  I believe some of the core developers are
>> meeting this weekend; maybe something will come of that and we'll get
>> an idea of a next release.
>>
>> Chris
> Hi Chris,
>    Thanks for the information--the key part being that a bug fix  
> from a
> couple of years ago has not propagated into the current stable  
> release.
> Below I'll try to convince you that this is a serious problem. (Not
> because it is your fault, of course. I'm just trying to deliver my  
> take
> on the situation to the bioperl-programmer-warriors who happen to be
> listening...)
>    It isn't a problem for me to edit the offending statement in the
> QualI.pm module on systems I generally use. Or even install a
> developer's release of bioperl. My problem is one of advocacy. Maybe I
> have a warped view of the world, but it seems that except for those
> directly involved in the bioperl or GMOD projects, everyone looks to
> CPAN when they install bioperl.
>     I write scripts that I sometimes want to send to biologists even
> less programming-capable than I am. I can just barely envision those
> biologists pestering their sysadmin to do a CPAN install of bioperl
> modules so that my script will work. But installing a non-CPAN set of
> modules probably isn't going to happen.
>     So, this being the case, how can I, with a clear conscience,  
> advocate
> bioperl to the junior bioinformaticians with whom I happen to  
> interact?
>     My take, for what it is worth, is that 1.5 has become an  
> unratified
> stable release. How hard would it be to take 1.5.1--as is--and deposit
> that in CPAN? What would be the downside?
>
> Phillip SanMiguel
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Jun 25 20:20:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Jun 2006 15:20:20 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
Message-ID: <7C28EA28-031A-4B1C-9625-A643247445FD@uiuc.edu>


On Jun 25, 2006, at 2:02 PM, Phillip SanMiguel wrote:

> Chris Fields wrote:
>> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote:
>>
>> [...]
>>> How long before the next "stable" release? Maybe a year? Should  
>>> not a
>>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this  
>>> one? Or
>>> would that be very difficult?
>>
>> No, it's not that easy.  BioPerl isn't like most CPAN modules with  
>> one or two developers.  See the wiki page for details on planning  
>> releases to see why:
>>
>> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>>
>> It takes a lot of effort and coordination, much more so than the  
>> average CPAN module.  I believe some of the core developers are  
>> meeting this weekend; maybe something will come of that and we'll  
>> get an idea of a next release.
>>
>> Chris
> Hi Chris,
>   Thanks for the information--the key part being that a bug fix  
> from a couple of years ago has not propagated into the current  
> stable release. Below I'll try to convince you that this is a  
> serious problem. (Not because it is your fault, of course. I'm just  
> trying to deliver my take on the situation to the bioperl- 
> programmer-warriors who happen to be listening...)
>   It isn't a problem for me to edit the offending statement in the  
> QualI.pm module on systems I generally use. Or even install a  
> developer's release of bioperl. My problem is one of advocacy.  
> Maybe I have a warped view of the world, but it seems that except  
> for those directly involved in the bioperl or GMOD projects,  
> everyone looks to CPAN when they install bioperl.

Again, it's not as easy as you make it seem.  The idea is that the
CPAN version is upgraded to stable releases (even-numbered) and that
odd-numbered releases are developer versions.  Yes, it has been a
while since the last stable version; it could be a while until the  
next as there have been suggestions of an interim 1.5.x release or so  
before that occurs (though he did say 1.6 could be soon after BOSC  
which is in August).  Hilmar has explained that there are some  
stumbling blocks to get around before the next major release (if  
those 'stumbling blocks' are what I think they are, I agree).  It's  
very likely that implementing the changes he mentions will require
refactoring code, changing the API, etc.  That is not easy in a project
like this, with a large core of contributors and developers scattered
all over the world, all with different priorities (we all have $jobs
after all).
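
As a rough illustration of the even/odd convention, here is a minimal sketch that classifies a version string. It assumes the parity of the second version component is what separates stable from developer releases (1.4 and 1.6 stable, 1.5.x developer); release_kind is a made-up helper and not part of BioPerl:

```perl
use strict;
use warnings;

# Classify a BioPerl-style version string by the parity of its minor number:
# even minor (1.4, 1.6) = stable, odd minor (1.5.x) = developer release.
# NOTE: release_kind is a hypothetical helper, not part of BioPerl.
sub release_kind {
    my ($version) = @_;
    my ($major, $minor) = $version =~ /^(\d+)\.(\d+)/
        or die "unrecognised version string: $version\n";
    return $minor % 2 == 0 ? 'stable' : 'developer';
}

print "$_ => ", release_kind($_), "\n" for qw(1.4 1.5.1 1.6);
```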

That's why we have a Release Pumpkin, akin to the Pumpkings that have  
ushered forth regular perl releases.  It requires a large,  
coordinated effort with one person acting as overseer, pushing  
everybody to meet deadlines.  Not easy and not, by a long shot, your  
typical CPAN module.

>    I write scripts that I sometimes want to send to biologists even  
> less programming-capable than I am. I can just barely envision  
> those biologists pestering their sysadmin to do a CPAN install of  
> bioperl modules so that my script will work. But installing a non- 
> CPAN set of modules probably isn't going to happen.
>    So, this being the case, how can I, with a clear conscience,  
> advocate bioperl to the junior bioinformaticians with whom I happen  
> to interact?

Give those biologists some credit. Quite frankly, I would expect any  
bioinformaticist or computational biologist, junior or otherwise, to  
know or at least learn how to install from CPAN or from CVS,  
otherwise they need to change their job title.  And, as a  
microbiologist myself (i.e. one of those biologists you mention) and  
as one who regularly interacts with biologists with little to no  
computer science experience, I believe I can speak from experience.   
I find the install documents that come with BioPerl and available on  
the wiki pretty much cover everything, from how to install to the  
dependencies required to problems one may encounter.   The web site  
has a ton of documentation, including the FAQ (*cough* which covers  
this ground *cough*).

If they are running perl scripts and using a system that requires  
sysadmin privileges they probably know what they are doing anyway.  If  
not they probably have students/employees that do know what's going  
on (and who may be the ones actually running the scripts).  You can't  
please everybody, so I think you can proceed with a clear conscience  
knowing you did the best that you can to help!

>    My take, for what it is worth, is that 1.5 has become an  
> unratified stable release. How hard would it be to take 1.5.1--as  
> is--and deposit that in CPAN? What would be the downside?

Ah I see Hilmar has responded.  I think he adequately answers this.   
API is everything; changing API suddenly is bad bad bad.

> Phillip SanMiguel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From akholloway at ucdavis.edu  Mon Jun 26 04:15:16 2006
From: akholloway at ucdavis.edu (Alisha Holloway)
Date: Sun, 25 Jun 2006 21:15:16 -0700
Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
 package)
In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1>
Message-ID: <a06230932c0c50e71ad97@[10.0.1.2]>

Hi Ryan & Jason,

Sorry I didn't get back to you sooner.  I escaped the central valley 
heat (108!) and went to the coast for the weekend.  I do have a 
script that will call baseml and then parse the results.  Here it is 
and, Ryan, I can show you how to retrieve other parts of the data as 
well, but you may already know how to do this.  I know it's ugly, I 
got it working and didn't clean it up.  Just let me know if you need 
more info.

Alisha

At 11:05 PM -0400 6/24/06, Ryan Golhar wrote:
>>>they make no assumption about coding sequence,
>>>where do you get that impression
>
>I get that information from the 1.5 api docs:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/
>
>Its documented under the description section. 
>
>Oh well, I have it coded and working...might as well use it.
>
>Ryan
>-----Original Message-----
>From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
>Stajich
>Sent: Saturday, June 24, 2006 9:38 PM
>To: golharam at umdnj.edu
>Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML
>package)
>
>
>they make no assumption about coding sequence,where do you get that 
>impression.  the ka,ks are for coding but the tamura/nei kimura, 
>jukes-cantor are all for any type of sequence.
>
>the phylip and emboss are pretty straightforward IMHO - you give it 
>an alignment and you get out a matrix of pairwise numbers....
>
>but whatever makes sense to you - we are using the same methods as 
>are in Li's book (that is where I took the equations from).
>-j
>On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote:
>
>>  Hi Jason,
>>
>>  It looks like DNAStatistics is only for coding sequences.  I'm
>>  trying to
>>  calculate the Ks of exons and the K (or Ki) of introns.  All the 
>>  methods
>>  in bioperl are based on coding sequences.  Only the  PAUP package 
>>  (that
>>  I've found) does non-coding sequences.   I would have used it but you
>>  need to pay for it and we don't have the funding to purchase much 
>>  at the
>>  moment.
>>
>>  I briefly looked at PHYLIP and EMBOSS but they didn't look as
>>  straightforward as I was hoping they would be.  Either that, or I was
>>  getting frustrated looking for a simple solution.
>>
>>  In the end, I found a molecular evolution book that talks about
>>  several
>>  methods used for non-coding sequences so I went ahead and implemented
>>  them.  They seem to work well.
>>
>>  Ryan
>>
>>
>>  -----Original Message-----
>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>  Jason
>>  Stajich
>>  Sent: Saturday, June 24, 2006 2:43 PM
>>  To: golharam at umdnj.edu
>>  Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway'
>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from 
>>  PAML
>>  package)
>>
>>
>>  You should look at the Align::DNAStatistics module if you just want
>>  pairwise DNA distance.  I put in several different distance methods.
>>  Or you can use the distance methods implemented in PHYLIP or EMBOSS
>>  programs -- I thought you wanted the somewhat more sophisticated ML
>>  approaches that are implemented in PAML?
>>
>>  --jason
>>
>>  On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote:
>>
>>>  I've managed to code three methods to calculate K into a perl script
>>>  using the algorithms as described in "Molecular Evolution" by
>>>  Wen-Hsiung Li.   I'd be happy to contribute it as a script...
>>>
>>>
>>>
>>>  -----Original Message-----
>>>  From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of
>>>  Jason
>>>  Stajich
>>>  Sent: Saturday, June 24, 2006 9:40 AM
>>>  To: golharam at umdnj.edu
>>>  Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway
>>>  Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from
>>>  PAML
>>>  package)
>>>
>>>
>>>  baseml is not well-supported to my knowledge - I think I started with
>>>  an attempt to capture a small amount of the data in the file.  There are
>>>  some people who have made modifications to possibly parse it in-house
>>>  but I know of no submitted patches.   Many of the knowledgeable
>>>  people are probably at the evolution meetings  this week.
>>>
>>>  I have no idea about the full set of information in the report files
>>>  without going back to the Yang papers first.   It depends on whether
>>>  you really want to capture all of that information or just the
>>>  substitution rates.
>>>
>>>  I'm Ccing Alisha in case she has ideas/solutions from her drosophila
>>>  work+PAML.
>>>
>>>  -jason
>>>  On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote:
>>>
>>>>  Hi all,
>>>>
>>>>  I'm trying to use Bio::Tools::Phylo::PAML to parse the results from
>>>>  baseml in the PAML package to measure the distances of some
>>>>  non-coding regions.
>>>>
>>>>  I started with the coding regions, and used the script
>>>>  bp_pairwise_kaks.pl without much trouble.  Now, I'm trying to do
>>>>  something similar for non-coding regions.  However, when I call
>>>>  Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef'
>>>>  meaning matrix was never defined.
>>>>
>>>>  I wanted to find out if anyone on here has done this before or
>>>>  knows a way to measure substitution frequencies of non-coding
>>>>  regions with the PAML package.  The documentation with PAML is
>>>>  sparse so I'm not sure how to interpret its output directly -
>>>>  that's why I'm using Bioperl.
>>>>
>>>>  Hopefully someone can help me before I start digging into the
>>>>  code...Thanks.
>>>>
>>>>  Ryan
>>>>
>>>
>>>  --
>>>  Jason Stajich
>>>  Duke University
>>>  http://www.duke.edu/~jes12
>>>
>>
>>  --
>>  Jason Stajich
>>  Duke University
>>  http://www.duke.edu/~jes12
>>
>
>--
>Jason Stajich
>Duke University
>http://www.duke.edu/~jes12


-- 
Alisha Holloway

Postdoctoral Fellow
Section of Evolution & Ecology
3347 Storer Hall
University of California
Davis, CA  95616

530-754-9551 Office
512-297-3958 Cell
530-752-1449 Fax
-------------- next part --------------
A non-text attachment was scrubbed...
Name: batch_baseml_50nt.pl
Type: application/octet-stream
Size: 5395 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: baseml.ctl
Type: application/octet-stream
Size: 1699 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0009.obj>

From fernan at iib.unsam.edu.ar  Mon Jun 26 12:47:30 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 09:47:30 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
Message-ID: <20060626124730.GA53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 07:22):
|
| We did not and will not deposit 1.5.1 into CPAN due to the API issues  
| in some (rather central) interfaces. These issues are changes over  
| the 1.4 API and some of those changes are going to go away. Once we  
| deposit it into CPAN we would sanction the changed API as the new  
| 'official' API and would open a huge can of backward liability worms.  
| If you just continue to use the 1.4 API on the 1.5.1 release you  
| don't need to be concerned about an API method you're using going away.
| 
| As I said, the people from the core group of developers who have  
| traditionally shepherded releases all think that doing a 1.4.1  
| release wouldn't be the best investment of their time. You are most  
| welcome to disagree and volunteer your time to coordinate the 1.4.1  
| release, and a lot of people will appreciate your efforts - including  
| the bioperl developers and 'core'. It shouldn't be much work  
| theoretically.
| 
| 	-hilmar
|
+----]

I understand that, being a volunteer project, people can
decide where to best invest their time. If core developers
are no longer using 1.4 in their production setups, it is
reasonable to expect that they invest all of their time in
1.5 or any other bioperl version that they're using.

However, when viewed as an issue related to the setting of a
policy for the whole project, then it makes sense to have a
policy saying for how long a stable release will be
supported, and when and in which case bugfixes that are committed
to and tested in the development branch (as it should be)
will get merged back to stable. 

I'm not knowledgeable enough about the bioperl release
engineering process, nor about the internal development
process, but just guessing I'd expect that whenever anyone
submits a bugfix, it should be the responsibility of
the committer to check (against the project policy,
(written or implicit) or with the core developers in a
difficult case) whether the fix should be committed to more
than one branch.

A patch like the one that started this thread, should have
been committed to the 1.4 branch without too much thinking.
And it would have cost the committer only a few seconds more
of her/his time. 

But you only get this by setting and enforcing a policy.

After a number of these fixes has accumulated, then making a
new release shouldn't represent too much effort, nor should
it be expected that the tests that passed before would
break now. And in the worst case (no tarball release),
people can be directed to obtain the most current 'stable'
code from the repository, containing all bugfixes. 

I guess that this is what was meant by Phillip.

Fernan


From hlapp at gmx.net  Mon Jun 26 13:59:00 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 09:59:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>


On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.
>
> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.

Sure. But for some reason he or she forgot. So what do you suggest we  
do - and I mean as a community, because this is a community project.  
Come after the guy  until he commits it to the branch? Or post an  
email to the list saying what you think is the right way and then do  
it (yourself)?

>
> But you only get this by setting and enforcing a policy.

Man, this is not a company. Take a step back and think again. What do  
you suggest we - again we as a community - do to enforce a policy?  
Take increasing levels of disciplinary action if someone keeps  
forgetting to commit to the branch?

While there are clearly some rules everybody needs to follow and if  
you violate them deliberately and repeatedly you will get your CVS  
privileges withdrawn, by and large we as a community need to accept  
some responsibility for making the project what we think it should be  
- and do so not by invoking disciplinary action but by living by  
example and by taking action yourself when you think action is due.

If Bioperl were a company and you asked for a 1.4.1 release and the  
customer service rep told you nope there's a 1.5.1 that you should  
use instead and that will do just fine, what will you do? Argue with  
him about the company policies and whether they are properly enforced  
or not?

Obviously doing so will be a waste of your time. In Bioperl it is, at
bottom, no less a waste of your time, because instead you now
have the opportunity to make happen what you believe needs to happen.  
We have had a history of rapidly and un-bureaucratically putting  
people in power of what they wanted to do. We have also had a history  
of not listening much to people who don't want to put their feet  
where their mouth is.

I'm sorry if what I'm saying puts people off, but really this is an  
open-source project and if you ask me it's one with the fewest
barriers to entry for new developers or 'activists' that you can find
in the open source arena. This doesn't come without some degree of  
anarchy, but really IMHO that's more of an advantage than a  
disadvantage.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Mon Jun 26 14:13:00 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Jun 2006 10:13:00 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <BDC70861-52D3-4389-9073-07F456661B14@bioperl.org>

fair enough - we can certainly merge fixes onto the branch -  I am  
not sure why that is such a big deal.

once the changes are made to the branch, if someone then wants to
update to the latest code on the 1.4 branch, they would need to volunteer
to do the last step of:
cvs export -r branch-1-4 -d bioperl-1-4-1 bioperl-live

then validate it and make a tarball. We can then submit a 1.4.x to
CPAN, but honestly a lot of other fixes have accumulated since the
1.4 branch and I don't think we want to keep merging back to it; we'd
rather move forward. The not-so-compatible changes that got checked
in after the 1.4 branch (having to do with Annotatable) have been
part of the problem, as they have not been fully fixed to make things
backwards compatible.

Nathan asked earlier on the list about how to get a list of modules  
added since 1.4 and I can only say how to generate a diff to the  
current version of the code which might be more than what he is  
asking for. read the docs on cvs diff where you specify the two tags  
you want to diff between.
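
For anyone who wants to try the tag-to-tag comparison described above, here is a rough Perl sketch that only assembles the command and prints it. It assumes that `cvs rdiff -s` (the summary mode of rdiff) is an acceptable stand-in for a full `cvs diff`, that `branch-1-4` (the branch name from the export command above) is the tag to compare from, and that `HEAD` stands for the current code. The helper name `cvs_summary_command` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build (but do not run) a cvs command that summarizes which files
# changed between two tags. The tag names passed in below are
# assumptions; check the repository for the tags you actually need.
sub cvs_summary_command {
    my ($old_tag, $new_tag, $module) = @_;
    die "need two tags and a module name\n"
        unless defined($old_tag) && defined($new_tag) && defined($module);
    # 'rdiff -s' asks for a per-file summary instead of a full patch.
    return ('cvs', '-q', 'rdiff', '-s', '-r', $old_tag, '-r', $new_tag, $module);
}

my @cmd = cvs_summary_command('branch-1-4', 'HEAD', 'bioperl-live');
print join(' ', @cmd), "\n";

# To really run it (needs a working CVSROOT):
#   system(@cmd) == 0 or die "cvs failed: $?\n";
```

Printing the command first makes it easy to check the tags before hitting the server; dropping `-s` should give the full patch rather than the summary.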


We certainly have a problem of meeting the needs of several different  
user groups - developers who need latest code, and users who want  
stable releases.  Supporting stable releases more deliberately, which
doesn't seem to be on the main radar screen of the primary developers,
will take either funding or people who are tied to working with
older stable releases.  Since most of us who are coding and making  
changes are just working from a CVS checkout we don't have a lot of  
pressure to make a release -- and we don't want to dump newly buggy  
code (or broken interfaces) into CPAN on purpose.  It also seems like many
reported bugs have already been fixed on the latest branch but people  
are less interested in back-fixing on the old branch.


Our hope is that 1.6 would be a good replacement for 1.4 - presumably  
API consistent for the most part, but we are suffering from lack of  
time of people willing to do the work to make this happen.

I have mentioned in the past that I cannot be the release master for  
the project and it is time for someone else to step up and make this  
happen.  Chris Fields has done a phenomenal job answering questions,  
fixing bugs, and helping run the project as some of us have started  
to have too busy of a schedule to keep daily tabs on Bioperl.  But he  
too will probably have to cycle off as his career responsibilities  
(and job search) take more time.  I don't have a good answer for
anyone on how to make this happen more smoothly, but I am hopeful that
the GMOD meeting will spur some more commits and a roadmap for releasing
the next dev release and seeing what can happen with 1.6.  If we  
funded a Bioperl coordinator I am sure that would help things more  
and manage the different sets of priorities of the user groups.

I think a dedicated hackathon to bioperl work could get 1.6 out after  
one week of solid work with some bug squashing followup.

Barring that we'll have to see what everyone else wants to see done  
to get the next release out.  The person leading the release doesn't  
have to really program things they just need to organize people  
around a time-frame, a set of features that need to be tested and  
fixed, and commitments from people of what they will do.

Much of the release process is documented on the bioperl wiki site,  
if this is not clear enough please make a note on the page/talk page  
and we can start .  My hope is that the wiki can be a good repository  
of the thought process behind the project.  right now too much of it  
is floating in the minds of former and current project coordinators.

...just some of my thoughts as I get ready to be off-line starting  
next week for 4 weeks...

-jason



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From bix at sendu.me.uk  Mon Jun 26 14:44:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 15:44:55 +0100
Subject: [Bioperl-l] Tests
Message-ID: <449FF2E7.3040101@sendu.me.uk>

What level of testing is expected to be done in a test file? Is there 
such a thing as too many tests? Tests for every possible (documented) 
way of achieving a result with a module's method? Tests for every 
conceivable way of misusing a method?

If I come across a test for a module that doesn't test for everything 
the module can do, should I add tests as a matter of course? Would this 
be beneficial, or a waste of time (given that the module probably is 
bug-free already)?


From cjfields at uiuc.edu  Mon Jun 26 15:24:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:24:00 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar>
Message-ID: <001301c69934$909f83c0$15327e82@pyrimidine>

...
> However, when viewed as an issue related to the setting of a
> policy for the whole project, then it makes sense to have a
> policy saying for how long a stable release will be
> supported, and when and in which case bugfixes that are committed
> to and tested in the development branch (as it should be)
> will get merged back to stable.

In a project this large which relies on a lot of outside resources
maintaining API and availability at all times, having a completely bug-free
fix for any reasonable length of time is impossible.  As a small example,
almost every time NCBI changes BLAST output, it breaks our text parsers, and
though we recommend using the BLAST XML format parser (which is much more
stable), almost everybody continues using text parsing and wants that fixed.
Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
notification, so remote BLAST parsing can break at any time.  Fold into that
any software changes that change output or API (PAML comes to mind).  Fold
into that remote database changes (EBI interface to Swissprot).  Oh, let's
not forget sequence format changes (recent SwissProt and GenBank changes).
And, worst of all, we can't expect them to maintain API or output b/c
they're updating based on user input/suggestions or bug fixes which require
them to make changes.  What's 'stable' about that?

It's very easy to say you want something and then not volunteer to do it; if
you want something then put forth the time and effort to get it done.  Put
your money where your mouth is (as they say in my home state).

Again (for the third or fourth time now), putting together a release takes
some time and effort.  I actually think it takes more effort than Hilmar
suggests; either way, it requires someone to act as the leader (release
pumpkin) to handle changes, and I don't see anybody stepping forward.
Personally, if I have the time, maybe I'll handle an interim release, but
I'm looking for a job starting in the fall as well as finishing up research
for publication so that will take up almost all the time I have.  As Hilmar
says, if you want to do it, fine.  Realize, though, many many changes have
been made since 1.4 and many more will likely be made on the road to 1.6

> I'm not knowledgeable enough about the bioperl release
> engineering process, nor about the internal development
> process, but just guessing I'd expect that whenever anyone
> submits a bugfix, it should be the responsibility of
> the committer to check (against the project policy,
> (written or implicit) or with the core developers in a
> difficult case) whether the fix should be committed to more
> than one branch.

This is a large open-source project with a ton of developers all over the
world.  Check out the AUTHORS file; it's at best incomplete and still has
about 100 contributors.  

(Hey, my name's not on there!!!)

> A patch like the one that started this thread, should have
> been committed to the 1.4 branch without too much thinking.
> And it would have cost the committer only a few seconds more
> of her/his time.
> 
> But you only get this by setting and enforcing a policy.

You need to realize what this project is, what it is not, and how it
evolved.  A little history lesson might get you (and others) to understand
just how complex it all is (and how old some of the code is).

http://www.bioperl.org/wiki/FAQ#Can_you_explain_the_Object_Model_design_and_rationale.3F

explains a bit on the project design.

http://www.bioperl.org/wiki/History_of_BioPerl

explains how BioPerl came to be.  

This is not a job or a company but an open-source project; its origins are
based in the scientific community.  You're probably right about the person
not committing the change to the 1.4 branch.  We probably should have a
policy for commits to stable releases.  But how can we logically rationalize
doing so now for 1.4, almost three years later?  We're post 1.5.1 and likely
going into 1.6 as we speak.  It's too late for 1.4 changes IMHO, frankly,
but you're welcome to try.  I don't think it's worth the effort.

As for policy enforcement, what would you want us to do?  This is a
volunteer effort.  Fire him/her?  Frankly they should be commended for
getting the fix committed in the first place, and if someone points out that
it should be committed to the 1.4 branch then fine; it shouldn't be hard to
do so even long after the commit to the main branch is made.  It just
requires someone to do so.

Again, this is NOT your typical CPAN module with one or two developers or a
project that relies on doing one thing very well.  This project has over 100
developers and is supposed to do everything adequately (and many things very
well). 

> After a number of these fixes has accumulated, then making a
> new release shouldn't represent too much effort, nor should
> it be expected that the tests that passed before would
> break now. And in the worst case (no tarball release),
> people can be directed to obtain the most current 'stable'
> code from the repository, containing all bugfixes.

You can download a tarball from the latest CVS code at any time.  There is a
link for doing just that at the bottom of the anonymous CVS page:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/

Chris


From hlapp at gmx.net  Mon Jun 26 15:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 26 Jun 2006 11:30:05 -0400
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
References: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>


On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:

> What level of testing is expected to be done in a test file? Is there
> such a thing as too many tests?

No, not really.

> Tests for every possible (documented)
> way of achieving a result with a module's method?

Ideally that's the minimum.

> Tests for every conceivable way of misusing a method?

If some are known already (from reports) or you think some can be
anticipated, yes. Generally, if a method documents what are invalid  
values for its input it's a good idea to test what the method does if  
supplied with such values. The one thing it shouldn't do is silently  
ignore them, or produce a result anyway (which presumably would be a  
wrong result by definition).
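
A minimal sketch of what such tests might look like, using Test::More. The `gc_fraction` routine is a hypothetical stand-in written for this example, not a BioPerl method; the point is only that documented invalid input should raise an error rather than be silently ignored or yield a number.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical routine for illustration only (not a BioPerl API):
# returns the GC fraction of a DNA string and dies on invalid input
# instead of silently ignoring it.
sub gc_fraction {
    my ($seq) = @_;
    die "sequence must be a non-empty string\n"
        unless defined($seq) && length($seq);
    die "invalid character in sequence\n"
        if $seq =~ /[^ACGTacgt]/;
    my $gc = ($seq =~ tr/GCgc//);    # tr with no replacement just counts
    return $gc / length($seq);
}

# Documented, valid usage.
is(gc_fraction('GGCC'), 1,   'all-GC sequence');
is(gc_fraction('ATGC'), 0.5, 'half-GC sequence');

# Documented invalid input: the routine must complain, not return a value.
for my $bad (undef, '', 'ATXG') {
    my $label  = defined($bad) ? "'$bad'" : 'undef';
    my $result = eval { gc_fraction($bad) };
    ok(!defined($result), "no result for invalid input $label");
    like($@, qr/sequence/, "error raised for invalid input $label");
}

done_testing();
```

Once tests like these exist they also act as a safety net: if a later change makes the routine quietly accept bad input, the test run fails.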

>
> If I come across a test for a module that doesn't test for everything
> the module can do, should I add tests as a matter of course? Would  
> this
> be beneficial, or a waste of time (given that the module probably is
> bug-free already)?

It would certainly be beneficial. It'd be great if you were willing  
to volunteer for this.

Note that a module being bug free now doesn't mean it always will be.  
The main point of tests is not only to weed out bugs at the time the
code is written, but also to make sure that future changes to the module
itself, or to other modules it interacts with or inherits from, don't  
break it.

	-hilmar

>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Mon Jun 26 15:39:25 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:39:25 +0100
Subject: [Bioperl-l] Tests
In-Reply-To: <AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
References: <449FF2E7.3040101@sendu.me.uk>
	<AD92E6BE-BD4C-4055-9FDD-48E55F9CF031@gmx.net>
Message-ID: <449FFFAD.40506@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote:
>
>> If I come across a test for a module that doesn't test for everything
>> the module can do, should I add tests as a matter of course? Would this
>> be beneficial, or a waste of time (given that the module probably is
>> bug-free already)?
> 
> It would certainly be beneficial. It'd be great if you were willing to 
> volunteer for this.

I doubt I have time to do this on the global scale[*], but certainly I 
will for the modules I work on.


Cheers,
Sendu.

* Though... it would certainly be a good way of getting to know all of 
Bioperl intimately!


From bix at sendu.me.uk  Mon Jun 26 15:42:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 16:42:33 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00069.6010107@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).
> 
> The general tidy up will involve redoing and perhaps even removing 
> things.

Does anyone know what the intent behind the split Bio::Map::MappableI 
and Bio::Map::MarkerI was? I somehow get the impression these started as 
one interface but then became two. The split /seems/ to be MappableI as 
a map element with one position on one map, whilst MarkerI is a map 
element with multiple positions on multiple maps. But MarkerI has no 
synopsis or description, and MappableI says it does what MarkerI does 
(but doesn't). So I'm left guessing atm.

Do we want to keep the split? If yes, what exactly should be the 
difference between the two? If no, would it be ok to just get rid of 
MarkerI (folding it back into MappableI)?


From cjfields at uiuc.edu  Mon Jun 26 15:45:51 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 10:45:51 -0500
Subject: [Bioperl-l] Tests
In-Reply-To: <449FF2E7.3040101@sendu.me.uk>
Message-ID: <001a01c69937$9a1c1320$15327e82@pyrimidine>

My opinion: tests should cover methods and expected results and are based on
what the module actually accomplishes.  Some classes (like SeqIO, SearchIO)
are normally relatively easy to build tests for b/c the expected results are
in the file being parsed.  Tests which check calculated results from modules
(Bio::Align::DNAStatistics for instance) I would think are trickier since
you should confirm the calculations are correct through independent means.
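
As a rough illustration of checking a calculated result against an independently derived value, here is a small Test::More sketch. The `jukes_cantor` routine is written for this example and is not the Bio::Align::DNAStatistics API; the Jukes-Cantor formula is used only because it is simple enough to work out by hand.

```perl
use strict;
use warnings;
use Test::More;

# Illustrative stand-in for a module method that calculates a result
# (hypothetical; not the Bio::Align::DNAStatistics API).
# Jukes-Cantor distance: d = -3/4 * ln(1 - 4/3 * p),
# where p is the proportion of sites that differ.
sub jukes_cantor {
    my ($seq1, $seq2) = @_;
    die "sequences must have equal, non-zero length\n"
        unless length($seq1) && length($seq1) == length($seq2);
    my $diffs = 0;
    for my $i (0 .. length($seq1) - 1) {
        $diffs++ if substr($seq1, $i, 1) ne substr($seq2, $i, 1);
    }
    my $p = $diffs / length($seq1);
    die "sequences too divergent for Jukes-Cantor\n" if $p >= 0.75;
    return -0.75 * log(1 - (4 / 3) * $p);
}

# Expected value worked out separately by hand for p = 1/10:
# -0.75 * ln(13/15) = 0.1073256 (to 7 decimal places).
my $expected = 0.1073256;
my $got      = jukes_cantor('AAAAAAAAAA', 'AAAAAAAAAT');
cmp_ok(abs($got - $expected), '<', 1e-6, 'distance matches hand calculation');

# Identical sequences must give a distance of zero.
cmp_ok(jukes_cantor('ACGT', 'ACGT'), '==', 0, 'identical sequences');

done_testing();
```

The expected number is deliberately hard-coded rather than recomputed with the same formula in the test, since recomputing it would only confirm that the code agrees with itself.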

Links:

http://www.bioperl.org/wiki/Advanced_BioPerl#Designing_Good_Tests

http://search.cpan.org/~mschwern/Test-Simple-0.62/lib/Test/Tutorial.pod

The link above uses Test::Simple or Test::More; we use Test (but have
considered moving to Test::More using Devel::Cover).

My 2c

Chris



From bix at sendu.me.uk  Mon Jun 26 16:15:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Jun 2006 17:15:32 +0100
Subject: [Bioperl-l] Bio::Map changes
In-Reply-To: <449A9AF9.2000305@sendu.me.uk>
References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk>
Message-ID: <44A00824.20002@sendu.me.uk>

Sendu Bala wrote:
> The reimplementation will make Position central to the model, allowing 
> for lots of other things to work properly without anything becoming 
> inconsistent (as is currently the case).

To do this I actually need to make some slightly more significant API 
changes than I had hoped. To make Position central, all maps, mappables 
and markers need to be able to add and remove Positions (and similar 
things). As I see it, we can say that such methods are fundamental to 
the coordination required between Bio::Map modules. I feel that I'm 
therefore justified in implementing these kinds of methods in the 
interfaces (which would allow all the downstream modules that implement 
those interfaces to work in the new system without much/any alteration).

Am I justified? Should I try harder to do it without implementations in 
the interfaces?
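
To make the trade-off concrete, here is a minimal sketch of an "interface" package that carries shared implementation. All names (`My::MappableI`, `My::Marker`, the `_positions` key) are hypothetical and are not the Bio::Map API; positions are plain strings to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical names throughout; this is not the Bio::Map API.
package My::MappableI;    # the "interface"

# Abstract: implementing classes must supply this themselves.
sub name { die "name() must be implemented by ", ref(shift), "\n" }

# Concrete: shared bookkeeping lives in the interface, so every
# implementing class handles positions the same way.
sub add_position {
    my ($self, $pos) = @_;
    push @{ $self->{_positions} ||= [] }, $pos;
    return scalar @{ $self->{_positions} };
}

sub remove_position {
    my ($self, $pos) = @_;
    $self->{_positions} = [ grep { $_ ne $pos } @{ $self->{_positions} || [] } ];
    return scalar @{ $self->{_positions} };
}

sub positions { return @{ $_[0]->{_positions} || [] } }

package My::Marker;       # a downstream implementation, left unchanged
our @ISA = ('My::MappableI');

sub new  { my ($class, %args) = @_; return bless { name => $args{name} }, $class }
sub name { return $_[0]->{name} }

package main;

my $marker = My::Marker->new(name => 'marker1');
$marker->add_position('map1:12.5');
$marker->add_position('map2:40.0');
$marker->remove_position('map1:12.5');
print $marker->name, ': ', join(', ', $marker->positions), "\n";
```

The downstream class gains the position methods without being edited, which is the benefit described above. The cost is that the interface now assumes a hash-based object and a particular storage key, so it is no longer a pure contract.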


From pmiguel at purdue.edu  Mon Jun 26 16:53:56 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon, 26 Jun 2006 12:53:56 -0400
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <001301c69934$909f83c0$15327e82@pyrimidine>
References: <001301c69934$909f83c0$15327e82@pyrimidine>
Message-ID: <44A01124.5040102@purdue.edu>

Chris Fields wrote:
> ...
>   
>> However, when viewed as an issue related to the setting of a
>> policy for the whole project, then it makes sense to have a
>> policy saying for how long a stable release will be
>> supported, and when and in which case bugfixes that are committed
>> to and tested in the development branch (as it should be)
>> will get merged back to stable.
>>     
>
> In a project this large which relies on a lot of outside resources
> maintaining API and availability at all times, having a completely bug-free
> fix for any reasonable length of time is impossible.  As a small example,
> almost every time NCBI changes BLAST output, it breaks our text parsers, and
> though we recommend using the BLAST XML format parser (which is much more
> stable), almost everybody continues using text parsing and wants that fixed.
> Now, NCBI routinely changes their BLAST version about every 3-6 months w/o
> notification, so remote BLAST parsing can break at any time.  Fold into that
> any software changes that change output or API (PAML comes to mind).  Fold
> into that remote database changes (EBI interface to Swissprot).  Oh, let's
> not forget sequence format changes (recent SwissProt and GenBank changes).
> And, worst of all, we can't expect them to maintain API or output b/c
> they're updating based on user input/suggestions or bug fixes which require
> them to make changes.  What's 'stable' about that?
>
> It's very easy to say you want something and then not volunteer to do it; if
> you want something then put forth the time and effort to get it done.  Put
> your money where your mouth is (as they say in my home state).
>
> Again (for the third or fourth time now), putting together a release takes
> some time and effort.  I actually think it takes more effort than Hilmar
> suggests; either way, it requires someone to act as the leader (release
> pumpkin) to handle changes, and I don't see anybody stepping forward.
> Personally, if I have the time, maybe I'll handle an interim release, but
> I'm looking for a job starting in the fall as well as finishing up research
> for publication so that will take up almost all the time I have.  As Hilmar
> says, if you want to do it, fine.  Realize, though, many many changes have
> been made since 1.4 and many more will likely be made on the road to 1.6
>
>   
Hi Chris et al.,

    I was just reporting the situation from where I sit. I think this 
issue was important enough to bring to everyone's attention. I've done so 
and I'm more than satisfied with the response. I hope my emails were not 
too abrasive.
    I have now read the wiki about coordinating a release. You are 
right, that does sound hard. At least to me--I've never even used CVS, 
nor contributed a module to CPAN. I just don't see myself as being 
qualified to coordinate a 1.4.1 release. So since I'm not, for that 
reason, able to volunteer to do it myself, I'll withdraw my request for 
a new release to CPAN.
    That being said, I think Fernan's suggestion bears keeping in mind 
once 1.6 has been released and bug fixes are being committed. By that 
time, I hope I'll be savvy enough to help out in the process.
    Thanks for your attention,

Phillip SanMiguel
   

From fernan at iib.unsam.edu.ar  Mon Jun 26 19:24:51 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 26 Jun 2006 16:24:51 -0300
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
References: <449D9464.6030508@purdue.edu>
	<87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu>
	<449EDDB6.8020401@purdue.edu>
	<6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net>
	<20060626124730.GA53298@iib.unsam.edu.ar>
	<ABB7EDCA-AD47-4D57-BDA6-523852F9CDD6@gmx.net>
Message-ID: <20060626192451.GB53298@iib.unsam.edu.ar>

+----[ Hilmar Lapp <hlapp at gmx.net> (26.Jun.2006 11:01):
| 
| On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote:
| 
| >I'm not knowledgeable enough about the bioperl release
| >engineering process, nor about the internal development
| >process, but just guessing I'd expect that whenever anyone
| >submits a bugfix, it should be the responsibility of
| >the committer to check (against the project policy,
| >(written or implicit) or with the core developers in a
| >difficult case) whether the fix should be committed to more
| >than one branch.
| >
| >A patch like the one that started this thread, should have
| >been committed to the 1.4 branch without too much thinking.
| >And it would have cost the committer only a few seconds more
| >of her/his time.
| 
| Sure. But for some reason he or she forgot. So what do you suggest we  
| do - and I mean as a community, because this is a community project.  
| Come after the guy until he commits it to the branch? 

No, I never said or implied that.

| Or post an email to the list saying what you think is the
| right way and then do  it (yourself)?

Of course I could volunteer some of my time to
do that (that is, go over the commit history and see what
changes could be merged back to 1.4, if that seems to be
useful), provided I get a polite reply to my 'email
to the list saying what [I] think is the right way'.

I'm a volunteer in other open source, community projects,
and I do contribute regularly so I see no problem except the
obvious scarcity of free time in doing the same for bioperl.

| >But you only get this by setting and enforcing a policy.
| 
| Man, this is not a company. Take a step back and think again. What do  
| you suggest we - again we as a community - do to enforce a policy?  
| Take increasing levels of disciplinary action if someone keeps  
| forgetting to commit to the branch?

Seems like you were pissed off by what I said ...

What I was just trying to say is that merely by formulating
and communicating a policy you could be taking steps towards
making it a reality. Maybe 'enforcing' was an unfortunate
word to use here ... 

You don't have to punish anyone, just sending a polite email
to the list reminding people about the policy once in a
while, should be enough. It's OK if some committer doesn't
care, or just forgets about doing the right thing once in a
while ...

But of course, you might be pissed off by me talking about
something that I know nothing about (the development of
bioperl), given that I'm just a bioperl user.

Perhaps my mistake was to bring here ideas from
other projects (in which I do contribute regularly) without
realizing that, not being a contributor, I could be
punished for suggesting how things could be done better.

| While there are clearly some rules everybody needs to follow and if  
| you violate them deliberately and repeatedly you will get your CVS  
| privileges withdrawn, by and large we as a community need to accept  
| some responsibility for making the project what we think it should be  
| - and do so not by invoking disciplinary action but by living by  
| example and by taking action yourself when you think action is due.

I completely agree. When I said 'setting a policy' I just
meant something along the lines of clearly stating what are
those 'rules everybody needs to follow'. My suggestion was
to add a 'merge trivial fixes back to stable' rule to that
list.

I agree with Jason: why is that such a big deal?

| If Bioperl were a company and you asked for a 1.4.1 release and the  
| customer service rep told you nope there's a 1.5.1 that you should  
| use instead and that will do just fine, what will you do? Argue with  
| him about the company policies and whether they are properly enforced  
| or not?
| 
| Obviously doing so will be a waste of your time. In Bioperl it is at  
| the bottom of it no less waste of your time, because instead you now  
| have the opportunity to make happen what you believe needs to happen.

Right, but first I have to realize what needs to happen. I
realized it when I read your reply to Philip's message.

I then proceeded to write my thoughts and send them to the
list, to see what kind of feedback I get. 

Hopefully, someone with commit privileges would think that
what I said makes sense and just proceed to doing it (saving
me from the task :)

Or perhaps someone, as Jason did, would say that it's
not worth trying to merge things back to 1.4 and that we
should move forward instead. In his message he even
explained what the
problems and needs are (lack of man-time, need for
volunteers) and politely asked for help.

| We have had a history of rapidly and un-bureaucratically putting  
| people in power of what they wanted to do. We have also had a history  
| of not listening much to people who don't want to put their feet  
| where their mouth is.

I would call your reply (this message) a barrier of entry
for new developers. In the above paragraph I guess you are
referring to the bioperl motto: 'whoever codes it wins'.
That is true in any open source project. But at least to me,
that doesn't say that you should not listen to people just
because they haven't contributed a single line of code.

| I'm sorry if what I'm saying puts people off, but really this is an  
| open-source project and if you ask me it's one with the least  
| barriers of entry for new developers or 'activists' that you can find  
| in the open source arena. 


Let me disagree. The barriers of entry are not just the
giving away of developer accounts and/or repository write
privileges. 

I'm a regular contributor in another open source, community
project (FreeBSD) that has more and higher barriers of entry
with respect to giving away privileges (for example for
committing changes to the repository). Nonetheless FreeBSD
has historically shown to have few and low barriers of entry
for incorporating people to the project (without the need to
give away commit privileges, making them responsible for
parts of the FreeBSD source code/documentation/ports/etc).

IMO, that comes from a very good communication of the
direction of the project, what needs to be done, how to do
it, and a tendency of privileged and older members to listen
to people's suggestions, inviting and helping people
to jump the fence and become part of the project. It's no
accident that FreeBSD has 'mentors' that
introduce new members, help them to get acquainted with how
the project works, policies, etc. and supervise their
actions.

| This doesn't come without some degree of  
| anarchy, but really IMHO that's more of an advantage than a  
| disadvantage.
| 	-hilmar
|
+----]

Fernan

PS: finally, let me just add that English is not my native
language. Although I'm quite familiar with it, once in a
while, an unfortunate choice of words might blur my intended
meaning or the strength I wanted to convey. In case that has
been the case, let me put clearly that it has not been my
intention to criticize the way the project does things, but
to suggest ideas for the future (merge back trivial changes
to a 'stable' branch as a policy) based on my experience
with other projects. Whether that fits bioperl or not was
what I would have expected as a reply.


From cjfields at uiuc.edu  Mon Jun 26 20:18:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Jun 2006 15:18:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060626192451.GB53298@iib.unsam.edu.ar>
Message-ID: <002701c6995d$b738f790$15327e82@pyrimidine>


> | >A patch like the one that started this thread, should have
> | >been committed to the 1.4 branch without too much thinking.
> | >And it would have cost the committer only a few seconds more
> | >of her/his time.
> |
> | Sure. But for some reason he or she forgot. So what do you suggest we
> | do - and I mean as a community, because this is a community project.
> | Come after the guy until he commits it to the branch?
> 
> No, I never said or implied that.

Right, you didn't say that.  But you didn't clarify your statements either.
I think you're treading into dangerous waters when you come in and criticize
something w/o bothering to read up on how things have been done here.  As
you say yourself below, it's 'something that I know nothing about (the
development of bioperl), given that I'm just a bioperl user'.  It's akin to
"I don't think you're coding things correctly, here's the right way to do
it" w/o knowing what the code is used for.

> | Or post an email to the list saying what you think is the
> | right way and then do  it (yourself)?
> 
> Of course I could volunteer some of my time to
> do that (that is, go over the commit history and see what
> changes could be merged back to 1.4, if that seems to be
> useful), provided I get a polite reply to my 'email
> to the list saying what [I] think is the right way'.

You will get a polite email when you respond politely.  I actually agree
with many things you say, but you sure aren't making any friends here by the
way you consistently take the opposite stance and judge what other people
do.  I think you have a point about having a stable release be supported for
a period of time.  My point is, how long?  We didn't really get an idea of
that from you, did we?

> I'm a volunteer in other open source, community projects,
> and I do contribute regularly so I see no problem except the
> obvious scarcity of free time in doing the same for bioperl.

And others here also volunteer elsewhere (GMOD, DAS, Ensembl, etc).  Don't
presume we don't have experience in open-source.  That's being pretty
judgmental.  

> | >But you only get this by setting and enforcing a policy.
> |
> | Man, this is not a company. Take a step back and think again. What do
> | you suggest we - again we as a community - do to enforce a policy?
> | Take increasing levels of disciplinary action if someone keeps
> | forgetting to commit to the branch?
> 
> Seems like you were pissed off by what I said ...

????Ya think????  

You know, okay, forget it.  This is completely non-productive.  We'll all
agree to disagree, argue, whatever.  The points made here, as I see them:

1)  Commits should be made to stable releases (as well as to the main branch
in CVS) to fix bugs as long as that release is supported.  I agree with
this, but someone has to volunteer, and the length of time a release is
supported also has to be worked out.  It would almost be better to go to a
regular release schedule (once every 3-6 months or so) where the code is
given as is to CPAN, whether it passes tests or not.

2)  More communication about the direction Bioperl is heading; personally I
haven't seen a problem with this so much as a lack of information about a
roadmap.  That is being alleviated soon I believe, though people out there
need to be patient.

3)  Volunteer.  If you have something you believe needs to be done and you
believe so fervently, then put up or shut up.  Make (nice polite)
suggestions otherwise.  Don't judge code or "the way things are done" and
don't presume what kind of experience people have that you don't know and
haven't met.  End of story.

Chris


From torsten.seemann at infotech.monash.edu.au  Tue Jun 27 02:57:47 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 27 Jun 2006 12:57:47 +1000
Subject: [Bioperl-l] Comments on new PDOC documentation
Message-ID: <44A09EAB.2030401@infotech.monash.edu.au>

Hello all,

I am very happy to see the PDOC software has been improved, as I use the
online web documentation frequently. Thanks to Jason, Raphael and
Patrick for making this happen.

http://doc.bioperl.org/bioperl-live/

Now for some comments...

1. CSS

It uses CSS which is excellent, reducing HTML size and allowing easy 
tweaks to the design. However its current implementation has some issues:

A. it seems to only use ID, rather than CLASS, to specify styles.
    ID values must be unique in a page, and are for one-off styles.
    CLASS may be re-used throughout a page, e.g. "sub" and "subArea".
    Many browsers do not enforce this however...

B. it seems to be doing unusual, but possibly deliberate, things with
    the POD when determining what CSS ID to give it, but perhaps this is
    more to do with how Bioperl formats the POD on some subheadings
    eg.
    <a name="_pod_Reporting Bugs" id="_pod_Reporting Bugs">
    <a name="_pod_AUTHOR - Ewan Birney" id="_pod_AUTHOR - Ewan Birney">

C. the "Description" sections etc are in a proportional font, but
    I think it should be "font-family: monospace" as many authors have
    exploited the traditional monospace of most editors to format
    their comments, which are now lost

2. FRAMES

I notice it still uses HTML Frames. Although this reduces code size 
also, it makes it impossible to LINK directly to a specific 
documentation page with all the frames intact. It may be better to use 3 
DIV elements which are part of each page, and they could be server-side 
included so there is no HTML duplication.

3. MERGING OF BIOPERL DOCS

One facet of the docs I find frustrating is that bioperl-live and
bioperl-run (and the others) are separate! This means that you have to
keep switching between them, and more importantly, links from class names
to classes in other packages are not present; this is particularly bad when
browsing bioperl-run.

Is there any chance of creating a "merged" bioperl-doc page somehow?

4. STYLE

Choice of colours and layouts is such a personal thing.
I guess people can download http://doc.bioperl.org/css/perl.css
and re-edit it, and get their browser to override the supplied CSS with
their version.

5. CONCLUSION

Please don't get the wrong idea, I love the new PDOC, I would just like 
to love it more. And yes I understand the nightmare that is parsing 
Perl/POD and generating compatible CSS :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From bix at sendu.me.uk  Tue Jun 27 10:21:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 11:21:57 +0100
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <44A106C5.9040706@sendu.me.uk>

Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
Is the idea of a Bio::Score of interest? See bug, but basically an 
object that can handle multiple kinds of scores effectively.

I would like to use such a thing in Bioperl, but what standard needs to 
be met before Bioperl gets a new kind of object?


From hlapp at gmx.net  Tue Jun 27 12:24:16 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 08:24:16 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A106C5.9040706@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
Message-ID: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>

So you basically want to attach semantic information to a number, and  
type the number thereby?

If so, an ontology would be the more natural choice (and in the end  
more flexible one) for expressing this kind of information.

Have you looked at the concept of 'quantitation types', e.g. in MAGE  
(the XML [MAGE-ML] or the object model [MAGE-OM])?

There is no quantitation type ontology at a repository I know of. I  
have used my own ones in the past and they have been pretty useful.

	-hilmar

On Jun 27, 2006, at 6:21 AM, Sendu Bala wrote:

> Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
> Is the idea of a Bio::Score of interest? See bug, but basically an
> object that can handle multiple kinds of scores effectively.
>
> I would like to use such a thing in Bioperl, but what standard needs to
> be met before Bioperl gets a new kind of object?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 12:52:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 13:52:05 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A129F5.3030500@sendu.me.uk>

Hilmar Lapp wrote:
> So you basically want to attach semantic information to a number, and 
> type the number thereby?

Basically, I want to be able to stick a bunch of (different kinds of) 
numbers into an object, and later get the 'best' one out (of a 
particular kind), or sort multiple of those objects.


> If so, an ontology would be the more natural choice (and in the end more 
> flexible one) for expressing this kind of information.

I'm not really sure I understand 'and type the number', or what (useful) 
flexibility doing it with an ontology would provide.


> Have you looked at the concept of 'quantitation types', e.g. in MAGE 
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

I had a quick look, but not really sure what you intended to suggest here.


> There is no quantitation type ontology at a repository I know of. I have 
> used my own ones in the past and they have been pretty useful.

Can you provide a brief example of what you mean?

If it would be appropriate to implement a Bio::Score with an ontology 
that's fine. Would we want a Bio::Score implemented though? Or are you 
suggesting each module make its own quantitation type ontology when it
wants to deal with numerous scores?

I like the idea of a Bio::Score because then you can compare complex 
scores from multiple different unrelated modules.


From cjfields at uiuc.edu  Tue Jun 27 14:08:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 09:08:57 -0500
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
Message-ID: <001e01c699f3$3b6cda50$15327e82@pyrimidine>

> Hilmar Lapp wrote:
> > So you basically want to attach semantic information to a number, and
> > type the number thereby?
> 
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.

The 'best one' might be tricky when dealing with different kinds of scores,
esp. scores calculated different ways.  For instance, I run RNA motif
programs quite frequently (RNAMotif, ERPIN, Infernal), but all generate
'scores' based on different criteria (algorithms, different parameters, how
the author slept, and so on).  RNAMotif in particular is hard to deal with
(though a great program) b/c the scores are based on criteria in the
descriptor file (the file used to describe the motif), so aren't comparable
to other descriptors, which may have their own method of generating scores,
let alone output from other programs.  Which one would be 'the best?'  It's
a bit subjective since the scores are predictive based upon your input,
various program limitations, specific program parameter implementations,
etc.  

I do like the idea of grouping together scores for comparison, such as when
a particular region of DNA has multiple hits from different programs with
different scores.  It would at least suffice as a test on how various
programs or experimental data would compare with one another.

> > If so, an ontology would be the more natural choice (and in the end more
> > flexible one) for expressing this kind of information.
> 
> I'm not really sure I understand 'and type the number', or what (useful)
> flexibility doing it with an ontology would provide.

I'm not sure, but maybe something along the lines of what the number (the
score) actually means, especially when compared to other scores.  In other
words, how you could compare one score or number versus the other.  An
ontology would allow more complex information to be included along with the
score information so one could make more informed choices based on how the
score was obtained, the algorithm used, the program involved, etc.  Hence
flexible.  Is that close, Hilmar?

To use my RNA program example above, I could include the information about
how the scores were obtained, the programs involved, parameters used, the
various raw scores, the time it took to run the program, etc. (i.e. you
could make it as specific as you wanted).  This could also be extended to
other data types as well besides program, such as wet bench experimental
data and so on, which I deal with quite a bit.  I think there are a few XML
specs out there besides MAGE that do this as well but I can't think of any
off the top of my head.
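
A minimal sketch of the grouping idea described above: hits on the
same region are collected together with the program and parameters
that produced them, but no attempt is made to rank scores across
programs. It is written in Python purely for illustration and is not
Bioperl code. The `Hit` class, the `group_by_region` helper, the
region coordinates and the score values are all made up for the
example; only the program names come from the message.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One result from one program, with enough context to interpret it."""
    region: tuple                  # (seq_id, start, end), hypothetical layout
    program: str                   # which program produced the score
    raw_score: float               # only meaningful within that program
    parameters: dict = field(default_factory=dict)


def group_by_region(hits):
    """Collect hits that cover the same region, keeping their provenance."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.region].append(hit)
    return dict(grouped)


# Made-up values, for illustration only.
hits = [
    Hit(("seq1", 100, 150), "RNAMotif", 12.0, {"descriptor": "example.descr"}),
    Hit(("seq1", 100, 150), "Infernal", 35.5),
    Hit(("seq1", 400, 460), "ERPIN", 8.25),
]
by_region = group_by_region(hits)
```

Each region then carries a list of (program, score, parameters)
records that can be inspected side by side, which is the "test on how
various programs compare" use rather than a single 'best' answer.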

> > Have you looked at the concept of 'quantitation types', e.g. in MAGE
> > (the XML [MAGE-ML] or the object model [MAGE-OM])?
> 
> I had a quick look, but not really sure what you intended to suggest here.

I think the idea is that MAGE, strictly as an example, deals with microarray
data from different sources or different data systems for comparison.
Sounds a little like what you want to do.

> > There is no quantitation type ontology at a repository I know of. I have
> > used my own ones in the past and they have been pretty useful.
> 
> Can you provide a brief example of what you mean?
> 
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make its own quantitation type ontology when it
> wants to deal with numerous scores?
> 
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.

Which is what MAGE does in a way, but more specifically, i.e. just
microarray data from different sources.  So the array data may be calculated
in different ways based upon the specs for different machines, the way array
slides were prepared, how the experimenter slept, etc.

Chris


From hlapp at gmx.net  Tue Jun 27 14:27:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 10:27:55 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A129F5.3030500@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
Message-ID: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>

I would have suggested initiating a quantitation type ontology, not  
one individual per module.

An ontology would capture all your semantic information (min/max or  
range, higher or lower is better, what is a reasonable default [not  
sure there would be one], etc) and you would have a hierarchical  
structure.

You type a score by associating it with an ontology term:

	BLAST_e-value is-a expectation_value
	expectation_value has-min-value 0
	expectation_value has-max-value positive_infinity
	BLAST_p-value is-a probability_value
	probability_value has-min-value 0
	probability_value has-max-value 1
	
etc and then something being an expectation_value for instance would  
imply several attributes laid down in the ontology (probably through  
has-a statements).

It seems to me that essentially what you are trying to do is  
capturing knowledge for particular types of scores, which you would  
then use in more general purpose programs to sort from more to less  
significant, and possibly filter? If so, then hard-coding this into  
objects (all over the place or in a single place) is typically not  
the best practice; rather, the usual best-practice approach is using  
(and if necessary, constructing) an ontology. This is also the most  
re-usable approach.
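
A minimal sketch of how typing a score by an ontology term could work
in practice, written in Python purely for illustration (it is not
Bioperl code and uses no Bio::Ontology API). The first six triples are
the ones listed above. The `lower-is-better` predicate, the helper
names `lookup`, `in_range` and `sort_best_first`, and the assumption
that each term has at most one is-a parent are made up for this
example.

```python
import math

# (subject, predicate, object) triples. The last two are assumptions
# added for the sketch; they are not part of the example above.
TRIPLES = [
    ("BLAST_e-value", "is-a", "expectation_value"),
    ("expectation_value", "has-min-value", 0),
    ("expectation_value", "has-max-value", math.inf),
    ("BLAST_p-value", "is-a", "probability_value"),
    ("probability_value", "has-min-value", 0),
    ("probability_value", "has-max-value", 1),
    ("expectation_value", "lower-is-better", True),   # assumption
    ("probability_value", "lower-is-better", True),   # assumption
]


def lookup(term, predicate):
    """Return the value of `predicate` for `term`, following is-a upward."""
    seen = set()
    while term is not None and term not in seen:
        seen.add(term)
        parent = None
        for subj, pred, obj in TRIPLES:
            if subj != term:
                continue
            if pred == predicate:
                return obj
            if pred == "is-a":
                parent = obj
        term = parent
    return None


def in_range(term, value):
    """Check a value against the min/max implied by the term's type."""
    lo = lookup(term, "has-min-value")
    hi = lookup(term, "has-max-value")
    return (lo is None or lo <= value) and (hi is None or value <= hi)


def sort_best_first(term, values):
    """Sort values of one score type from most to least significant."""
    return sorted(values, reverse=not lookup(term, "lower-is-better"))
```

The general-purpose code never mentions BLAST: it only asks the
ontology what a term implies, so supporting a new kind of score means
adding triples rather than changing code.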

	-hilmar

On Jun 27, 2006, at 8:52 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> So you basically want to attach semantic information to a number, and
>> type the number thereby?
>
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.
>
>
>> If so, an ontology would be the more natural choice (and in the end more
>> flexible one) for expressing this kind of information.
>
> I'm not really sure I understand 'and type the number', or what (useful)
> flexibility doing it with an ontology would provide.
>
>
>> Have you looked at the concept of 'quantitation types', e.g. in MAGE
>> (the XML [MAGE-ML] or the object model [MAGE-OM])?
>
> I had a quick look, but not really sure what you intended to suggest here.
>
>
>> There is no quantitation type ontology at a repository I know of. I have
>> used my own ones in the past and they have been pretty useful.
>
> Can you provide a brief example of what you mean?
>
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make its own quantitation type ontology when it
> wants to deal with numerous scores?
>
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Jun 27 15:25:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 16:25:06 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
Message-ID: <44A14DD2.7000402@sendu.me.uk>

Hilmar Lapp wrote:
> I would have suggested initiating a quantitation type ontology, not one 
> individual per module.

Where would such a thing 'live'? Would it be some static file somewhere 
that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology 
that can be added to by a module when it needs extra terms to describe its
particular kind of scores?


> An ontology would capture all your semantic information [snip]

Thanks, I agree that an ontology would be the way to do it...


> It seems to me that essentially what you are trying to do is capturing 
> knowledge for particular types of scores, which you would then use in 
> more general purpose programs to sort from more to less significant, and 
> possibly filter?

Yes.


> If so, then hard-coding this into objects (all over the 
> place or in a single place) is typically not the best practice; rather, 
> the usual best-practice approach is using (and if necessary, 
> constructing) an ontology. This is also the most re-usable approach.

Not having any experience with ontologies, I can't think how this would
all be done in practice though. Don't we need some central module 
(Bio::Score) to create the ontology (or read it in) and then present 
some suitable interface to it? For example, modules that wanted to store 
some scores might just ask Bio::Score for the ontology and type their 
scores by associating with an available ontology term, creating new 
terms if necessary (or is that something you would never do; the 
ontology needed to have been set up to cover all possible terms?). Then 
when the user has a bunch of these typed scores, surely he doesn't want 
to deal with going through the ontology himself to work out what it all 
means? Well, he could if he needs that level of control, but also he 
just wants to say Bio::Score->sort(x y z) or something.


From bix at sendu.me.uk  Tue Jun 27 16:13:46 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 17:13:46 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>
Message-ID: <44A1593A.809@sendu.me.uk>

Cook, Malcolm wrote:
>
> All this semantic cruft is overkill for a moving target and will never
> settle down until your analysis results are no longer relevant.

I'm not sure what you mean by that. What moves? An evalue will always be 
an evalue. Once you know that you are in fact dealing with an evalue, 
and once your sorting algorithm knows that lower evalues are better, 
nothing changes. Likewise for other kinds of scores.

Instead of having to discover that a particular program is giving you an 
evalue, and then writing code to deal with an evalue appropriately, I 
thought it would be nicer to have a single module that knew how to deal 
with it already.


From MEC at stowers-institute.org  Tue Jun 27 16:01:45 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 27 Jun 2006 11:01:45 -0500
Subject: [Bioperl-l] Bio::Score of interest?
Message-ID: <CED81D34E37D5043A1211565277A51E50563FD40@exchkc02.stowers-institute.org>

For the use case of TFBS analysis demonstrated in the attachment to the
bug, I would expect to find potentially three scores, ala, {evalue,
bitscore, and percentmatch}.  To deal with this in existing framework
(i.e. GFF/bioperl analysis modules/TFBS), I would try to make GFFx eat
scalars as scores and pack the three values into a string and unpack
them as needed for sorting, etc.  Else put the one score I know I'm
going to 'use' in a particular analysis into 'score' and adorn column 9
with the rest.
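
Both options above can be sketched in a few lines. This is Python for
illustration only, not Bioperl or any existing GFF module. The function
names, the comma-separated packing format, the `bitscore` and
`percentmatch` attribute tags, and the "example" source and
"TF_binding_site" type values are assumptions made for the sketch; the
nine tab-separated columns follow the usual GFF layout with the score
in column 6 and attributes in column 9.

```python
def pack_scores(evalue, bitscore, percentmatch):
    """Option 1: pack the three scores into a single string."""
    return f"{evalue!r},{bitscore!r},{percentmatch!r}"


def unpack_scores(packed):
    """Reverse pack_scores so the values can be used for sorting."""
    evalue, bitscore, percentmatch = (float(x) for x in packed.split(","))
    return {"evalue": evalue, "bitscore": bitscore,
            "percentmatch": percentmatch}


def gff_line(seqid, start, end, evalue, bitscore, percentmatch):
    """Option 2: one score in column 6, the others in column 9."""
    attrs = f"bitscore={bitscore};percentmatch={percentmatch}"
    cols = [seqid, "example", "TF_binding_site", start, end,
            evalue, "+", ".", attrs]
    return "\t".join(str(c) for c in cols)


def column9(line):
    """Parse the key=value pairs out of column 9 (values stay strings)."""
    attrs = line.rstrip("\n").split("\t")[8]
    return dict(pair.split("=", 1) for pair in attrs.split(";"))


# Sorting packed hits by e-value, lowest first.
packed_hits = [pack_scores(1e-5, 30.5, 90.0), pack_scores(1e-20, 45.2, 98.0)]
ranked = sorted(packed_hits, key=lambda p: unpack_scores(p)["evalue"])
```

Either way the caller still has to know which score is which and which
direction is better; the format only carries the numbers.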

All this semantic cruft is overkill for a moving target and will never
settle down until your analysis results are no longer relevant.

my $.02

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>Sent: Tuesday, June 27, 2006 5:22 AM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::Score of interest?
>
>Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033
>Is the idea of a Bio::Score of interest? See bug, but basically an 
>object that can handle multiple kinds of scores effectively.
>
>I would like to use such a thing in Bioperl, but what standard needs to
>be met before Bioperl gets a new kind of object?


From bix at sendu.me.uk  Tue Jun 27 18:07:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 19:07:44 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
References: <001e01c699f3$3b6cda50$15327e82@pyrimidine>
Message-ID: <44A173F0.4040302@sendu.me.uk>

Chris Fields wrote:
>> Hilmar Lapp wrote:
>>> So you basically want to attach semantic information to a number, and
>>> type the number thereby?
>> Basically, I want to be able to stick a bunch of (different kinds of)
>> numbers into an object, and later get the 'best' one out (of a
>> particular kind), or sort multiple of those objects.
> 
> The 'best one' might be tricky when dealing with different kinds of scores,
> esp. scores calculated different ways.

I didn't make myself very clear, but you don't compare different kinds 
of scores. When you want to compare two different Score objects, each of 
which may contain multiple different kinds of scores, you pick the kind 
of score you're interested in, and for that kind of score ask which 
object has the 'best' score. I can't readily think of any exceptions to 
the rule that 'best' is either the higher score or the lower score, 
depending on what kind of score you've chosen.

I may not have made myself clear in another way. One of the ideas behind 
a Bio::Score is to have a container object for multiple different kinds 
of scores (and even multiple values per kind) all generated by one 
program in one analysis on one data set.
The container then lets you pick the kind of score you want to work with 
and compare its scores with those in other Bio::Score objects that 
contain the same kind of score (most probably, ones made by the same 
analysis program but on different data sets).

Furthermore, the kind of score you want to work with could have multiple 
values from that single analysis. So the container also lets you 
summarise these values (eg. average them) before trying to compare with 
another Score object. Often, it may be that for a certain kind of score 
it makes sense (it is intended by the score-generating program) to 
always summarise the values in a certain way. So the container needs to 
know about that and 'do the right thing' so the user can just compare 
things without having to trouble himself.

So this is why I feel that to just 'use an ontology' isn't enough. 
Certainly one ought to be used when defining the kinds, but you need 
some single interface with useful methods that lets you deal with the 
actual score values easily.
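
A rough sketch of the container described above, in Python for
illustration only: this is not the proposed Bio::Score code and none
of these names exist in Bioperl. The `KINDS` table stands in for
whatever the ontology would provide; the two kinds, their
`lower_is_better` flags and their summary functions are assumptions
made for the example.

```python
from statistics import mean

# Per-kind rules (assumed for this sketch): which direction is better
# and how multiple values from one analysis are summarised.
KINDS = {
    "evalue":   {"lower_is_better": True,  "summarise": min},
    "bitscore": {"lower_is_better": False, "summarise": mean},
}


class Score:
    """Holds several kinds of scores, possibly several values per kind,
    all from one program run on one data set."""

    def __init__(self):
        self._values = {}

    def add(self, kind, value):
        if kind not in KINDS:
            raise KeyError(f"unknown kind of score: {kind}")
        self._values.setdefault(kind, []).append(value)

    def summary(self, kind):
        """Summarise this object's values of one kind the 'right' way."""
        return KINDS[kind]["summarise"](self._values[kind])


def best(kind, scores):
    """Pick the Score object with the best summary for the chosen kind."""
    pick = min if KINDS[kind]["lower_is_better"] else max
    return pick(scores, key=lambda s: s.summary(kind))


def sort_scores(kind, scores):
    """Sort Score objects best-first for the chosen kind."""
    return sorted(scores, key=lambda s: s.summary(kind),
                  reverse=not KINDS[kind]["lower_is_better"])


a = Score()
a.add("bitscore", 40)
a.add("bitscore", 80)
a.add("evalue", 1e-10)

b = Score()
b.add("bitscore", 55)
b.add("evalue", 1e-30)
```

Here `best("bitscore", [a, b])` and `best("evalue", [a, b])` pick
different objects, which is the point: scores are only ever compared
within one kind, and the user picks the kind.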


From cjfields at uiuc.edu  Tue Jun 27 18:56:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Jun 2006 13:56:40 -0500
Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <20060627181439.GD51742@iib.unsam.edu.ar>
Message-ID: <000a01c69a1b$6d0338c0$15327e82@pyrimidine>

> | 1)  Commits should be made to stable releases (as well as to the main
> | branch in CVS) to fix bugs as long as that release is supported.  I agree
> | with this, but someone has to volunteer, and the length of time a release
> | is supported also has to be worked out.
> 
> I volunteer to do that (merge approved changes/fixes back to
> a stable branch), though as said by others, 1.4 may not be
> the most appropriate 'stable' branch, as too many changes
> have accumulated, and maybe it's not worth it. But I could
> do that for the next 'stable' release, 1.6 or 2.0 whichever
> comes next.
> 
> As per the length of time, I would say that a stable release
> should be supported at least until another 'stable' release
> is made. Or until it's no longer being used in production
> setups, which is only feasible to know in small
> communities.

I'm posting this to the mail list so that others can respond.

Kevin Brown (in a response to me) made some good points about updating and
maintaining stable releases in that only bug fixes are committed (i.e. no
refactoring, no new modules or features).  I personally wouldn't have a
problem in someone doing this, releasing periodic updates to stable or
developer releases to fix bugs only but I may be in the minority here.  The
rest of the core guys and others need to also speak their thoughts.  I hate
forwarding this to Jason since he's in the middle of getting ready for a
move but I think this is important enough to do so.

I can say that I am unequivocally against updating 1.4.  Too much has
changed since then and I think it would be a mess trying to figure out what
bug fixes to include, etc.  

I also am very much against placing developer's releases in CPAN; those
releases are not intended to be completely stable as they may be
implementing new features that haven't been tested completely and may
contain various other bugs.  v 1.5.1 is remarkably stable for a developer's
release but several bug fixes have been made since.  If someone wants to try
out the developer's versions or bioperl-live they are most welcome to it;
the web site docs give all the instructions one needs to install from pretty
much any platform.

Beyond that, I'm spent on this thread.

Chris 


From lstein at cshl.edu  Tue Jun 27 22:35:08 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 27 Jun 2006 18:35:08 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large
	file
In-Reply-To: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
References: <KOEGJJHCCKFLICFGFLKDEENACJAA.oldham@ucla.edu>
Message-ID: <200606271835.09558.lstein@cshl.edu>

Hi All,

This is rather late, but just for future reference on the mailing list,  here 
is how I would do the task using Bio::DB::Fasta.

Script 1: index the file for future use:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	
	my $filename = shift;  # name of file to index on command line
	Bio::DB::Fasta->new($filename,-makeid=>\&make_my_id)
		or die "Indexing failed";
	print "Indexing succeeded!\n";
	exit 0;

	sub make_my_id {
		my $description_line = shift;
		$description_line =~ /(\d+_at)/ or die "malformed description line";
		return $1;
	}

Run this script once to create a reusable index of the file. The index will be 
stored in the same directory as the FASTA file.
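
As a quick check of what the -makeid callback does on its own, the sketch below calls make_my_id directly, without Bio::DB::Fasta. Both header lines are made up for illustration; real headers in your file may look different, in which case the regular expression needs adjusting.

```perl
use strict;
use warnings;

sub make_my_id {
    my $description_line = shift;
    $description_line =~ /(\d+_at)/ or die "malformed description line";
    return $1;
}

# Made-up header line, for illustration only
my $id = make_my_id('>example:1234_at hypothetical description');
print "$id\n";

# A header without a digits-then-"_at" token makes the callback die
my $ok = eval { make_my_id('>no probe id here'); 1 };
print "second header rejected\n" unless $ok;
```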

Script 2: extract the sequences using the IDs stored in a second file:

	#!/usr/bin/perl
	use strict;
	use Bio::DB::Fasta;
	use Bio::SeqIO;
	use IO::File;

	my $indexed_fasta_file = shift;
	my $probe_id_file         = shift;

	# open up the indexed fasta file
	my $db = Bio::DB::Fasta->new($indexed_fasta_file) or die;
	# open up a FASTA writer
	my $out = Bio::SeqIO->new(-format=>'Fasta',-fh=>\*STDOUT) or die;
	# open the probe id file
	my $in   = IO::File->new($probe_id_file) or die;

	# do the work
	while (my $id = <$in>) {
		chomp $id;
		my $seq = $db->get_Seq_by_id($id) or die;
		$out->write_seq($seq);
	}

	exit 0;

Bio::Index::Fasta will work in almost exactly the same way. The only 
difference is that the Bio::DB::Fasta will allow you to retrieve subsequences 
efficiently.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From awitney at sgul.ac.uk  Tue Jun 27 14:08:20 2006
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 27 Jun 2006 15:08:20 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A13BD4.60802@sgul.ac.uk>


> Have you looked at the concept of 'quantitation types', e.g. in MAGE  
> (the XML [MAGE-ML] or the object model [MAGE-OM])?

the MGED Ontology has a concept of quantitation type if that helps

http://mged.sourceforge.net/ontologies/MGEDontology.php#QuantitationType




From william.hsiao at gmail.com  Tue Jun 27 19:52:03 2006
From: william.hsiao at gmail.com (William Hsiao)
Date: Tue, 27 Jun 2006 12:52:03 -0700
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
Message-ID: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>

Hi all,
   I've encountered a strange problem while parsing a gff file from
NCBI using perl.  I'm hoping that someone on the list may have a
solution even though this is not a bioperl issue.  Maybe someone
familiar with gff3 parsing can help :)  Essentially, I'm parsing a gff
file into a nested hash structure using the following functions:

sub parse_gff {
    my $file = shift;
    my %hash_gff;
    open (INFILE, $file) or die "Cannot find file $file\n";
    while(<INFILE>){
	next if (/^\#/);
	chomp;
	my ($seqid, $source, $type, $start, $end, $score, $strand, $phase,
$attributes) = split /\t/;
	my $attri_ref = &process_attributes($attributes);
	my %record = ('seqid'     => $seqid,
		      'source'    => $source,
		      'type'      => $type,
		      'start'     => $start,
		      'end'       => $end,
		      'score'     => $score,
		      'strand'    => $strand,
		      'phase'     => $phase,
		      'attribute' => $attri_ref);
	push @{$hash_gff{$type}}, \%record;
    }
    close INFILE;
    print Dumper %hash_gff;
    return \%hash_gff;
}

sub process_attributes {
    my $attr_string = shift;
    my @attributes = split (/\;/, $attr_string);
    my %attr;
    foreach (@attributes){
	my ($key, $value) = split /=/;
	if ($value=~/\:/){
	    my ($subkey, $subvalue) = split (/:/, $value);
	    $attr{$key}{$subkey}=$subvalue;
	}
	else{
	    $attr{$key}=$value;
	}
    }
    return \%attr;
}

   It works for all the gff files we downloaded from NCBI's microbial
genomes refseq ftp repository.  However, 3 lines from one particular
file NC_005966.gff (of Acinetobacter_sp_ADP1) can not be parsed
properly.  These lines are:

NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	start_codon	636487	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	stop_codon	635833	635835	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

   They generate an error: Can't use string
("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
 The strange part is that all I have to do is replace the word
"function" in front of "=adaptation%20to%20stress;" with another word
or simply change it to functions or functio or Function, etc, then the
line parses properly.  If I retype the word "function", it doesn't
solve the problem.  For some strange reason, when the word "function"
is there, perl tried to use "adaptation%20to%20stress" as the hash key
and failed.  The word "function" is used in other lines as well so I
don't think the problem is caused by the word alone.
    Any suggestion on what might be happening would be greatly
appreciated.  Thank you.

Cheers,

Will

-- 
William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206 Fax: 604-291-5583


From bix at sendu.me.uk  Wed Jun 28 08:25:52 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 09:25:52 +0100
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
In-Reply-To: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
References: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
Message-ID: <44A23D10.1010308@sendu.me.uk>

William Hsiao wrote:
>
> sub process_attributes {
>     my $attr_string = shift;
>     my @attributes = split (/\;/, $attr_string);
>     my %attr;
>     foreach (@attributes){
> 	my ($key, $value) = split /=/;
> 	if ($value=~/\:/){
> 	    my ($subkey, $subvalue) = split (/:/, $value);
             # assign hashref to $key, assign key => value pair to that
> 	    $attr{$key}{$subkey}=$subvalue;
> 	}
> 	else{
             # assign scalar $key
> 	    $attr{$key}=$value;
> 	}
>     }
>     return \%attr;
> }

> NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

>    They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
>  The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.

The problem is that these lines contain function=x twice, where the 
second x contains a colon.
So your code first assigns $attr{function} = $scalar, and then tries to
do $attr{function}{before_colon} = "after_colon".

Normally the latter would autovivify $attr{function} as a hash 
reference: $attr{function} == HASH(xyz), and then set before_colon => 
after_colon as a key/value pair of HASH(xyz). But in this case 
$attr{function} already exists: $attr{function} == 
"adaptation%20to%20stress". So you end up trying to set before_colon => 
after_colon as a key/value pair of that string, which you can't do.
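
The failure can be reproduced in a few lines, independent of the GFF file. This is just a minimal sketch; the key and values are the pieces your split calls would produce for the two function= attributes.

```perl
use strict;
use warnings;

my %attr;

# First function=... attribute has no colon, so a plain string is stored
$attr{function} = 'adaptation%20to%20stress';

# Second function=... attribute contains a colon, so the code now treats
# the string already stored under 'function' as a hash reference
my $ok = eval { $attr{function}{'protection%20%28MultiFun'} = '5.5%29'; 1 };
my $error = $ok ? '' : $@;

print "Error: $error";
```

Renaming the first key to anything else (functions, Function, ...) avoids the collision, which is why that made the line parse.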

Basically, your data structure isn't so great, mixing scalars and hash 
references as values of %attr.
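
One way to keep the structure uniform, sketched here as a suggestion rather than a tested drop-in, is to make every value of %attr an array reference so that repeated keys simply accumulate. The name process_attributes_uniform is made up for this example; splitting on the colon is left to whoever consumes the values.

```perl
use strict;
use warnings;

# Hypothetical replacement for process_attributes: every key maps to an
# array reference, so repeated keys never clash with an earlier scalar.
sub process_attributes_uniform {
    my $attr_string = shift;
    my %attr;
    foreach my $pair (split /;/, $attr_string) {
        # a limit of 2 keeps any '=' inside the value intact
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        push @{ $attr{$key} }, defined $value ? $value : '';
    }
    return \%attr;
}

# Shortened version of the attribute column from the problem lines
my $attr = process_attributes_uniform(
    'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
  . 'function=protection%20%28MultiFun:5.5%29;db_xref=GI:50083879'
);

print join(' | ', @{ $attr->{function} }), "\n";
```

Single-valued attributes then live at index 0, e.g. $attr->{locus_tag}[0].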

The solution may be to parse using Bioperl instead ;).


From selvik at ufl.edu  Tue Jun 27 12:54:48 2006
From: selvik at ufl.edu (Kadirvel, Selvi)
Date: Tue, 27 Jun 2006 08:54:48 -0400 (EDT)
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
Message-ID: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>

All,

(I am new to Bioinformatics and Bioperl, so please forgive me if I 
get my terminology wrong)

I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
report consists of three sections, namely:

1. A ranked list of the best scoring HMMs
2. A list of the best scoring domains in order of their occurrence 
in the sequence
3. Alignments for all the best scoring domains.

Section 3 can be truncated to a specific number using the ??A? 
option when building the report.

Though the Bio::SearchIO::hmmer module parses through the entire 
HMMER report (Section 1, 2 and 3), the set of values made 
available through Bio::Search::Result::ResultI seem to be using 
Section 3 alone. So when we use the ?A option to truncate, we lose 
otherwise useful information in Section 1. This information is 
lost (only) for those models that do not have any of their domains 
in the top ?A number of? best scoring domains. The fields that are 
not available are:

1.	Description of a model
2.	Score of a model
3.	Evalue of a model

If I use the older Bio::Tools::HMMER::Results module, neither 
Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allows me to 
retrieve the above listed values. Scores and Evalues are available 
for each domain but not for the model it belongs to.

I was wondering if there is any other method to access these 
values or do I have to write my own module to do this?

Any ideas/suggestions would be greatly appreciated.

Thank you!


Selvi Kadirvel

Graduate Research Assistant
High Performance Computing Center
University of Florida


From hlapp at gmx.net  Wed Jun 28 00:18:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 20:18:36 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A14DD2.7000402@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk>
	<43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
	<44A129F5.3030500@sendu.me.uk>
	<46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
	<44A14DD2.7000402@sendu.me.uk>
Message-ID: <E4565670-479B-4247-A3CB-3DA998AF8456@gmx.net>


On Jun 27, 2006, at 11:25 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> I would have suggested initiating a quantitation type ontology, not one
>> individual per module.
>
> Where would such a thing 'live'? Would it be some static file somewhere
> that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology
> that can be added to by a module when it needs extra terms to describe
> its particular kind of scores?

For instance, yes. Once you read in an ontology (through  
Bio::OntologyIO indeed) it sits essentially in memory.

> [...]
> Not having any experience with ontologies, I can't think how this would
> all be done in practice though. Don't we need some central module
> (Bio::Score) to create the ontology (or read it in) and then present
> some suitable interface to it?

Possibly - the problem is how to get the ontology-typed term given an  
analysis program and attribute name (e.g. 'score' of a feature  
object). There is no method for doing this on a feature object, and  
bolting one on would be a bad idea, I think.

So, the Bio::Score would be a little hybrid between an objectified  
score value that now doesn't just have a numeric value but also a  
type term, and a factory for creating the ontology (e.g., by reading  
it in from a specified or default location). I.e., you'd have

	my $value = $score->value();
	my $type = $score->type();
	# $type is-a Bio::Ontology::TermI
	my $quant_ont = $type->ontology();
	
	# see what type of score we have
	my @ancestors = $quant_ont->get_ancestor_terms($type);
	if (grep {$_->name eq 'expectation_value'} @ancestors) {
		# it's an e-value
	} elsif ( ...test for some other type...) {
		# etc
	}


> For example, modules that wanted to store
> some scores might just ask Bio::Score for the ontology and type their
> scores by associating with an available ontology term, creating new
> terms if necessary (or is that something you would never do; the
> ontology needed to have been set up to cover all possible terms?).

Yes. You'd extend it as you encounter types that aren't in the  
ontology yet, until the ontology fully captures the knowledge domain.

> Then
> when the user has a bunch of these typed scores, surely he doesn't want
> to deal with going through the ontology himself to work out what it all
> means? Well, he could if he needs that level of control, but also he
> just wants to say Bio::Score->sort(x y z) or something.

See above for a quick example of the logic. I'd separate that into  
its own module, like Bio::Score::Utils.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Jun 28 14:29:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:29:17 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>

Selvi, 

Can you send me the report you are trying to parse as an attachment?  I'll
give it a look.

Judging by the pdoc this is mapped for the event handler so it should be
there.  From the %MAPPING hash:

                 'HMMER_program'   => 'RESULT-algorithm_name',
                 'HMMER_version'   => 'RESULT-algorithm_version',
                 'HMMER_query-def' => 'RESULT-query_name',
                 'HMMER_query-len' => 'RESULT-query_length',
                 'HMMER_query-acc' => 'RESULT-query_accession',
                 'HMMER_querydesc' => 'RESULT-query_description',
                 'HMMER_hmm'       => 'RESULT-hmm_name',
                 'HMMER_seqfile'   => 'RESULT-sequence_file',
                 'HMMER_db'        => 'RESULT-database_name',

Chris



From cjfields at uiuc.edu  Wed Jun 28 14:55:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:55:31 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>
Message-ID: <003501c69ac2$e70623b0$15327e82@pyrimidine>

I hate responding to myself!!  Forgot to add that there is also
Bio::Tools::Hmmpfam :

http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam

I'll check if Bio::SearchIO catches this data and let you know what I find
out.  It should catch at least some of it, according to the mapping.

Chris



From bix at sendu.me.uk  Wed Jun 28 15:04:29 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 16:04:29 +0100
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
 evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <44A29A7D.7020602@sendu.me.uk>

Kadirvel, Selvi wrote:
> All,
> 
> (I am new to Bioinformatics and Bioperl, so please apologize if I 
> get my terminology wrong)
> 
> I am currently using Bio::SearchIO to parse HMMPFAM reports. This 
> report consists of three sections namely;
> 
> 1. A ranked list of the best scoring HMMs
> 2. A list of the best scoring domains in order of their occurrence 
> in the sequence
> 3. Alignments for all the best scoring domains.
> 
> Section 3 can be truncated to a specific number using the ??A? 
> option when building the report.

What do you mean by this? What is ??A? ?
Is this an option you're supplying to hmmpfam or a bioperl module?


> Though the Bio::SearchIO::hmmer module parses through the entire 
> HMMER report (Section 1, 2 and 3), the set of values made 
> available through Bio::Search::Result::ResultI seem to be using 
> Section 3 alone. So when we use the ?A option to truncate, we lose 
> otherwise useful information in Section 1. This information is 
> lost (only) for those models that do not have any of their domains 
> in the top ?A number of? best scoring domains. The fields that are 
> not available are:
> 
> 1.	Description of a model
> 2.	Score of a model
> 3.	Evalue of a model

Each hit you get back from each result of the SearchIO is a 
Bio::Search::Hit::HMMERHit and represents the results of a particular 
model (you can also say $result->next_model).

So you can say:
print $hit->name, " ", $hit->description, " ", $hit->significance, " ", 
$hit->score, "\n";

to get the information you want.
General information about the result can be had like so:
print $result->query_name, " ", $result->algorithm, " ", 
$result->hmm_name, "\n";

I have another problem (or the same one as you? I can't tell...) in 
that I can only get a single result, hit and hsp from my hmmpfam file!
It is doing my head in, but I might be doing something wrong so will 
look into it further before posting a bug report.


From bix at sendu.me.uk  Wed Jun 28 16:46:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 17:46:57 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A29A7D.7020602@sendu.me.uk>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
	<44A29A7D.7020602@sendu.me.uk>
Message-ID: <44A2B281.7030806@sendu.me.uk>

Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue, 
description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I can't tell...) in 
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will 
> look into it further before posting a bug report.

I was just doing something wrong, but...

Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report 
a single HSP per Hit so domains with multiple alignments get separate 
Hits (more FASTA like) since they aren't really HSPs'

Strangely 1.25 (Bioperl 1.4) seems to behave like that already.

In any case, this is extremely counter-intuitive, especially given that 
next_domain is a synonym of next_hsp. I think either the synonym 
relationship remains and hits have multiple hsps (and there is only one 
hit per model), or next_domain goes off and finds the hsp that is the 
next domain of the current model. But that would be incredibly broken in 
the current model since it would be found in a different hit object...

What hmmpfam does is take a database of models which can be thought of 
as database sequences. Then it aligns each one against your query 
sequences. A model could align in multiple locations along a query 
sequence. Each one of these locations is called a domain of the model. A 
user of hmmpfam is model-centric (wants to know which models are on his 
query), and so you want to know all about how well the model did in one 
go. So you should be able to get the results for a model ($hit = 
$result->next_model), get overall info about it ($hit->score etc.), then 
get more detailed information about each domain of it (while ($hsp = 
$hit->next_domain) {...}). But right now you only get one domain and you 
have to go searching through all your other hits to find a hit with the 
same ->name() as your model of interest to get the next domain of your 
model.
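
For the time being, the workaround amounts to regrouping the hits by name yourself. The sketch below shows the idea. To keep it runnable without BioPerl it uses a tiny stand-in class (Stub::Hit, invented for this example) and placeholder model names; with a real report the hits would come from $result->next_hit and the loop body would stay the same.

```perl
use strict;
use warnings;

# Minimal stand-in for a hit object, so the sketch runs without BioPerl.
{
    package Stub::Hit;
    sub new      { my ($class, %args) = @_; return bless {%args}, $class }
    sub name     { $_[0]{name} }
    sub next_hsp { shift @{ $_[0]{hsps} } }
}

# Placeholder hits: the same (made-up) model appears twice, one HSP each,
# which mirrors the one-HSP-per-hit behaviour described above.
my @hits = (
    Stub::Hit->new(name => 'model_A', hsps => ['A domain 1']),
    Stub::Hit->new(name => 'model_B', hsps => ['B domain 1']),
    Stub::Hit->new(name => 'model_A', hsps => ['A domain 2']),
);

# Collect every HSP under the name of its model, remembering first-seen order
my (%domains_of, @order);
for my $hit (@hits) {
    my $name = $hit->name;
    if (!exists $domains_of{$name}) {
        $domains_of{$name} = [];
        push @order, $name;
    }
    while (defined(my $hsp = $hit->next_hsp)) {
        push @{ $domains_of{$name} }, $hsp;
    }
}

for my $model (@order) {
    printf "%s: %d domain(s)\n", $model, scalar @{ $domains_of{$model} };
}
```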

In my view this is less than ideal. What do people think? Should it be 
changed?


From selvik at ufl.edu  Wed Jun 28 15:21:37 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 11:21:37 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <003501c69ac2$e70623b0$15327e82@pyrimidine>
References: <003501c69ac2$e70623b0$15327e82@pyrimidine>
Message-ID: <2679E8D1-E225-4414-8925-1EB73B83523B@ufl.edu>

Thanks for your reply Chris.

I am attaching a part of the report I am trying to parse.

Also I see that Bio::SearchIO::hmmer.pm is parsing all three  
sections. I am not sure how (or whether) fields from Section 1 are  
actually being made available through Bio::SearchIO or  
Bio::Search::[Hit | Hsp | Result].

I'll look into Bio::Tools::Hmmpfam and let you know if that works for  
me.

-Selvi


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ManyQueries.hmmer
Type: application/octet-stream
Size: 3684451 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060628/53dcc875/attachment-0004.obj>
-------------- next part --------------


On Jun 28, 2006, at 10:55 AM, Chris Fields wrote:

> I hate responding to myself!!  Forgot to add that there is also
> Bio::Tools::Hmmpfam :
>
> http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam
>
> I'll check if Bio::SearchIO catches this data and let you know what  
> I find
> out.  It should at least some according to the mapping.
>
> Chris
>
>> Selvi,
>>
>> Can you send me the report you are trying to parse as an  
>> attachment?  I'll
>> give it a look.
>>
>> Judging by the pdoc this is mapped for the event handler so it  
>> should be
>> there.  From the %MAPPING hash:
>>
>>                  'HMMER_program'   => 'RESULT-algorithm_name',
>>                  'HMMER_version'   => 'RESULT-algorithm_version',
>>                  'HMMER_query-def' => 'RESULT-query_name',
>>                  'HMMER_query-len' => 'RESULT-query_length',
>>                  'HMMER_query-acc' => 'RESULT-query_accession',
>>                  'HMMER_querydesc' => 'RESULT-query_description',
>>                  'HMMER_hmm'       => 'RESULT-hmm_name',
>>                  'HMMER_seqfile'   => 'RESULT-sequence_file',
>> 	           'HMMER_db'        => 'RESULT-database_name',
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Kadirvel, Selvi
>>> Sent: Tuesday, June 27, 2006 7:55 AM
>>> To: bioperl-l at lists.open-bio.org
>>> Cc: selvik at ufl.edu
>>> Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters  
>>> (score,
>>> evalue, description)
>>>
>>> All,
>>>
>>> (I am new to Bioinformatics and Bioperl, so please forgive me if I
>>> get my terminology wrong)
>>>
>>> I am currently using Bio::SearchIO to parse HMMPFAM reports. This
>>> report consists of three sections namely;
>>>
>>> 1. A ranked list of the best scoring HMMs
>>> 2. A list of the best scoring domains in order of their occurrence
>>> in the sequence
>>> 3. Alignments for all the best scoring domains.
>>>
>>> Section 3 can be truncated to a specific number using the "-A"
>>> option when building the report.
>>>
>>> Though the Bio::SearchIO::hmmer module parses through the entire
>>> HMMER report (Section 1, 2 and 3), the set of values made
>>> available through Bio::Search::Result::ResultI seem to be using
>>> Section 3 alone. So when we use the -A option to truncate, we lose
>>> otherwise useful information in Section 1. This information is
>>> lost (only) for those models that do not have any of their domains
>>> in the top -A number of best scoring domains. The fields that are
>>> not available are:
>>>
>>> 1.	Description of a model
>>> 2.	Score of a model
>>> 3.	Evalue of a model
>>>
>>> If I use the older Bio::Tools::HMMER::Results module, NEITHER
>>> Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allow me to
>>> retrieve the above listed values. Scores and Evalues are available
>>> for each domain but not for the model it belongs to.
>>>
>>> I was wondering if there is any other method to access these
>>> values or do I have to write my own module to do this?
>>>
>>> Any ideas/suggestions would be greatly appreciated.
>>>
>>> Thank you!
>>>
>>>
>>>
>>>
>>> Selvi Kadirvel
>>>
>>> Graduate Research Assistant
>>> High Performance Computing Center
>>> University of Florida
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>


From akarger at CGR.Harvard.edu  Wed Jun 28 19:49:54 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 28 Jun 2006 15:49:54 -0400
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
Message-ID: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>

>perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";'
-e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: acc (P09651) does not exist
---------------------------------------------------
>perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN",
$format="fasta";' -e '$sequence = get_sequence($database, $id);'

-------------------- WARNING ---------------------
MSG: id (ROA1_HUMAN) does not exist
---------------------------------------------------

But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN
Same error for a couple other proteins.
Works for a GenBank protein.

perl 5.8.6
Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp

This worked a few months ago.
What's going on?

-Amir Karger


From cjfields at uiuc.edu  Wed Jun 28 20:27:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 15:27:15 -0500
Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E498E78@huls5.nucleus.harvard.edu>
Message-ID: <006901c69af1$412c3590$15327e82@pyrimidine>

This was a recent bug due to recent changes in EBI's remote database; they
changed the name of the database from 'swall' to 'uniprot'.  Update to
bioperl-live from CVS (or just Bio::DB::SwissProt) and that should fix it.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Wednesday, June 28, 2006 2:50 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work
> 
> >perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";'
> -e '$sequence = get_sequence($database, $id);'
> 
> -------------------- WARNING ---------------------
> MSG: acc (P09651) does not exist
> ---------------------------------------------------
> >perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN",
> $format="fasta";' -e '$sequence = get_sequence($database, $id);'
> 
> -------------------- WARNING ---------------------
> MSG: id (ROA1_HUMAN) does not exist
> ---------------------------------------------------
> 
> But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN
> Same error for a couple other proteins.
> Works for a GenBank protein.
> 
> perl 5.8.6
> Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp
> 
> This worked a few months ago.
> What's going on?
> 
> -Amir Karger
> 


From Kevin.M.Brown at asu.edu  Wed Jun 28 20:39:43 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 28 Jun 2006 13:39:43 -0700
Subject: [Bioperl-l] FW:  How to handle bugs in bioperl 1.4 on CPAN?
Message-ID: <1A4207F8295607498283FE9E93B775B4019719A4@EX02.asurite.ad.asu.edu>

This was supposed to go to the list...  Still not used to Outlook...

> The points made here, as I see them:
> 
> 1)  Commits should be made to stable releases (as well as to
> the main branch in CVS) to fix bugs as long as that release is
> supported.  I agree with this, but someone has to volunteer, and the
> length of time a release is supported also has to be worked out.  It
> would almost be better to go to a regular release schedule (once every
> 3-6 months or so) where the code is given as is to CPAN, whether it
> passes tests or not.

What I've seen in other projects is that stable is supported and bug
patched up till the next stable release.  After that support is dropped.
Once a branch was tagged stable the ONLY thing that went into it was
fixes for bugs based on the code already present.  No new features, no
refactoring of any code or modules.  I'm not certain how often things
like a stable patch release happened since most of the bugs were worked
on long before while it was still tagged as dev.  I could see, worst
case a .x release to stable every 6 months to a year until the next
stable came out if there were patches to it.  It looks like the wiki has
most of this kind of stuff documented in the previously posted link:
http://www.bioperl.org/wiki/Making_a_BioPerl_release.  I guess it would
just need a pumpkin/monkey/whatever to step up to keep things rolling...

> 2)  More communication about the direction Bioperl is heading;
> personally I haven't seen a problem with this so much as there is no
> information about a roadmap.  That is being alleviated soon I believe,
> though people out there need to be patient.
> 
> 3)  Volunteer.  If you have something you believe needs to be 
> done and you
> believe so fervently, then put up or shut up.  Make (nice polite)
> suggestions otherwise.  Don't judge code or "the way things 
> are done" and
> don't presume what kind of experience people have that you 
> don't know and
> haven't met.  End of story.
> 
> Chris
> 
> 


From cjfields at uiuc.edu  Wed Jun 28 22:14:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:14:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A2B281.7030806@sendu.me.uk>
Message-ID: <007e01c69b00$2e091410$15327e82@pyrimidine>

> Sendu Bala wrote:
> [ from thread Bio::SearchIO - Accessing Model parameters (score, evalue,
> description) ]
> [ concerning hmmpfam output ]
> > I have another problem (or the same one as you? I can't tell...) in
> > that I can only get a single result, hit and hsp from my hmmpfam file!
> > It is doing my head in, but I might be doing something wrong so will
> > look into it further before posting a bug report.
> 
> I was just doing something wrong, but...
> 
> Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report
> a single HSP per Hit so domains with multiple alignments get separate
> Hits (more FASTA like) since they aren't really HSPs'
> 
> Strangely 1.25 (Bioperl 1.4) seems to behave like that already.
> 
> In any case, this is extremely counter-intuitive, especially given that
> next_domain is a synonym of next_hsp. I think either the synonym
> relationship remains and hits have multiple hsps (and there is only one
> hit per model), or next_domain goes off and finds the hsp that is the
> next domain of the current model. But that would be incredibly broken in
> the current model since it would be found in a different hit object...
>
> What hmmpfam does is take a database of models which can be thought of
> as database sequences. Then it aligns each one against your query
> sequences. A model could align in multiple locations along a query
> sequence. Each one of these locations is called a domain of the model. A
> user of hmmpfam is model-centric (wants to know which models are on his
> query), and so you want to know all about how well the model did in one
> go. So you should be able to get the results for a model ($hit =
> $result->next_model), get overall info about it ($hit->score etc.), then
> get more detailed information about each domain of it (while ($hsp =
> $hit->next_domain) {...}). But right now you only get one domain and you
> have to go searching through all your other hits to find a hit with the
> same ->name() as your model of interest to get the next domain of your
> model.
> 
> In my view this is less than ideal. What do people think? Should it be
> changed?

The model (hit-like) table scores are retained and can be retrieved via
$model->significance and the individual domain (hsp-like) evalues via
$domain->evalue.  The reason you don't get all the individual domain evalues
is that only five alignments are returned by default.  You might try
changing the 'A' parameter to see if you can get more alignments; that may
work around the problem of missing domains for now.  You'll note that the
Model/Domain results returned are not based on top score but what looks like
the position of the domain in the sequence (seq-t in the last table); that's
what is stated in the hmmpfam docs.  Anyway, I tried this loop with the
reports Selvi sent and it works, but only for the ones that return
alignments:

my $result_count = 1;
while ( my $result = $searchio->next_result() ) {
  print "Result $result_count : ",$result->query_name,"\n";
  print "Result models: ",$result->num_hits,"\n";
  while (my $model = $result->next_hit) {
	print "\tModel : ",$model->name,"\n";
	print "\tSignif: ",$model->significance,"\n";
	while (my $domain = $model->next_hsp) {
	  print "\t\tDomain : ",$domain->name,"\n";
	  print "\t\tEvalue : ",$domain->evalue,"\n";
	}
  }
  $result_count++;
}

From the HMMER docs: "Say you have a new sequence that, according to a BLAST
analysis, shows a slew of hits to receptor tyrosine kinases. Before you
decide to call your sequence an RTK homologue, you suspiciously recall that
RTK's are, like many proteins, composed of multiple functional domains, and
these domains are often found promiscuously in proteins with a wide variety
of functions. Is your sequence really an RTK? Or is it a novel sequence that
just happens to have a protein kinase catalytic domain or fibronectin type
III domain?"

Model/domain pairs really aren't Hits/HSPs by definition, like the CVS
commit from Jason states.  The way Pfam is set up, you actually have your
query(ies) scanned using a database of Pfam domains (HMM's, built from
protein alignments for various protein families), hence the alignment in the
report is not an HSP since HSPs come from pairwise sequence alignments.  An
HSP (high-scoring segment pair) is a pair of sequence segments whose
alignment score meets or exceeds a cutoff.  The hmmpfam report has
alignments of the sequence and the consensus
for the alignment the HMM is based on (not another sequence, so not an HSP).
This is also the same reason you can't get alignments from
Bio::Search::HSP::HMMERHSP objects since the model 'sequence' isn't a true
sequence but a consensus of sequences, so it's 'inappropriate' to use that
as an actual alignment.  Bad Bioperl user!  Bad!

I think the reasoning for keeping single model-domain pairs is that you
should consider each domain's location in the sequence as well as the number
of times they appear, regardless of whether they belong to the same model or
not.  One protein could have three ATP-binding domains and another two, and
they could be located in different positions on the sequence.  But where
they are on the sequence in relation to other domains and to each other
(i.e. positional information) is just as important, maybe more so, than how
many times that domain appears.  

Well, that and SearchIO is set up as a SAX-like parser, so I believe it
processes the model-domain alignments as the file is parsed.
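
On the positional point, a minimal sketch of reading off the domain order
along the query once you have the model/domain pairs in hand.  Assumptions:
domain_architecture is a made-up helper (not in Bioperl), each pair is a
plain hashref with name/start/end in query coordinates, and the names and
coordinates below are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: given model/domain pairs as hashrefs with keys
# name, start and end (query coordinates), return the model names in
# order of position along the query sequence.
sub domain_architecture {
    my (@pairs) = @_;
    return map  { $_->{name} }
           sort { $a->{start} <=> $b->{start} || $a->{end} <=> $b->{end} }
           @pairs;
}

# Invented example input, deliberately out of positional order.
my @arch_pairs = (
    { name => 'ModelB', start => 210, end => 480 },
    { name => 'ModelA', start =>  15, end =>  95 },
    { name => 'ModelA', start => 110, end => 190 },
);

print join('-', domain_architecture(@arch_pairs)), "\n";
```

Sorting on start (then end) keeps repeated domains of the same model as
separate entries, so both the count and the order survive.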

My 2c: there should be a way to get all model-domain pairs in the "parsed
for domains" table (which is like a list of HSPs).  Seems the last few w/o
alignments are not retained; this may be the way the parser is set up.  I
would try getting the handler to return just evalues and similar stuff for
those and leave out sequence/alignment info, if that's possible.  Not sure
how this is handled with BLAST reports where there are more hits reported
than alignments...

Chris


From cjfields at uiuc.edu  Wed Jun 28 22:16:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:16:38 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
Message-ID: <000001c69b00$86adcc00$15327e82@pyrimidine>

Arghhhh!  Made a mistake:

> my $result_count = 1;
> while ( my $result = $searchio->next_result() ) {
>   print "Result $result_count : ",$result->query_name,"\n";
>   print "Result models: ",$result->num_hits,"\n";
>   while (my $model = $result->next_hit) {
> 	print "\tModel : ",$model->name,"\n";
> 	print "\tSignif: ",$model->significance,"\n";
> 	while (my $domain = $model->next_hsp) {
> 	  print "\t\tDomain : ",$domain->name,"\n";
                              ^^^^^^^
Should be:                    $model

> 	  print "\t\tEvalue : ",$domain->evalue,"\n";
> 	}
>   }
>   $result_count++;
> }

My bad!

Chris


From bix at sendu.me.uk  Wed Jun 28 23:00:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 00:00:11 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <007e01c69b00$2e091410$15327e82@pyrimidine>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
Message-ID: <44A309FB.2050009@sendu.me.uk>

Chris Fields wrote:
>> Sendu Bala wrote:
[snip]
>> In any case, this is extremely counter-intuitive, especially given
>> that next_domain is a synonym of next_hsp. I think either the
>> synonym relationship remains and hits have multiple hsps (and there
>> is only one hit per model)
[snip]

> The model (hit-like) table scores are retained and can be retrieved
> via $model->significance and the individual domain (hsp-like) evalues
> via $model->evalue.

I know, see my earlier post.

> The reason you don't get all the individual domain evalues is that
> only five alignments are returned by default.  You might try changing
> the 'A' parameter to see if you can get more alignments; that may 
> work around the problem of missing domains for now.

[I'm using my own data, not the OP's]
No, I have all the alignments: 'A' isn't a problem. And I can get all
the domains. The problem is I have to check multiple different hits to
find them all.


> You'll note that the Model/Domain results returned are not based on 
> top score but what looks like the position of the domain in the
> sequence (seq-t in the last table); that's what is stated in the
> hmmpfam docs.
[...]
> Well, that and SearchIO is set up as a SAX-like parser, so I believe 
> it processes the model-domain alignments as the file is parsed.

Yes, this is the problem. The parser does the obvious thing, but in my 
view it does not do the correct thing.


> Model/domain pairs really aren't Hits/HSPs by definition, like the
> CVS commit from Jason states.  The way Pfam is set up, you actually
> have your query(ies) scanned using a database of Pfam domains (HMM's,
> built from protein alignments for various protein families), hence
> the alignment in the report is not a HSP since HSPs come from
> pairwise sequence alignments.  An HSP is a pair of sequences which,
> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> has alignments of the sequence and the consensus for the alignment
> the HMM is based on (not another sequence, so not an HSP).

But this is just semantics. It doesn't /matter/ that it's not really 
truly a sequence that's being aligned. The parser needs to present to 
the user the information in the file. As we see in the OP's example, it 
simply fails to do this because the parser isn't model-centric while the 
file it is parsing /is/.

And in any case, your argument doesn't hold because even the current 
parser /does/ store domains in hsp objects! It just only stores one hsp 
per hit, repeatedly, which is nonsensical.

[to avoid confusion, in the following the use of 'model' is in the 
programming sense, whilst 'Model' refers to the things generated by hmmer]

The correct model to describe the file being parsed is one that is able 
to provide to the user all the available results for all Models that hit a 
query sequence, even when there are no alignments in the file. To make 
this fit the SearchIO scheme, we must have one hit per Model. The hit 
has hsps which are the domains. This perfectly matches the information 
in the file. It matches something like a Blast, where you have one hit 
per database sequence/query sequence combo.

A hit could end up with no hsps (no domains), but we may not even care. 
Sometimes you really do just want to know if a particular model hit at 
all, and with what evalue/score. The current parsing model isn't 
guaranteed to tell you this even when you can read it yourself in the 
file being parsed.

You can guess at the intent of the original authors, I think, just by 
looking at those method synonyms. next_hit == next_model. next_hsp == 
next_domain. This makes perfect sense. This is the way to correctly 
model the information in the file. The problem is that next_model 
doesn't give you the next Model (because each Model has multiple hits), 
and next_domain doesn't give you the next domain (because each hit only 
has one domain).


> I think the reasoning for keeping single model-domain pairs is that
> you should consider each domain's location in the sequence as well as
> the number of times they appear, regardless of whether they belong to
> the same model or not.  One protein could have three ATP-binding
> domains and another two, and they could be located in different
> positions on the sequence.  But where they are on the sequence in
> relation to other domains and to each other (i.e. positional
> information) is just as important, maybe more so, than how many times
> that domain appears.

Well, that's for the user to decide. But the way the results are 
presented needs to make sense. If blast results came back with all hsps 
listed out in sequence position order, would you have multiple hits per 
database sequence each with one hsp? No, because the meaning is 
completely wrong. The 'hit' is the collection of alignments of a 
particular database sequence hitting a query sequence. The alignments 
are stored in a bunch of hsps. It is absurd to have more than one hit 
object for a database+query sequence combo, because then we have 
multiple hit objects duplicating the exact same information, and 'hit' 
no longer has any meaning - it is a collection of /some/ of the 
alignments? Yet this is exactly what we have with hmmpfam result parsing.


From selvik at ufl.edu  Wed Jun 28 20:11:56 2006
From: selvik at ufl.edu (Selvi Kadirvel)
Date: Wed, 28 Jun 2006 16:11:56 -0400
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
References: <mailman.4896.1151519013.2084.bioperl-l@lists.open-bio.org>
Message-ID: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>

Sendu,

>
> What do you mean by this? What is ??A? ?
> Is this an option you're supplying to hmmpfam or a bioperl module?

I was referring to the '-A' option when running hmmpfam. So if I were  
to use  '-A 5', Section 3 will have only the top scoring (first) five  
HSPs.

>
> So you can say:
> $hit->name, " ", $hit->description, " ", $hit->significance, " ",
> $hit->score;
>
> To get the information you want.
> General information about the result can be had like so:
> print $result->query_name, " ", $result->algorithm, " ",
> $result->hmm_name, "\n";

I do use the same methods that you have suggested. Let me try to
explain my problem in detail. Let's say I have a report that was
generated using this "-A 5" option. I want to get the description,
score, evalue of a model that *does not* have a domain in the top 5
high scoring HSPs. This information *exists* in the report in Section
1 but neither $result->next_hit nor $hit->next_hsp can see it.

Details of ALL domains  are available through:

     foreach my $domain ($result->each_Domain)
     {
            # available accessors: hmmname, hmmacc, start, end,
            # hstart, hend, evalue
            print $domain->hmmname, " ", $domain->evalue, "\n";
     }

where $result is a Bio::Tools::HMMER::Results object. But this again  
represents information in Section 2. It gives us domain scores and  
evalues (and not model scores and evalues.)

I am working around this by finding the sum of scores (evalues) of  
all domains in a model. But there seems to be no work-around to  
retrieve the description. $domain->hmmacc contains only the first  
string of the description.

-Selvi


From jason at bioperl.org  Thu Jun 29 02:53:25 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 28 Jun 2006 22:53:25 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
	<44A309FB.2050009@sendu.me.uk>
Message-ID: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>

I don't have any time to really debate this sadly - I definitely went
back and forth on how to solve this and not many people ever spoke up
about what they WANTED.  So glad to hear there are opinions out there
now.

I think the bug fix you refer to had to do with not returning things  
ordered by E-value -- the creation machinery only builds Hit
objects when there are HSP objects being built.  Basically the
parsing is linear in terms of the file: we read "Model" (Hit) data
first and store them in a hash keyed by the name of the domain, but
we only >>build<< the "Hits" when HSPs are seen, hence the problem when
the -A option limits alignments but reports Hits that don't have  
individual alignments.  This has to do with the order of things not  
syncing up and/or dealing with the -A option when there is leftover  
Hit data but no HSPs to populate them.  We also had this problem in  
BLAST reports and had to work around that, but I never bothered  
solving it in HMMER I guess.  Glad there are other people who are  
going to fix the problems!

The one "alignment" (HSP) per hit was a workaround to the problem  
that Hits were being returned in the order the HSPs came in (Sequence  
order) -- because that is the order they were being built in -- not  
in the sorted order of the Hits as seen in the report.

Feel free to propose an alternative implementation for the parser as you
see fit as long as the API is preserved.  You can contribute a new
SearchIO plugin and HMMERSearchResultListener to deal with it - or I
guess do what I also do and just run hmmer2table and deal with things
in a tab-delimited format.

Personally my interests lie in the actual domains so the Hit objects  
are superfluous in my own work so it never bothered me to have one  
per Hit and it flows more naturally to things like GFF, etc.  You can  
aggregate them however you like after the fact pretty simply so I
don't find this too hard to deal with, but if this is a major deterrent
for people I guess have at it (I think the speed of object creation
is a larger problem that I hope someone will work on soon).
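
A minimal sketch of that kind of after-the-fact aggregation, under stated
assumptions: group_by_model is a made-up helper (not part of Bioperl), the
input is a flat list of plain hashrefs (one per single-domain hit), and the
model names and numbers are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a flat list of model/domain pairs (one
# domain per hit, as the parser currently returns them) into one record
# per model.  Each input pair is a hashref with keys name, significance
# and evalue.  Models come back in first-seen order.
sub group_by_model {
    my (@pairs) = @_;
    my (%by_name, @order);
    for my $p (@pairs) {
        my $name = $p->{name};
        unless ($by_name{$name}) {
            $by_name{$name} = {
                name         => $name,
                significance => $p->{significance},
                domains      => [],
            };
            push @order, $name;    # remember first-seen order
        }
        push @{ $by_name{$name}{domains} }, $p->{evalue};
    }
    return map { $by_name{$_} } @order;
}

# Invented example input: ModelA shows up in two separate "hits".
my @flat_pairs = (
    { name => 'ModelA', significance => 1e-20, evalue => 1e-12 },
    { name => 'ModelB', significance => 0.5,   evalue => 0.5   },
    { name => 'ModelA', significance => 1e-20, evalue => 1e-6  },
);

for my $model (group_by_model(@flat_pairs)) {
    print "$model->{name} ($model->{significance}): ",
          scalar @{ $model->{domains} }, " domain(s)\n";
}
```

To feed it from SearchIO, push { name => $hit->name, significance =>
$hit->significance, evalue => $hsp->evalue } inside the usual
next_hit/next_hsp loops.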

I'd appreciate you including the salient points of how the report is
interpreted on the wiki at some point (with 8X10 glossy pictures and
circles and arrows on the back...
http://en.wikipedia.org/wiki/Alice%27s_Restaurant) so the debate can be
archived too.

-jason

On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote:

> Chris Fields wrote:
>>> Sendu Bala wrote:
> [snip]
>>> In any case, this is extremely counter-intuitive, especially given
>>> that next_domain is a synonym of next_hsp. I think either the
>>> synonym relationship remains and hits have multiple hsps (and there
>>> is only one hit per model)
> [snip]
>
>> The model (hit-like) table scores are retained and can be retrieved
>> via $model->significance and the individual domain (hsp-like) evalues
>> via $model->evalue.
>
> I know, see my earlier post.
>
>> The reason you don't get all the individual domain evalues is that
>> only five alignments are returned by default.  You might try changing
>> the 'A' parameter to see if you can get more alignments; that may
>> work around the problem of missing domains for now.
>
> [I'm using my own data, not the OP's]
> No, I have all the alignments: 'A' isn't a problem. And I can get all
> the domains. The problem is I have to check multiple different hits to
> find them all.
>
>
>> You'll note that the Model/Domain results returned are not based on
>> top score but what looks like the position of the domain in the
>> sequence (seq-t in the last table); that's what is stated in the
>> hmmpfam docs.
> [...]
>> Well, that and SearchIO is set up as a SAX-like parser, so I believe
>> it processes the model-domain alignments as the file is parsed.
>
> Yes, this is the problem. The parser does the obvious thing, but in my
> view it does not do the correct thing.
>
>
>> Model/domain pairs really aren't Hits/HSPs by definition, like the
>> CVS commit from Jason states.  The way Pfam is set up, you actually
>> have your query(ies) scanned using a database of Pfam domains (HMM's,
>> built from protein alignments for various protein families), hence
>> the alignment in the report is not a HSP since HSPs come from
>> pairwise sequence alignments.  An HSP is a pair of sequences which,
>> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
>> has alignments of the sequence and the consensus for the alignment
>> the HMM is based on (not another sequence, so not an HSP).
>
> But this is just semantics. It doesn't /matter/ that its not really
> truly a sequence that's being aligned. The parser needs to present to
> the user the information in the file. As we see in the OP's  
> example, it
> simply fails to do this because the parser isn't model-centric  
> while the
> file it is parsing /is/.
>
> And in any case, your argument doesn't hold because even the current
> parser /does/ store domains in hsp objects! It just only stores one  
> hsp
> per hit, repeatedly, which is nonsensical.
>
> [to avoid confusion, in the following the use of 'model' is in the
> programming sense, whilst 'Model' refers to the things generated by  
> hmmer]
>
> The correct model to describe the file being parsed is one that is  
> able
> provide to the user all the available results for all Models that  
> hit a
> query sequence, even when there are no alignments in the file. To make
> this fit the SearchIO scheme, we must have one hit per Model. The hit
> has hsps which are the domains. This perfectly matches the information
> in the file. It matches something like a Blast, where you have one hit
> per database sequence/query sequence combo.
>
> A hit could end up with no hsps (no domains), but we may not even  
> care.
> Sometimes you really do just want to know if a particular model hit at
> all, and with what evalue/score. The current parsing model isn't
> guaranteed to tell you this even when you can read it yourself in the
> file being parsed.
>
> You can guess at the intent of the original authors, I think, just by
> looking at those method synonyms. next_hit == next_model. next_hsp ==
> next_domain. This makes perfect sense. This is the way to correctly
> model the information in the file. The problem is that next_model
> doesn't give you the next Model (because each Model has multiple  
> hits),
> and next_domain doesn't give you the next domain (because each hit  
> only
> has one domain).
>
>
>> I think the reasoning for keeping single model-domain pairs is that
>> you should consider each domain's location in the sequence as well as
>> the number of times they appear, regardless of whether they belong to
>> the same model or not.  One protein could have three ATP-binding
>> domains and another two, and they could be located in different
>> positions on the sequence.  But where they are on the sequence in
>> relation to other domains and to each other (i.e. positional
>> information) is just as important, maybe more so, than how many times
>> that domain appears.
>
> Well, that's for the user to decide. But the way the results are
> presented needs to make sense. If blast results came back with all  
> hsps
> listed out in sequence position order, would you have multiple hits  
> per
> database sequence each with one hsp? No, because the meaning is
> completely wrong. The 'hit' is the collection of alignments of a
> particular database sequence hitting a query sequence. The alignments
> are stored in a bunch of hsps. It is absurd to have more than one hit
> object for a database+query sequence combo, because then we have
> multiple hit objects duplicating the exact same information, and 'hit'
> no longer has any meaning - it is a collection of /some/ of the
> alignments? Yet this is exactly what we have with hmmpfam result  
> parsing.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Thu Jun 29 03:40:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 22:40:28 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
	evalue, description)
In-Reply-To: <AF76E0AA-CCF1-4D73-9855-D8008A424FD9@ufl.edu>
Message-ID: <000301c69b2d$c3fdc6a0$15327e82@pyrimidine>

According to CVS, using -A0 (no alignments) is supposed to work since v.
1.5.1 and (I'm guessing here) should return HMMERHit/HMMERHSP objects with
no sequences, just the values from the table.  By this reasoning using -A5
should work but the first five Hit/HSP pairs will give you sequences and any
remaining should give nothing, just the Sequence Model combined evalue
(which you can get by $model->significance) and individual Domain (HSP-like)
evalues ($domain->evalue).  I don't get these either (I only get a max of 5
model/domain pairs). 

So, I tried a little experiment using the first single result output for
this query from your combined file (nbd27e02.y1  716 69 831 ; translated),
which was the first one I came across with more than five model/domain
pairs, and this scripted loop:

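use strict;
use warnings;
use Bio::SearchIO;

# 'hmmpfam.out' is a placeholder for the report file being parsed
my $searchio = Bio::SearchIO->new(-format => 'hmmer',
                                  -file   => 'hmmpfam.out');

while ( my $result = $searchio->next_result() ) {
  print "Query: ", $result->query_name, "\n";
  while ( my $model = $result->next_model ) {
    print "\tModel : ", $model->name, "\n";
    print "\tSignif: ", $model->significance, "\n";
    while ( my $domain = $model->next_domain ) {
      print "\t\tEvalue : ", $domain->evalue, "\n";
    }
  }
}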

I get this with the file containing the alignments.  For anyone following,
I'm using bioperl-live, perl 5.8, WinXP:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 2.6e-43
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 40
        Model : IBN_N
        Signif: 2.1
                Evalue : 2.1
        Model : Arm
        Signif: 6e-38
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 1.2e-11
                Evalue : 0.0096

If I manually delete the alignments (make it like -A0 output) I get this:

Query: nbd27e02.y1  716 69 831 ; translated
        Model : IBB
        Signif: 157.3
                Evalue : 2.6e-43
        Model : HEAT
        Signif: 52.1
                Evalue : 40
        Model : IBN_N
        Signif: -3.6
                Evalue : 2.1
        Model : Arm
        Signif: 139.5
                Evalue : 3.5e-12
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0096
        Model : Arm
        Signif: 139.5
                Evalue : 2.2e-13
        Model : HEAT
        Signif: 52.1
                Evalue : 0.0032
        Model : Arm
        Signif: 139.5
                Evalue : 0.00019

i.e. all the model/domain pairs!  So I think it's safe to say that this is a
bug; the last few don't get processed but should.  I'll drop a bug report
into Bugzilla along with the test files and script so it can be confirmed.
This shouldn't be too hard to fix but it may take a few days; I'm pretty
busy here until Saturday.
 
Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Selvi Kadirvel
> Sent: Wednesday, June 28, 2006 3:12 PM
> To: bioperl-l at lists.open-bio.org
> Cc: Selvi Kadirvel
> Subject: Re: [Bioperl-l] Bio::SearchIO - Accessing Model parameters
> (score,evalue, description)
> 
> Sendu,
> 
> >
> > What do you mean by this? What is '-A'?
> > Is this an option you're supplying to hmmpfam or a bioperl module?
> 
> I was referring to the '-A' option when running hmmpfam. So if I were
> to use  '-A 5', Section 3 will have only the top scoring (first) five
> HSPs.
> 
> >
> > So you can say:
> > $hit->name, " ", $hit->description, " ", $hit->significance, " ",
> > $hit->score;
> >
> > To get the information you want.
> > General information about the result can be had like so:
> > print $result->query_name, " ", $result->algorithm, " ",
> > $result->hmm_name, "\n";
> 
> I do use the same methods that you have suggested. Let me try to
> explain my problem in detail. Let's say I have a report that was
> generated using this "-A 5" option. I want to get the description,
> score, evalue of a model that *does not* have a domain in the top 5
> high scoring HSPs. This information *exists* in the report in Section
> 1 but neither $result->next_hit nor $hit->next_hsp can see it.
> 
> Details of ALL domains  are available through:
> 
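>      foreach my $domain ($result->each_Domain) {
>          # print one tab-separated line per domain
>          print join("\t", $domain->hmmname, $domain->hmmacc,
>                     $domain->start, $domain->end,
>                     $domain->hstart, $domain->hend,
>                     $domain->evalue), "\n";
>      }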
> 
> where $result is a Bio::Tools::HMMER::Results object. But this again
> represents information in Section 2. It gives us domain scores and
> evalues (and not model scores and evalues.)
> 
> I am working around this by finding the sum of scores (evalues) of
> all domains in a model. But there seems to be no work-around to
> retrieve the description. $domain->hmmacc contains only the first
> string of the description.
> 
> -Selvi
> 


From cjfields at uiuc.edu  Thu Jun 29 05:20:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:20:10 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A309FB.2050009@sendu.me.uk>
Message-ID: <000d01c69b3b$b17776d0$15327e82@pyrimidine>

> I know, see my earlier post.
...
> [I'm using my own data, not the OP's]
...

Sorry, I was typing that one up over a three-hour period in between
experiments, so I didn't go back and check everything before I sent it.
Pretty much the entire file Selvi sent me (and the entire group, grrr) shows
that the domains in the domain table are not completely parsed, and the
number of reported hits correlates with the number of alignments present.
In other words, only five or fewer hits are reported based on the alignments
and the default max alignments reported per result is five.  I figured out
that it is a bug and plan on submitting it to Bugzilla.

What you are talking about and what Selvi describes are two separate issues.
I dealt with Selvi's for the moment; let's deal with yours.

> > Well, that and SearchIO is set up as a SAX-like parser, so I believe
...
> Yes, this is the problem. The parser does the obvious thing, but in my
> view it does not do the correct thing.

Yes, and that's your opinion.  To tell the truth I'm quite neutral on this;
I'm trying to reason along the lines the contributors for the module
intended.  The fact of the matter is the parser is set up to do it this way,
and it was set up this way by others (not you or I); modifying it to suit
one's personal wants and needs is not our job here.  I don't have issues
while I'm running it so I really don't see what the problem is, well,
besides the reported bug I found along with Selvi's help.

My view on all this before I quit for the night:

I really don't want to get into what I consider nit-picky issues (the
'semantics' you mention; it's a simple difference in opinion and a small one
at that).  We can agree to disagree, whatever.  The issue immediately at
hand, what I consider the most important, is that Selvi has uncovered a bug
with the code, as is.  But I'm going to vent here a bit.  It's late, I'm
tired, and this whole thing irks me.  It irks me a great deal. 

Personally, I don't think right now is the time to think about refactoring
this particular module, esp. since I find it essentially works.  I believe
that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
instance, or refactoring SearchIO::blast etc to use hashes instead of
objects to speed things up.  Or creating something yourself.  Or doing what
you currently are doing (Bio::Map).  In other words, areas where use is
high, code is aging, and refactoring is more productive.

I'll add that I'm not trying to dissuade you from trying to build your own
variation of a SearchIO HMMER parser; by all means go ahead.  The above is
how I feel.  You can build your own parser to do what you want; you can even
base it off the current SearchIO HMMER parser and see if you can set it up
to give you the results you want, using a different handler and so on.  Just
don't break the API or modify the current code based strictly on what your
opinion of how it should work is.  It was probably set up this way for a
particular reason.

According to the SearchIO HOWTO the intent for SearchIO was to 'genericize'
parsing reports with 'similar' styles, like BLAST, FASTA, HMMER, and so on.
The most prevalently parsed reports, by a long stretch, are BLAST reports,
which is what the system is based on: 

http://www.bioperl.org/wiki/HOWTO:SearchIO#Design

So the SearchIO system is based on the >assumption< that these reports can
be divi'd up with the data mapped into categories (Results, Hits, HSPs), so
similar objects should be able to handle them.  Domain data are currently
stored in HSP objects (HMMERHSP), but that's nothing more than a convenient
way to store HMMER report data in my opinion; the alignment matches,
strictly speaking, are not HSP's.  You could rename HMMERHit HMMERModel and
HMMERHsp HMMERDomain, but they would still, if they fit into SearchIO and
used the current event handlers, implement HitI/HSPI by inheriting from
GenericHit/GenericHSP.  Ergo, any easy way you go about it here, HMMERHit
is-a HitI and HMMERHsp is-a HSPI.  You could probably work around it by
building the 'correct' object hierarchy by setting up your own handler and
SearchIO plugin, but that risks changing API.  And, really, if you decide to
go down that path, consider what Jason is talking about when he mentions
using "under-the-hood" hashes.

> A hit could end up with no hsps (no domains), but we may not even care.
> Sometimes you really do just want to know if a particular model hit at
> all, and with what evalue/score. The current parsing model isn't
> guaranteed to tell you this even when you can read it yourself in the
> file being parsed.

For every model (hit) you should have a corresponding domain (HSP) or more
depending on your view of how the parser works, even if the domain (HSP) is
only present in the table and not in an alignment.  You shouldn't have
models w/o domains from your query (hits w/o hsps); that doesn't make any
sense.  If hmmpfam output has this then it's a serious issue, but, again,
that doesn't make sense.  All that information is in the tables in the
hmmpfam output; you can even build objects w/o alignments present (-A0)
straight from the tables.

If you wanted to know whether a particular model hit at all, grab all the
model objects ($result->models) and run through them to see if your expected
model (Annexin, Phosphoribosyl, or whatever) is there using a map/grep
block, regex, or whatever; you could autovivify a hash or similar data
structure indicating that a particular sequence has x domains of y type.  Or
iterate through them like you would for a BLAST report.  I don't see what's
difficult about this; I do it for BLAST sequences, SeqFeatures, and many
other BioPerl objects all the time!  Yes, it can be slow; that's an issue
with object instantiation and Perl and there is no easy way around it
besides refactoring the SearchIO parsers/eventhandlers to send back hashes,
as Jason has suggested.
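
A minimal sketch of the hash approach, using plain Perl data instead of
BioPerl objects so it runs on its own. The eight model/domain pairs are the
ones from the -A0 output earlier in this thread; in a real script they would
presumably be collected from $model->name and $domain->evalue inside the
next_model/next_domain loops. The names @pairs, %domains_for and $wanted are
made up for the illustration.

```perl
use strict;
use warnings;

# [ model name, domain E-value ], in the order they appear on the sequence.
my @pairs = (
    [ IBB   => 2.6e-43 ],
    [ HEAT  => 40      ],
    [ IBN_N => 2.1     ],
    [ Arm   => 3.5e-12 ],
    [ HEAT  => 0.0096  ],
    [ Arm   => 2.2e-13 ],
    [ HEAT  => 0.0032  ],
    [ Arm   => 0.00019 ],
);

# Autovivify one array of E-values per model name.
my %domains_for;
push @{ $domains_for{ $_->[0] } }, $_->[1] for @pairs;

# Did a particular model hit at all?
my $wanted  = 'Arm';
my $has_hit = exists $domains_for{$wanted} ? 1 : 0;
print "$wanted ", ( $has_hit ? "hit" : "did not hit" ), "\n";

# How many domains of each type does the sequence have?
for my $model ( sort keys %domains_for ) {
    printf "%-6s %d domain(s)\n", $model, scalar @{ $domains_for{$model} };
}
```

The hash keeps the per-model lists in sequence order, so positional
information is not lost by grouping.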

> You can guess at the intent of the original authors, I think, just by
> looking at those method synonyms. next_hit == next_model. next_hsp ==
> next_domain. This makes perfect sense. This is the way to correctly
> model the information in the file. The problem is that next_model
> doesn't give you the next Model (because each Model has multiple hits),
> and next_domain doesn't give you the next domain (because each hit only
> has one domain).

....

> Well, that's for the user to decide. But the way the results are
> presented needs to make sense. If blast results came back with all hsps
> listed out in sequence position order, would you have multiple hits per
> database sequence each with one hsp? No, because the meaning is
> completely wrong. The 'hit' is the collection of alignments of a
> particular database sequence hitting a query sequence. The alignments
> are stored in a bunch of hsps. It is absurd to have more than one hit
> object for a database+query sequence combo, because then we have
> multiple hit objects duplicating the exact same information, and 'hit'
> no longer has any meaning - it is a collection of /some/ of the
> alignments? Yet this is exactly what we have with hmmpfam result parsing.

The problem is that the module is geared to parse the output as simply as
possible, so it does it by sequence order, just like the output.  And, as
is, it makes sense to me why Eddy and Co. set it that way, not that I
completely agree with it.  Hmmpfam output is designed for annotating
sequences using Pfam HMM's, so the results are hard-coded to appear in
sequence order, not based on score or evalue.  That's the way it is; not
necessarily the best way IMHO (I would have a way to sort by evalue or model
myself as an option), but it's the only way that's currently available.
Yes, each Model can match more than one domain on a query sequence.  Again,
that this is the 'correct way' to set up this parser is your opinion; if you
want, design your own SearchIO parser.  Like I said, I don't have a problem
with using this module myself.  And I'm a bit reticent to spend the energy
overhaulin' this module when I could spend my time working on something else
I consider more constructive (or destructive, depending on your view).  
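
For what it's worth, re-sorting after the fact is cheap once the pairs are
in hand. A rough sketch with plain hashes (not BioPerl objects), again using
the values from the -A0 output earlier in this thread; the structure and
variable names are assumptions made for the example:

```perl
use strict;
use warnings;

# Model/domain pairs in sequence order, as hmmpfam reports them.
my @pairs = (
    { model => 'IBB',   evalue => 2.6e-43 },
    { model => 'HEAT',  evalue => 40      },
    { model => 'IBN_N', evalue => 2.1     },
    { model => 'Arm',   evalue => 3.5e-12 },
    { model => 'HEAT',  evalue => 0.0096  },
    { model => 'Arm',   evalue => 2.2e-13 },
    { model => 'HEAT',  evalue => 0.0032  },
    { model => 'Arm',   evalue => 0.00019 },
);

# Numeric sort on E-value, smallest (most significant) first.
my @by_evalue = sort { $a->{evalue} <=> $b->{evalue} } @pairs;

# Alternative: order by model name, then by E-value within a model.
my @by_model = sort {
    $a->{model} cmp $b->{model} or $a->{evalue} <=> $b->{evalue}
} @pairs;

printf "%-6s %g\n", $_->{model}, $_->{evalue} for @by_evalue;
```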

And, frankly, it's not up to the user when using code they didn't create.
You have to deal with it.  Or code something yourself to do things the way
you want.  You have the power to do that; most bioperl users don't simply
b/c they probably don't understand the class structure and OO nature of
Bioperl.  It's just a matter of where you want to spend your energy: dealing
with something that interests you or fixing other people's broken code.


Chris


From cjfields at uiuc.edu  Thu Jun 29 05:23:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 00:23:03 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <000e01c69b3c$18d58fb0$15327e82@pyrimidine>

...
 
> I think the bug fix you refer to had to do with not returning things
> ordered by E-value -- the creation machinery only builds Hit
> objects when there are HSP objects being built.  Basically the
> parsing is linear in terms of the file, we read "Model" (Hit) data
> first and store them in a hash keyed by the name of the domain, but
> we only >>build<< the "Hits" when HSPs are seen, hence the problem when
> the -A option limits alignments but reports Hits that don't have
> individual alignments.  This has to do with the order of things not
> syncing up and/or dealing with the -A option when there is leftover
> Hit data but no HSPs to populate them.  We also had this problem in
> BLAST reports and had to work around that, but I never bothered
> solving it in HMMER I guess.  Glad there are other people who are
> going to fix the problems!

Yeah, just figured that one out.  I see the two tables are parsed into two
arrays, so it is feasible to convert the leftover Hit/HSP (Model/Domain)
entries into the proper objects without any alignments, as with -A0
output.  I plan on reporting this in Bugzilla and will work on it,
but can't get to it immediately (probably not 'til Friday-Saturday at the
earliest).  If Sendu wants to tackle it I don't have a problem.
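
Roughly what I have in mind, as a toy sketch in plain Perl. This is not the
code in Bio::SearchIO::hmmer; %model_table, @domain_table, %alignment_for
and build_pairs are hypothetical stand-ins for the parsed tables, filled
with a few values from the example report discussed in this thread. The
idea is to walk the domain table rather than the alignments, so a row still
produces a model/domain pair when -A has cut its alignment off.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed model table, keyed by model name.
my %model_table = (
    IBB => { score => 157.3, evalue => 2.6e-43 },
    Arm => { score => 139.5, evalue => 6e-38   },
);

# Hypothetical stand-in for the parsed domain table, in sequence order.
my @domain_table = (
    { model => 'IBB', evalue => 2.6e-43 },
    { model => 'Arm', evalue => 3.5e-12 },
    { model => 'Arm', evalue => 2.2e-13 },
);

# Alignment text by domain-table index; here only the first row has one,
# as if the report had been produced with -A 1.
my %alignment_for = ( 0 => '(alignment text)' );

# Build one record per domain-table row, with or without an alignment.
sub build_pairs {
    my ( $models, $domains, $alignments ) = @_;
    my @out;
    for my $i ( 0 .. $#{$domains} ) {
        my $d = $domains->[$i];
        my $m = $models->{ $d->{model} };
        push @out, {
            model         => $d->{model},
            model_score   => $m->{score},
            model_evalue  => $m->{evalue},
            domain_evalue => $d->{evalue},
            alignment     => $alignments->{$i},    # undef if not in report
        };
    }
    return @out;
}

my @pairs = build_pairs( \%model_table, \@domain_table, \%alignment_for );
for my $p (@pairs) {
    printf "%s\t%g\t%s\n", $p->{model}, $p->{domain_evalue},
        defined $p->{alignment} ? 'aligned' : 'table only';
}
```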

> The one "alignment" (HSP) per hit was a workaround to the problem
> that Hits were being returned in the order the HSPs came in (Sequence
> order) -- because that is the order they were being built in -- not
> in the sorted order of the Hits as seen in the report.

The SAX method, I gather, getting in the way.  

> Feel free to propose an alternative implementation for the parser as you
> see fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or I
> guess do what I also do and just run hmmer2table and deal with things
> in a tab-delimited format.

Or set it up as hashes, which you have mentioned before for BLAST.

> Personally my interests lie in the actual domains so the Hit objects
> are superfluous in my own work so it never bothered me to have one
> per Hit and it flows more naturally to things like GFF, etc.  You can
> aggregate them however you like after the fact pretty simply so I
> don't find this too hard to deal with, but if this is a major deterrent
> for people I guess have at it ( I think the speed of object creation
> is a larger problem that I hope that someone will work on soon).

Agreed, though now it's finding the time....


Chris 

> I'd appreciate you including the salient points of how the report is
> interpreted on the wiki at some point (with 8X10 glossy pictures and
> circles and arrows on the back...http://en.wikipedia.org/wiki/Alice%
> 27s_Restaurant) so the debate can be archived too.
> 
> -jason
> 
> On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote:
> 
> > Chris Fields wrote:
> >>> Sendu Bala wrote:
> > [snip]
> >>> In any case, this is extremely counter-intuitive, especially given
> >>> that next_domain is a synonym of next_hsp. I think either the
> >>> synonym relationship remains and hits have multiple hsps (and there
> >>> is only one hit per model)
> > [snip]
> >
> >> The model (hit-like) table scores are retained and can be retrieved
> >> via $model->significance and the individual domain (hsp-like) evalues
> >> via $model->evalue.
> >
> > I know, see my earlier post.
> >
> >> The reason you don't get all the individual domain evalues is that
> >> only five alignments are returned by default.  You might try changing
> >> the 'A' parameter to see if you can get more alignments; that may
> >> work around the problem of missing domains for now.
> >
> > [I'm using my own data, not the OP's]
> > No, I have all the alignments: 'A' isn't a problem. And I can get all
> > the domains. The problem is I have to check multiple different hits to
> > find them all.
> >
> >
> >> You'll note that the Model/Domain results returned are not based on
> >> top score but what looks like the position of the domain in the
> >> sequence (seq-t in the last table); that's what is stated in the
> >> hmmpfam docs.
> > [...]
> >> Well, that and SearchIO is set up as a SAX-like parser, so I believe
> >> it processes the model-domain alignments as the file is parsed.
> >
> > Yes, this is the problem. The parser does the obvious thing, but in my
> > view it does not do the correct thing.
> >
> >
> >> Model/domain pairs really aren't Hits/HSPs by definition, like the
> >> CVS commit from Jason states.  The way Pfam is set up, you actually
> >> have your query(ies) scanned using a database of Pfam domains (HMM's,
> >> built from protein alignments for various protein families), hence
> >> the alignment in the report is not a HSP since HSPs come from
> >> pairwise sequence alignments.  An HSP is a pair of sequences which,
> >> when aligned, meet or exceed a maximal cutoff.  The hmmpfam report
> >> has alignments of the sequence and the consensus for the alignment
> >> the HMM is based on (not another sequence, so not an HSP).
> >
> > But this is just semantics. It doesn't /matter/ that it's not really
> > truly a sequence that's being aligned. The parser needs to present to
> > the user the information in the file. As we see in the OP's
> > example, it
> > simply fails to do this because the parser isn't model-centric
> > while the
> > file it is parsing /is/.
> >
> > And in any case, your argument doesn't hold because even the current
> > parser /does/ store domains in hsp objects! It just only stores one
> > hsp
> > per hit, repeatedly, which is nonsensical.
> >
> > [to avoid confusion, in the following the use of 'model' is in the
> > programming sense, whilst 'Model' refers to the things generated by
> > hmmer]
> >
> > The correct model to describe the file being parsed is one that is
> > able to provide to the user all the available results for all Models
> > that hit a
> > query sequence, even when there are no alignments in the file. To make
> > this fit the SearchIO scheme, we must have one hit per Model. The hit
> > has hsps which are the domains. This perfectly matches the information
> > in the file. It matches something like a Blast, where you have one hit
> > per database sequence/query sequence combo.
> >
> > A hit could end up with no hsps (no domains), but we may not even
> > care.
> > Sometimes you really do just want to know if a particular model hit at
> > all, and with what evalue/score. The current parsing model isn't
> > guaranteed to tell you this even when you can read it yourself in the
> > file being parsed.
> >
> > You can guess at the intent of the original authors, I think, just by
> > looking at those method synonyms. next_hit == next_model. next_hsp ==
> > next_domain. This makes perfect sense. This is the way to correctly
> > model the information in the file. The problem is that next_model
> > doesn't give you the next Model (because each Model has multiple
> > hits),
> > and next_domain doesn't give you the next domain (because each hit
> > only
> > has one domain).
> >
> >
> >> I think the reasoning for keeping single model-domain pairs is that
> >> you should consider each domain's location in the sequence as well as
> >> the number of times they appear, regardless of whether they belong to
> >> the same model or not.  One protein could have three ATP-binding
> >> domains and another two, and they could be located in different
> >> positions on the sequence.  But where they are on the sequence in
> >> relation to other domains and to each other (i.e. positional
> >> information) is just as important, maybe more so, than how many times
> >> that domain appears.
> >
> > Well, that's for the user to decide. But the way the results are
> > presented needs to make sense. If blast results came back with all
> > hsps
> > listed out in sequence position order, would you have multiple hits
> > per
> > database sequence each with one hsp? No, because the meaning is
> > completely wrong. The 'hit' is the collection of alignments of a
> > particular database sequence hitting a query sequence. The alignments
> > are stored in a bunch of hsps. It is absurd to have more than one hit
> > object for a database+query sequence combo, because then we have
> > multiple hit objects duplicating the exact same information, and 'hit'
> > no longer has any meaning - it is a collection of /some/ of the
> > alignments? Yet this is exactly what we have with hmmpfam result
> > parsing.
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 


From bix at sendu.me.uk  Thu Jun 29 07:02:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:02:49 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
Message-ID: <44A37B19.7030908@sendu.me.uk>

Chris Fields wrote:
>
> Personally, I don't think right now is the time to think about refactoring
> this particular module, esp. since I find it essentially works.  I believe
> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
> instance, or refactoring SearchIO::blast etc to use hashes instead of
> objects to speed things up.  Or creating something yourself.  Or doing what
> you currently are doing (Bio::Map).  In other words, areas where use is
> high, code is aging, and refactoring is more productive.

Hmmer parsing happens to be important to me, in fact vital for my work. 
I've been using my own parser up till now, so didn't know what the 
Bioperl one was like. I'd like to use Bioperl for more things, 
preferably everything.


> I'll add that I'm not trying to dissuade you from trying to build your own
> variation of a SearchIO HMMER parser; by all means go ahead.  The above is
> how I feel.  You can build your own parser to do what you want; you can even
> base it off the current SearchIO HMMER parser and see if you can set it up
> to give you the results you want, using a different handler and so on.  Just
> don't break the API or modify the current code based strictly on what your
> opinion of how it should work is.  It was probably set up this way for a
> particular reason.

Well, I don't like the idea of there being multiple SearchIO parsers for 
the same thing.

[...]
> And, frankly, it's not up to the user when using code they didn't create.
> You have to deal with it.  Or code something yourself to do things the way
> you want.  You have the power to do that; most bioperl users don't simply
> b/c they probably don't understand the class structure and OO nature of
> Bioperl.  It's just a matter of where you want to spend your energy: dealing
> with something that interests you or fixing other people's broken code.

My original question was essentially: does doing it my way make sense? 
And implicitly: would doing it my way be of any harm? Ie. can I go ahead 
and change how the parser reports results and groups them together? I 
don't think it will involve an API change, but the results it generates 
will obviously be very different.


From bix at sendu.me.uk  Thu Jun 29 07:54:50 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 08:54:50 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
Message-ID: <44A3874A.9040803@sendu.me.uk>

Jason Stajich wrote:
>
> Feel free to propose an alternative implementation for the parser as you
> see fit as long as the API is preserved.  You can contribute a new
> SearchIO plugin and HMMERSearchResultListener to deal with it - or [snip]

What's the thinking behind the way SearchIOs work? Is it necessary or 
desirable to always do it with events and listeners? Or is it enough to 
simply return a ResultI regardless of how you made it?
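
To make the two routes concrete, here is a toy sketch in plain Perl. It is
not how Bio::SearchIO is actually implemented; ToyListener,
parse_with_events and parse_directly are invented for the example. It only
shows that a caller can end up with the same structure whether the parser
fires events at a listener or builds the result itself.

```perl
use strict;
use warnings;

package ToyListener;    # hypothetical, not a BioPerl class

sub new { return bless { results => [], current => undef }, shift }

sub start_hit {
    my ( $self, $name ) = @_;
    $self->{current} = { name => $name, hsps => [] };
    return;
}

sub hsp {
    my ( $self, $evalue ) = @_;
    push @{ $self->{current}{hsps} }, $evalue;
    return;
}

sub end_hit {
    my ($self) = @_;
    push @{ $self->{results} }, $self->{current};
    $self->{current} = undef;
    return;
}

package main;

# Event route: the parser announces what it sees, in file order, and the
# listener decides what to build.
sub parse_with_events {
    my ( $listener, @records ) = @_;
    for my $rec (@records) {
        $listener->start_hit( $rec->[0] );
        $listener->hsp( $rec->[1] );
        $listener->end_hit;
    }
    return $listener->{results};
}

# Direct route: the parser builds the same structure with no listener.
sub parse_directly {
    my (@records) = @_;
    return [ map { +{ name => $_->[0], hsps => [ $_->[1] ] } } @records ];
}

my @records    = ( [ IBB => 2.6e-43 ], [ HEAT => 40 ] );
my $via_events = parse_with_events( ToyListener->new, @records );
my $direct     = parse_directly(@records);
print scalar(@$via_events), " hits via events, ",
    scalar(@$direct), " built directly\n";
```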


From cjfields at uiuc.edu  Thu Jun 29 13:27:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 08:27:00 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <A1A48284-6FD6-4898-9438-DEEB105496EC@uiuc.edu>


On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about  
>> refactoring
>> this particular module, esp. since I find it essentially works.  I  
>> believe
>> that energy is better spent elsewhere, such as SeqIO::genbank/ 
>> swiss/embl for
>> instance, or refactoring SearchIO::blast etc to use hashes instead of
>> objects to speed things up.  Or creating something yourself.  Or  
>> doing what
>> you currently are doing (Bio::Map).  In other words, areas where  
>> use is
>> high, code is aging, and refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my  
> work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.

We're not deterring you from setting up your own parser, something  
both Jason and I suggested.  I just don't see what the major issue  
is; hmmpfam results never really contain the same number of hits  
per query that BLAST does (I get at the very most 30-40 and that is  
usually based on repeats).  I believe the best place to spend this  
energy first and foremost is fixing the bug.

>> I'll add that I'm not trying to dissuade you from trying to build  
>> your own
>> variation of a SearchIO HMMER parser; by all means go ahead.  The  
>> above is
>> how I feel.  You can build your own parser to do what you want;  
>> you can even
>> base it off the current SearchIO HMMER parser and see if you can  
>> set it up
>> to give you the results you want, using a different handler and so  
>> on.  Just
>> don't break the API or modify the current code based strictly on  
>> what your
>> opinion of how it should work is.  It was probably set up this way  
>> for a
>> particular reason.
>
> Well, I don't like the idea of there being multiple SearchIO  
> parsers for
> the same thing.

See, here's the thing: if the community-at-large decides to use your  
version of the parser then, by default it will become the only HMMER  
SearchIO parser and we'll deprecate the old one.  I just don't think  
this is the way I would go about it.  Jason has mentioned that object  
instantiation is a bigger issue with parsing (speed) than anything  
else; why not, if you plan on doing this, set up a Handler to return  
hashes, or do it completely under-the-hood?  Have it be the 'new,  
faster way to run SearchIO.'  Don't rehash (pardon the bad pun) the  
way things were esp. when proposals are out there to improve the  
toolkit.

> [...]
>> And, frankly, it's not up to the user when using code they didn't  
>> create.
>> You have to deal with it.  Or code something yourself to do things  
>> the way
>> you want.  You have the power to do that; most bioperl users don't  
>> simply
>> b/c they probably don't understand the class structure and OO  
>> nature of
>> Bioperl.  It's just a matter of where you want to spend your  
>> energy: dealing
>> with something that interests you or fixing other people's  
>> broken code.
>
> My original question was essentially: does doing it my way make sense?
> And implicitly: would doing it my way be of any harm? Ie. can I go  
> ahead
> and change how the parser reports results and groups them together? I
> don't think it will involve an API change, but the results it  
> generates
> will obviously be very different.

And my point is that both ways make sense, at least to me (and it  
sounds like to Jason though I could be wrong).  Again, create a new  
version of the parser based on what you want to do and accomplish.   
Don't just modify something the community at-large uses based on your  
whims. Make the changes to a new module and let the community  
decide.  As an example, BioPerl, for the longest time, had several  
BLAST parsers; we directed everybody over to SearchIO and most people  
seem to like it; hence the others are deprecated.

And changing the results returned by some could be considered  
changing the API or a bug.  If someone using this module has an  
automated pipeline set up for annotation using Pfam, hmmpfam,  
Bioperl, and a database, and their setup expects single model/domain  
pairs, yeah, your changes will break that.  Maybe small,  
inconsequential even, but it's possible (and even true; many genome  
annotation pipelines are set up exactly how I describe).
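
A small sketch of what I mean, in plain Perl with made-up structures (not
BioPerl objects). The model names are the eight model/domain pairs from the
-A0 output earlier in this thread, in sequence order. The downstream
function count_hits_named is hypothetical; it stands in for pipeline code
that assumes one hit per domain.

```perl
use strict;
use warnings;

# Model names of the eight model/domain pairs, in sequence order.
my @models_in_order = qw(IBB HEAT IBN_N Arm HEAT Arm HEAT Arm);

# Current behaviour: one hit per model/domain pair.
my @current = map { +{ name => $_, domains => 1 } } @models_in_order;

# Proposed behaviour: one hit per model, holding all of its domains.
my ( %seen, @grouped );
for my $name (@models_in_order) {
    unless ( $seen{$name} ) {
        $seen{$name} = { name => $name, domains => 0 };
        push @grouped, $seen{$name};
    }
    $seen{$name}{domains}++;
}

# Pipeline code that counts domains by counting hits with a given name.
sub count_hits_named {
    my ( $name, @hits ) = @_;
    return scalar grep { $_->{name} eq $name } @hits;
}

printf "HEAT domains seen by the pipeline: current=%d grouped=%d\n",
    count_hits_named( 'HEAT', @current ),
    count_hits_named( 'HEAT', @grouped );
```

No method signature changed between the two, yet the same report gives the
pipeline a different answer.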

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ClarkeW at AGR.GC.CA  Thu Jun 29 14:31:14 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 10:31:14 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>

Hi all, 

 
Recently I was working on a project which required some manipulation of
Quality files. I may be wrong in this, but I don't believe that there is
a Quality format for Bio::SeqIO. If there is, could someone point me in
the right direction, as I could write a much nicer script than what I
currently have? If not, I was wondering if anyone here has any use for
such a thing. I am pretty new to developing but would be willing to give
it a shot, as I feel that for all the use I get out of BioPerl with no
thanks to anyone who spent time on writing something I used, I could try
and contribute my limited amount. Any comments would be appreciated, and
don't be afraid to tell me this is a lost cause. I realize that quality
files tend to be less important than FASTA sequence files. I will give
you a little information on me so that you know what to expect/what I am
working with.

I am a fourth year bioinformatics student, and am currently working as a
summer student. I have some limited experience with writing perl modules
and test scripts. Mostly I write perl to do specific jobs, that I or
someone else has come up with to fill some immediate need of the
company. I am interested in most things bioinformatics/computer
sci/biology and am hoping to do Graduate studies when I finish my
degree.

Well that's enough for now, if you have any comments/suggestions I would
appreciate it.

 
Cheers, Wayne


From cjfields at uiuc.edu  Thu Jun 29 14:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 09:55:16 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <001601c69b8c$08cdce70$15327e82@pyrimidine>

> Recently I was working on a project which required some manipulation of
> Quality files. I may be wrong in this, but I don't believe that there is
> a Quality format for Bio::SeqIO. If there is, could someone point me in
> the right direction, as I could write a much nicer script than what I
> currently have? If not, I was wondering if anyone here has any use for
> such a thing. I am pretty new to developing but would be willing to give
> it a shot, as I feel that for all the use I get out of BioPerl with no
> thanks to anyone who spent time on writing something I used, I could try
> and contribute my limited amount. Any comments would be appreciated, and
> don't be afraid to tell me this is a lost cause. I realize that quality
> files tend to be less important than FASTA sequence files. I will give
> you a little information on me so that you know what to expect/what I am
> working with.

Here's a list I dredged up when looking for Bio::Seq::Quality in BioPerl,
which is the sequence implementation for sequences with quality data and/or
trace values:

Instances: 2    Module : Bio::Assembly::Contig
Instances: 2    Module : Bio::Assembly::IO::ace
Instances: 1    Module : Bio::Assembly::Singlet
Instances: 1    Module : Bio::Index::Fastq
Instances: 2    Module : Bio::Seq::Meta::Array
Instances: 1    Module : Bio::Seq::MetaI
Instances: 8    Module : Bio::Seq::Quality
Instances: 1    Module : Bio::Seq::SeqWithQuality
Instances: 6    Module : Bio::Seq::SequenceTrace
Instances: 1    Module : Bio::Seq::TraceI
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 5    Module : Bio::SeqIO::qual
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

Does that help?

> I am a fourth year bioinformatics student, and am currently working as a
> summer student. I have some limited experience with writing perl modules
> and test scripts. Mostly I write perl to do specific jobs, that I or
> someone else has come up with to fill some immediate need of the
> company. I am interested in most things bioinformatics/computer
> sci/biology and am hoping to do Graduate studies when I finish my
> degree.
> 
> Well that's enough for now, if you have any comments/suggestions I would
> appreciate it.

Always can use an extra hand!

Chris

> 
> 
> Cheers, Wayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From ClarkeW at AGR.GC.CA  Thu Jun 29 15:01:52 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 11:01:52 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>


Thanks Chris, 

I don't know how I didn't come up with this before. Can I use
Bio::SeqIO::qual as follows?

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

Cheers, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Thursday, June 29, 2006 8:55 AM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] BioPerl and quality files

> Recently I was working on a project which required some manipulation of
> Quality files. I may be wrong in this, but I don't believe that there is
> a Quality format for Bio::SeqIO. If there is, could someone point me in
> the right direction, as I could write a much nicer script than what I
> currently have? If not, I was wondering if anyone here has any use for
> such a thing. I am pretty new to developing but would be willing to give
> it a shot, as I feel that for all the use I get out of BioPerl with no
> thanks to anyone who spent time on writing something I used, I could try
> and contribute my limited amount. Any comments would be appreciated, and
> don't be afraid to tell me this is a lost cause. I realize that quality
> files tend to be less important than FASTA sequence files. I will give
> you a little information on me so that you know what to expect/what I am
> working with.

Here's a list I dredged up when looking for Bio::Seq::Quality in BioPerl,
which is the sequence implementation for sequences with quality data and/or
trace values:

Instances: 2    Module : Bio::Assembly::Contig
Instances: 2    Module : Bio::Assembly::IO::ace
Instances: 1    Module : Bio::Assembly::Singlet
Instances: 1    Module : Bio::Index::Fastq
Instances: 2    Module : Bio::Seq::Meta::Array
Instances: 1    Module : Bio::Seq::MetaI
Instances: 8    Module : Bio::Seq::Quality
Instances: 1    Module : Bio::Seq::SeqWithQuality
Instances: 6    Module : Bio::Seq::SequenceTrace
Instances: 1    Module : Bio::Seq::TraceI
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 5    Module : Bio::SeqIO::qual
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

Does that help?

> I am a fourth year bioinformatics student, and am currently working as a
> summer student. I have some limited experience with writing perl modules
> and test scripts. Mostly I write perl to do specific jobs, that I or
> someone else has come up with to fill some immediate need of the
> company. I am interested in most things bioinformatics/computer
> sci/biology and am hoping to do Graduate studies when I finish my
> degree.
> 
> Well that's enough for now, if you have any comments/suggestions I would
> appreciate it.

Always can use an extra hand!

Chris

> 
> 
> Cheers, Wayne
> 
> 


From cjfields at uiuc.edu  Thu Jun 29 15:21:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:21:21 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>
Message-ID: <002001c69b8f$ad754450$15327e82@pyrimidine>

It should work that way, yes:  

my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');

# the below should return a Bio::Seq::Quality object
my $seq = $in->next_seq; 
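
To go one step further, here is a minimal sketch of looping over every
record in the file and summarizing its scores. It assumes BioPerl is
installed and that the returned objects provide display_id() and qual()
(the latter returning an array reference of scores). mean_quality() and
summarize_qual_file() are hypothetical helper names made up for this
example, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: average of a list of quality scores (plain Perl).
sub mean_quality {
    my ($scores) = @_;            # array reference of numeric scores
    return 0 unless @$scores;     # avoid dividing by zero on empty records
    my $sum = 0;
    $sum += $_ for @$scores;
    return $sum / @$scores;
}

# Hypothetical helper: print one summary line per record in a qual file.
sub summarize_qual_file {
    my ($qual_file) = @_;
    require Bio::SeqIO;           # loaded lazily; needs BioPerl installed
    my $in = Bio::SeqIO->new(-file => $qual_file, -format => 'qual');
    while (my $qual = $in->next_seq) {
        printf "%s\t%d scores\tmean %.1f\n",
            $qual->display_id, scalar @{ $qual->qual },
            mean_quality($qual->qual);
    }
}

# Run only when a file name is given on the command line.
summarize_qual_file($ARGV[0]) if @ARGV;
```

Saved as a script, it would be run with the qual file as its only argument.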

You might want to check the other SeqIO modules as well depending on your
format:

...

Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

...

Chris

> Thanks Chris,
> 
> I don't know how I didn't come up with this before. Can I use
> Bio::SeqIO::qual as follows?
> 
> my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');
> 
> Cheers, Wayne
...


From cjfields at uiuc.edu  Thu Jun 29 15:23:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:23:20 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
Message-ID: <002101c69b8f$f48bd070$15327e82@pyrimidine>

Sendu, 

The HOWTO explains everything:

http://www.bioperl.org/wiki/HOWTO:SearchIO

under "Implementation."  I learned this the hard way when I started working
on SearchIO::blast and wondered why it had so many *_element methods.  

Yes, you will need an EventHandler if you implement SearchIO; the
EventHandler should implement the Bio::SearchIO::EventHandlerI interface.  You
might not need one that returns objects though (i.e. it could return
hashes).  And you could possibly get around the event handler somehow,
though if you plan on doing that, why not just work on Bio::Tools::Hmmpfam
as an alternative parser?  We've had other BLAST parsers before
(Bio::Tools::BPLite comes to mind); if they aren't maintained and there is a
viable alternative they can be deprecated.  That is why I mentioned
working on your own version of SearchIO::hmmer; if that module becomes the
most widely used we can deprecate the older version.

The idea that a SearchIO plugin should act like a SAX parser is based on the
fact that many files being parsed are quite large, so it would be nice to
have everything parsed as a stream (on-the-go) as opposed to preprocessing
everything into an object hierarchy (which can be very memory intensive for
large files).  Whether this is done in practice in all SearchIO modules is
another thing; it may be based upon what particular fixes were made over
time or the contributor's intentions.  
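
As a rough illustration of the streaming idea, here is a sketch in plain
Perl. It is not the actual SearchIO or EventHandlerI code: the package
name, the event names and the toy tab-separated report format are all
made up for this example.

```perl
use strict;
use warnings;

# Toy listener: keeps only the events it cares about and ignores the rest.
package CountingListener;
sub new { return bless { hits => [], hsps => 0 }, shift }
sub start_element {
    my ($self, $name, $data) = @_;
    push @{ $self->{hits} }, $data if $name eq 'Hit';
    $self->{hsps}++               if $name eq 'Hsp';
}
sub end_element { }    # nothing to do when an element closes

package main;

# Toy parser: reads one line at a time and fires events as it goes,
# so the whole report is never held in memory at once.
sub parse_stream {
    my ($fh, $listener) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($type, $value) = split /\t/, $line, 2;
        next unless defined $value;       # skip lines without a payload
        $listener->start_element($type, $value);
        $listener->end_element($type);
    }
    return $listener;
}

# In-memory stand-in for a report file.
my $report = "Hit\tPF00001\nHsp\t10-80\nHsp\t120-190\nHit\tPF00002\nHsp\t5-60\n";
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $listener = parse_stream($fh, CountingListener->new);
close $fh;
printf "%d hits, %d HSPs\n", scalar @{ $listener->{hits} }, $listener->{hsps};
```

A listener that only counts never has to keep the HSP details around,
which is where the memory saving over a full object hierarchy comes from.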

Chris 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 29, 2006 2:55 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
> 
> Jason Stajich wrote:
> >
> > Feel free to propose an alternative implementation for the parser as you
> > see fit as long as the API is preserved.  You can contribute a new
> > SearchIO plugin and HMMERSearchResultListener to deal with it - or
> [snip]
> 
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it enough to
> simply return a ResultI regardless of how you made it?


From roy at colibase.bham.ac.uk  Thu Jun 29 15:05:54 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 29 Jun 2006 16:05:54 +0100
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>
Message-ID: <44A3EC52.7030502@colibase.bham.ac.uk>

Hi Wayne.

I think Bio::SeqIO::qual is what you are looking for.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From jason at bioperl.org  Thu Jun 29 18:04:12 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 29 Jun 2006 14:04:12 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
Message-ID: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>

however you want - the idea of listeners at the time was to make it  
more SAX like so we could throw away events we didn't want and speed  
up the whole system when there was some idea of how you wanted the  
data filtered.  That may have been too much wishful thinking and I  
just couldn't do it alone.


On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>
>> Feel free to propose an alternative implementation for the parser as you
>> see fit as long as the API is preserved.  You can contribute a new
>> SearchIO plugin and HMMERSearchResultListener to deal with it - or
>> [snip]
>
> What's the thinking behind the way SearchIOs work? Is it necessary or
> desirable to always do it with events and listeners? Or is it  
> enough to
> simply return a ResultI regardless of how you made it?

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From prettyblondegirl222 at yahoo.com  Thu Jun 29 18:23:56 2006
From: prettyblondegirl222 at yahoo.com (S S)
Date: Thu, 29 Jun 2006 11:23:56 -0700 (PDT)
Subject: [Bioperl-l] TAKE ME OFF
Message-ID: <20060629182356.93810.qmail@web51305.mail.yahoo.com>

  


From cjfields at uiuc.edu  Fri Jun 30 03:53:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 22:53:22 -0500
Subject: [Bioperl-l] SearchIO::blast, was Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>	<44A309FB.2050009@sendu.me.uk>
	<3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org>
	<44A3874A.9040803@sendu.me.uk>
	<166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
Message-ID: <7511BE75-3A87-4E78-BFEA-2B38210BAD85@uiuc.edu>

If we can work around the listener/handler that'll definitely speed  
things up.  I was thinking about tackling the SearchIO::blast parser  
next, refactoring it to use hashes as a separate plugin module; if I  
don't need the handler for that then it'll speed things up a bit.

Chris

On Jun 29, 2006, at 1:04 PM, Jason Stajich wrote:

> however you want - the idea of listeners at the time was to make it
> more SAX like so we could throw away events we didn't want and speed
> up the whole system when there was some idea of how you wanted the
> data filtered.  That may have been too much wishful thinking and I
> just couldn't do it alone.
>
>
> On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote:
>
>> Jason Stajich wrote:
>>>
>>> Feel free to propose an alternative implementation for the parser as you
>>> see fit as long as the API is preserved.  You can contribute a new
>>> SearchIO plugin and HMMERSearchResultListener to deal with it - or
>>> [snip]
>>
>> What's the thinking behind the way SearchIOs work? Is it necessary or
>> desirable to always do it with events and listeners? Or is it
>> enough to
>> simply return a ResultI regardless of how you made it?
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Fri Jun 30 12:45:15 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 30 Jun 2006 14:45:15 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
Message-ID: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>

Hi,

>My original question was essentially: does doing it my way make sense?
With respect to Sendu's points, I can only say that a colleague
(developer) and I were surprised that the HMMer parser did not group
the hits as the blast parser does, in "Hit" and "Hsp".
When we realized how hmmer parsing worked we continued to use it
but used a check for multiple hits of one domain on 1 query sequence
(e.g. in hmmpfam).

Regards,
Bernd


From jason at bioperl.org  Fri Jun 30 14:05:01 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Jun 2006 10:05:01 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
Message-ID: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>

I understand the confusion; the intention initially was to have HSPs
grouped together under the same Hit, just like BLAST reports
- but somehow in the bug-fix-cycle the way to deal with the fact that
"HSPs" aren't ordered by the overall Hit table led to this design
decision - the problem before was something with the ordering, but I
must admit I can't really remember what specifically the problem was
or why I changed things to do this.
Does 1.4 actually do it the way you expect?

Again, more user feedback is definitely critical to make these tools
useful to everyone so please don't be bashful about reporting your
preferences.

-j

On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:

> Hi,
>
>> My original question was essentially: does doing it my way make  
>> sense?
> With respect to Sendu's points, I can only say that a colleague
> (developer) and I were surprised that the HMMer parser did not group
> the hits as the blast parser does, in "Hit" and "Hsp".
> When we realized how hmmer parsing worked we continued to use it
> but used a check for multiple hits of one domain on 1 query sequence
> (e.g. in hmmpfam).
>
> Regards,
> Bernd

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Fri Jun 30 15:56:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 10:56:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
Message-ID: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>

It may have been just simpler to have it be one HSP (domain) per Hit  
(model) as that's how the reports are generated.  My reasoning was  
that using the one domain per model made sense based on what you are  
actually trying to do, which is annotate the sequence based on the  
order the domain appears.  Most others may not view it that way,  
which is fine.  One can always gather the relevant HSP's, convert to  
seqfeatures, then sort them if order is important, I suppose.
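
Roughly what I mean by gathering and sorting, sketched in plain Perl with
made-up domain records. The hash keys and the two helper names are
hypothetical; real code would pull these values from the Hit/HSP objects
the parser returns.

```perl
use strict;
use warnings;

# One record per domain, as a one-HSP-per-hit parser would hand them over.
my @domains = (
    { model => 'PF00001', start => 120, end => 190 },
    { model => 'PF00002', start => 5,   end => 60  },
    { model => 'PF00001', start => 10,  end => 80  },
);

# Group by model (BLAST-style: one hit, many HSPs).
sub group_by_model {
    my %by_model;
    push @{ $by_model{ $_->{model} } }, $_ for @_;
    return \%by_model;
}

# Or keep them flat and sort by position for annotation along the query.
sub sort_by_start {
    return sort { $a->{start} <=> $b->{start} } @_;
}

# Print each model with its domains in positional order.
my $grouped = group_by_model(@domains);
for my $model (sort keys %$grouped) {
    my @spans = map { $_->{start} . '-' . $_->{end} }
                sort_by_start(@{ $grouped->{$model} });
    printf "%s: %s\n", $model, join(' ', @spans);
}

# Print the models in the order their domains appear on the query.
print join(' ', map { $_->{model} } sort_by_start(@domains)), "\n";
```

Either view can be built from the same records, so the choice is mostly
about which one the parser should hand back by default.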

I would say, if the overall consensus is to modify it to have  
multiple domain hits per model (similar to BLAST) then Sendu should  
go ahead and make those changes then announce it on the list so no  
one can gripe about it later.  My main concern was not changing  
things so dramatically that it'll break for someone, but seeing as  
we've had a lengthy discussion about it already they should have  
piped up by now!   Well, that and trying to return everything as  
hashes as Jason suggested.  From looking at SearchIO::hmmer we need  
to make sure that both hmmsearch and hmmpfam work the same way (looks  
like they have different sections) and that the reported bug about  
missing hits (Bug 2036) is fixed as well.

Chris

On Jun 30, 2006, at 9:05 AM, Jason Stajich wrote:

> I understand the confusion; the intention initially was to have HSPs
> grouped together under the same Hit, just like BLAST reports
> - but somehow in the bug-fix-cycle the way to deal with the fact that
> "HSPs" aren't ordered by the overall Hit table led to this design
> decision - the problem before was something with the ordering, but I
> must admit I can't really remember what specifically the problem was
> or why I changed things to do this.
> Does 1.4 actually do it the way you expect?
>
> Again, more user feedback is definitely critical to make these tools
> useful to everyone so please don't be bashful about reporting your
> preferences.
>
> -j
>
> On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:
>
>> Hi,
>>
>>> My original question was essentially: does doing it my way make
>>> sense?
>> With respect to Sendu's points, I can only say that a colleague
>> (developer) and I were surprised that the HMMer parser did not group
>> the hits as the blast parser does, in "Hit" and "Hsp".
>> When we realized how hmmer parsing worked we continued to use it
>> but used a check for multiple hits of one domain on 1 query sequence
>> (e.g. in hmmpfam).
>>
>> Regards,
>> Bernd
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Jun 30 16:14:05 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Jun 2006 17:14:05 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
Message-ID: <44A54DCD.3050708@sendu.me.uk>

Chris Fields wrote:
> It may have been just simpler to have it be one HSP (domain) per Hit 
> (model) as that's how the reports are generated.  My reasoning was that 
> using the one domain per model made sense based on what you are actually 
> trying to do, which is annotate the sequence based on the order the 
> domain appears.  Most others may not view it that way, which is fine.  
> One can always gather the relevant HSP's, convert to seqfeatures, then 
> sort them if order is important, I suppose.
> 
> I would say, if the overall consensus is to modify it to have multiple 
> domain hits per model (similar to BLAST) then Sendu should go ahead and 
> make those changes then announce it on the list so no one can gripe 
> about it later.  My main concern was not changing things so dramatically 
> that it'll break for someone

Going on your earlier suggestion, I was thinking about making 
SearchIO::hmmpfam instead, which would get used if you set the format to 
'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I 
suppose I would make a SearchIO::hmmsearch as well, if necessary.


[...]
> that the reported bug about missing hits (Bug 2036) is fixed as well.

However, having never made a SearchIO plugin before, it will be some 
time before I get my head around it. I'll want to make one the current 
HOWTO:SearchIO way before I can think about doing it a better way 
(hashes) as well. So I can say I'll make a move on this at some point in 
the future, but if someone wants to fix Bug 2036 in the mean time, they 
are welcome to. Again as suggested, my priority is Bio::Map right now.


From rmb32 at cornell.edu  Fri Jun 30 17:01:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 10:01:38 -0700
Subject: [Bioperl-l] parser for GeneSeqer
Message-ID: <44A558F2.2050304@cornell.edu>

Hi all,

I find myself needing a parser for GeneSeqer output, so I'm writing one 
(which I will submit for your consideration when it's working).  In a 
nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of 
ESTs to genomic sequence, then using those alignments to predict where 
in the genomic sequence the genes are.  So really what you get from this 
is a bunch of hierarchical features.

I don't really know where I should put it in the bioperl hierarchy 
though.  Probably FeatureIO?

And what's the current fashion for objects it should emit?  
Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at uiuc.edu  Fri Jun 30 17:43:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:43:56 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A54DCD.3050708@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine>
	<44A37B19.7030908@sendu.me.uk>
	<716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
	<54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>
	<6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu>
	<44A54DCD.3050708@sendu.me.uk>
Message-ID: <E2C6F66F-9B85-42D3-B2A0-BD7C8B222572@uiuc.edu>

I'll try looking at it this weekend.  A suggested workaround is to  
either try setting -A for no alignments or setting it to a high  
number to retrieve all of them.  It's pretty serious as the error
silently dumps those domains, so those using automated annotation
pipelines would miss it unless they are also checking the raw output.

You could design a SearchIO::hmmpfam parser then expand it to take in  
hmmsearch output at a later point, or keep them separate.  I like the  
idea of having modules that are more specific about what they parse;  
seems at some point you reach serious code bloat and maintenance  
becomes an issue.  Look at SearchIO::blast; it parses various text  
BLAST output very well but with some serious obfuscation.  Just don't  
know how productive it would be to separate out the PSI-BLAST and  
bl2seq stuff since they are pretty close to a standard BLAST  
report... oh well.

To Jason: good luck on your move.  Drop us a line here to let us
know everything went well.

Chris

On Jun 30, 2006, at 11:14 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> It may have been just simpler to have it be one HSP (domain) per Hit
>> (model) as that's how the reports are generated.  My reasoning was  
>> that
>> using the one domain per model made sense based on what you are  
>> actually
>> trying to do, which is annotate the sequence based on the order the
>> domain appears.  Most others may not view it that way, which is fine.
>> One can always gather the relevant HSP's, convert to seqfeatures,  
>> then
>> sort them if order is important, I suppose.
>>
>> I would say, if the overall consensus is to modify it to have  
>> multiple
>> domain hits per model (similar to BLAST) then Sendu should go  
>> ahead and
>> make those changes then announce it on the list so no one can gripe
>> about it later.  My main concern was not changing things so  
>> dramatically
>> that it'll break for someone
>
> Going on your earlier suggestion, I was thinking about making
> SearchIO::hmmpfam instead, which would get used if you set the  
> format to
> 'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I
> suppose I would make a SearchIO::hmmsearch as well, if necessary.
>
>
> [...]
>> that the reported bug about missing hits (Bug 2036) is fixed as well.
>
> However, having never made a SearchIO plugin before, it will be some
> time before I get my head around it. I'll want to make one the current
> HOWTO:SearchIO way before I can think about doing it a better way
> (hashes) as well. So I can say I'll make a move on this at some  
> point in
> the future, but if someone wants to fix Bug 2036 in the mean time,  
> they
> are welcome to. Again as suggested, my priority is Bio::Map right now.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Jun 30 17:54:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:54:23 -0500
Subject: [Bioperl-l] parser for GeneSeqer
In-Reply-To: <44A558F2.2050304@cornell.edu>
References: <44A558F2.2050304@cornell.edu>
Message-ID: <2FB066C7-12E6-46D8-8F4A-BD096BE2A0CA@uiuc.edu>

If you plan on generating seqfeatures from this output you could  
check out the Bio::Tools core modules for examples.  There are a few  
there that take program output and convert them to  
Bio::SeqFeature::Generic objects, including Bio::Tools::RNAMotif and
Bio::Tools::tRNAscanSE.  If alignments are involved you might want
something like Bio::SeqFeature::FeaturePair.  Not sure about using
SeqFeature::Annotated or others; I thought that some of the
Annotation/Annotatable stuff might be changing soon but I may be wrong.

Chris

On Jun 30, 2006, at 12:01 PM, Robert Buels wrote:

> Hi all,
>
> I find myself needing a parser for GeneSeqer output, so I'm writing  
> one
> (which I will submit for your consideration when it's working).  In a
> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> ESTs to genomic sequence, then using those alignments to predict where
> in the genomic sequence the genes are.  So really what you get from  
> this
> is a bunch of hierarchical features.
>
> I don't really know where I should put it in the bioperl hierarchy
> though.  Probably FeatureIO?
>
> And what's the current fashion for objects it should emit?
> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>
> Rob
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From rmb32 at cornell.edu  Fri Jun 30 19:32:11 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 12:32:11 -0700
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
In-Reply-To: <29201430510651801@webmail.iastate.edu>
References: <29201430510651801@webmail.iastate.edu>
Message-ID: <44A57C3B.8040808@cornell.edu>

Aha!  Isn't it amazing what gets revealed when you just get off your 
butt and ask on the mailing list.

I'll look at that code straightaway.  The concept is quite attractive to 
me, since GenomeThreader is the next program that I'm going to be 
integrating into my analysis stuff.  Unfortunately, (I am under the 
impression that) my GeneSeqer parser is almost finished.

This brings us to the next question, what about parsing the 
GenomeThreader XML?  Would be lovely to have a Bioperl interface for 
that.  Is there some code floating about for that too?

Rob

Michael E Sparks wrote:
> Hi Rob,
>
> For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
>  You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/
>
> There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
> an XML format also used by the GenomeThreader spliced alignment program, whose
> schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
> 0README in the above directory will give you an overview of what tools I've made
> available.  Hope you find it useful!
>
> Regards,
> Michael
>
> --
> Thanks,
> Michael E Sparks
> Graduate Assistant, Brendel Lab
> 2128 Molecular Biology Building
> Iowa State University
> Ames, IA 50011-3260
> 1-515-294-4063
> http://www.public.iastate.edu/~mespar1/
>
>
> Forwarded Message:
>   
>> To: <plantgdb at iastate.edu>
>> From: "Shannon D Schlueter" <sds at iastate.edu>
>> Subject: FW: [Bioperl-l] parser for GeneSeqer
>> Date: Fri, 30 Jun 2006 13:01:46 -0500
>> -----
>>     
>>> Date: Fri, 30 Jun 2006 10:01:38 -0700
>>> From: Robert Buels <rmb32 at cornell.edu>
>>> User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] parser for GeneSeqer
>>> Sender: bioperl-l-bounces at lists.open-bio.org
>>>
>>> Hi all,
>>>
>>> I find myself needing a parser for GeneSeqer output, so I'm writing one
>>> (which I will submit for your consideration when it's working).  In a
>>> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
>>> ESTs to genomic sequence, then using those alignments to predict where
>>> in the genomic sequence the genes are.  So really what you get from this
>>> is a bunch of hierarchical features.
>>>
>>> I don't really know where I should put it in the bioperl hierarchy
>>> though.  Probably FeatureIO?
>>>
>>> And what's the current fashion for objects it should emit? 
>>> Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> SGN Bioinformatics Analyst
>>> 252A Emerson Hall, Cornell University
>>> Ithaca, NY  14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>>
>>>

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From mespar1 at iastate.edu  Fri Jun 30 19:20:29 2006
From: mespar1 at iastate.edu (Michael E Sparks)
Date: Fri, 30 Jun 2006 14:20:29 -0500 (CDT)
Subject: [Bioperl-l] Fwd: FW:  parser for GeneSeqer
Message-ID: <29201430510651801@webmail.iastate.edu>

Hi Rob,

For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
 You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/

There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
an XML format also used by the GenomeThreader spliced alignment program, whose
schema is specified in a RELAX NG document, GenomeThreader.rng.txt.  The file
0README in the above directory will give you an overview of what tools I've made
available.  Hope you find it useful!

Regards,
Michael

--
Thanks,
Michael E Sparks
Graduate Assistant, Brendel Lab
2128 Molecular Biology Building
Iowa State University
Ames, IA 50011-3260
1-515-294-4063
http://www.public.iastate.edu/~mespar1/


Forwarded Message:
> To: <plantgdb at iastate.edu>
> From: "Shannon D Schlueter" <sds at iastate.edu>
> Subject: FW: [Bioperl-l] parser for GeneSeqer
> Date: Fri, 30 Jun 2006 13:01:46 -0500
> -----
> >Date: Fri, 30 Jun 2006 10:01:38 -0700
> >From: Robert Buels <rmb32 at cornell.edu>
> >User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
> >To: bioperl-l at bioperl.org
> >Subject: [Bioperl-l] parser for GeneSeqer
> >Sender: bioperl-l-bounces at lists.open-bio.org
> >
> >Hi all,
> >
> >I find myself needing a parser for GeneSeqer output, so I'm writing one
> >(which I will submit for your consideration when it's working).  In a
> >nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> >ESTs to genomic sequence, then using those alignments to predict where
> >in the genomic sequence the genes are.  So really what you get from this
> >is a bunch of hierarchical features.
> >
> >I don't really know where I should put it in the bioperl hierarchy
> >though.  Probably FeatureIO?
> >
> >And what's the current fashion for objects it should emit? 
> >Bio::SeqFeature::Generic?  Bio::SeqFeature::Annotated?
> >
> >Rob
> >
> >--
> >Robert Buels
> >SGN Bioinformatics Analyst
> >252A Emerson Hall, Cornell University
> >Ithaca, NY  14853
> >Tel: 503-889-8539
> >rmb32 at cornell.edu
> >http://www.sgn.cornell.edu
> >
> >