From shameer at ncbs.res.in  Tue May  1 07:36:31 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 17:06:31 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
Message-ID: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>

Dear All,

I am trying to implement a BioPerl-based program to generate a dynamic,
clickable image. Dr. Lincoln Stein's code, provided in example 3 at this URL:
http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
be perfect for my purpose.

I need to make a few modifications to the image. I referred to the
Bio::Graphics HOWTO, the Creating_Imagemaps documents and older bioperl
list mails (maybe I am missing something important?) but I couldn't find a
quick solution, so I thought I would ask the experts for some tips and
tricks.

This is what I am looking for :

1. I need an image of exactly the same size, with the scale (0.1k .. 0.9k)
changed according to the length of the sequence. My sequence length is
usually in the range 70 - 200.

2. I also need to make the image interactive / clickable, with each blue
bar acting as a separate hyperlink to NCBI / PDB using its ID (these IDs
will be used instead of the names of the BLAST hits).


Many thanks in advance for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From shameer at ncbs.res.in  Tue May  1 12:04:11 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 21:34:11 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
Message-ID: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>

Dear Scott,

> There is a fair amount of documentation in the perldoc for
> Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> you read that?

I agree, but I couldn't find the exact information I needed :( (maybe I
missed something important).

>  Also, for changing the scale, that should happen
> automatically--have you tried yet?

I tried changing Lincoln's program (blast3.pl) from
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
to
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);

But that gave me a smaller scale, of length up to 300. I am looking for
an option that keeps the same width and height as the given image, with
dynamic start and end values depending on the length of my sequence. Since
I couldn't accomplish this, I thought of getting some help from you. I
think I need to play a little with the values to reformat the scale so
that it accommodates my hits as well.

Thanks a lot for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From cain at cshl.edu  Tue May  1 10:04:09 2007
From: cain at cshl.edu (Scott Cain)
Date: Tue, 01 May 2007 10:04:09 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
Message-ID: <1178028249.2644.13.camel@localhost.localdomain>

Hi Shameer,

There is a fair amount of documentation in the perldoc for
Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
you read that?  Also, for changing the scale, that should happen
automatically--have you tried yet?
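
For readers following the imagemap part of the question, here is a minimal sketch of the general technique: turning rectangle coordinates into HTML <area> tags. It is plain Perl and does not call Bio::Graphics. The record shape [ id, x1, y1, x2, y2 ] is an assumption modelled on what the Bio::Graphics::Panel perldoc describes for its boxes() method (which returns a feature object rather than an ID string), so check it against your installed version. The function names, the sample IDs and the example.org URL pattern are all hypothetical.

```perl
use strict;
use warnings;

# Escape the few characters that matter inside HTML attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a client-side image map from box records. Each record is assumed
# to be [ $id, $x1, $y1, $x2, $y2 ]; with Bio::Graphics the first element
# would be a feature object rather than a plain ID string.
sub build_image_map {
    my ($map_name, $url_for, @boxes) = @_;
    my @areas;
    for my $box (@boxes) {
        my ($id, $x1, $y1, $x2, $y2) = @$box;
        push @areas, sprintf(
            '<area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s" />',
            $x1, $y1, $x2, $y2,
            escape_html($url_for->($id)), escape_html($id));
    }
    return join("\n", qq(<map name="$map_name">), @areas, '</map>') . "\n";
}

# Hypothetical link rule: the URL pattern below is only a placeholder.
my $link = sub { "http://example.org/entry?id=$_[0]" };

print build_image_map('hits', $link,
    [ 'PF00069', 40, 30, 210, 38 ],
    [ '1ABC',    90, 45, 300, 53 ]);
```

The printed <map> would be paired with an <img usemap="#hits"> tag pointing at the PNG produced by the panel.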

Scott


On Tue, 2007-05-01 at 17:06 +0530, Shameer Khadar wrote:
> Dear All,
> 
> I am trying to impliment a bioperl based program to generate a dynamic,
> clickable image. I have used Dr. Lincoln Steins's code provided in
> example3 at this URL :
> http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
> be perfect for my purpose.
> 
> I need to add few modifications to the image. I reffered the Bio::Graphics
> HOWTO,  Creating_Imagemaps documents and other old bio-perl list mails
> (may be am missing something imp.. ? )  but I couldnt get a quick
> solution, Thought I will ask about it to the experts for some tips and
> tricks.
> 
> This is what I am looking for :
> 
> 1. I need image of exactly same size and the scale (0.1k .. 0.9k) to be
> changed according to length of the sequence. My sequence length is usually
> in a range of 70 - 200.
> 
> 2. I also need to make the image interactive / clickable on the various
> blue bar as different hyperlink to NCBI / PDB using ID (This ids will be
> used instead of name of the blast hits)
> 
> 
> Many thanks in advance for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Tue May  1 13:10:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 12:10:10 -0500
Subject: [Bioperl-l] Pb makefile
In-Reply-To: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
References: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
Message-ID: <D975B11D-1303-4CF4-AE3B-878881964DB9@uiuc.edu>

Is there any reason you want to install bioperl 1.4 (which is over 3  
yrs old)?  The latest is v.1.5.2 (Dec. 2006); man page generation has  
been fixed for that version, which uses Module::Build.

The man page generation was turned off prior to 1.4, though I may be  
wrong.  Based on the ExtUtils::MakeMaker FAQ you should be able to  
prevent man page generation this way:

perl Makefile.PL INSTALLMAN1DIR=none INSTALLMAN3DIR=none

chris

On Apr 30, 2007, at 5:35 AM, Francoise.LECOMTE at biogemma.com wrote:

> Hi
> I am trying to install bioperl 1.4 on a Tru64 platform and I get the
> message "make:line too long" when I run the command make install.
> How can I solve it? How can I disable man page installation in
> Makefile.PL, if that can solve this problem?
>
> Best regards
>
> Françoise Lecomte


From cain.cshl at gmail.com  Tue May  1 15:50:42 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 May 2007 15:50:42 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <1178049042.2644.36.camel@localhost.localdomain>

Perhaps if you provided some code and sample data we might be able to
help you better.

Scott


On Tue, 2007-05-01 at 21:34 +0530, Shameer Khadar wrote:
> Dear Scot,
> 
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
> 
> I agreed, but I couldnt the exact information I needed :( (may be I missed
> something important).
> 
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
> 
> I tried by changing the Lincoln's program eg: blast3.pl
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>  to my
> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
> 
> But it had given me a smaller scale of length upto 300. I was looking for
> an option where I need same width and height of given image and a dynamic
> start and end values depending on length of my sequence. Since I couldnt
> accomplish, I thought of getting some help from you guys. I think I need
> to play a little bit with the value for reformat the scale to accomodate
> my hits as well.
> 
> Thanks a lot for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From agathman at semo.edu  Tue May  1 19:10:20 2007
From: agathman at semo.edu (Gathman, Allen)
Date: Tue, 1 May 2007 18:10:20 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
Message-ID: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>

Hi, all --
 
I've been using BioPerl 1.4 for a while; recently I installed 1.5.2, and
found that scripts that had been using spliced_seq are now broken.  Any
thoughts on what might be going on? 
 
Here's a sample script:
 
*********************************************
 
#!/usr/bin/perl -w
 
use strict;
use Bio::DB::GFF;

my $db  = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                             -dsn     => 'dbi:mysql:database=cc;host=localhost',
                             -fasta   => '/gbrowse/databases/cc'
                           );
$db->add_aggregator('transcript{CDS/mRNA}');
my $seg=$db->segment('ccin_Contig120');
my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
 
for my $gene (@genes) {
    my $gid = $gene->display_id;
 
    print STDERR "Gene is $gid\n";
    my $splgene = $gene->spliced_seq();
}

********************************************
The line with "spliced_seq" in it crashes the program.  Here's the
STDERR output:
 
Gene is Jan06m400_GLEAN_11487

-------------------- WARNING ---------------------

MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
absolute set to 1 -- be warned you may not be getting things on the
correct strand

---------------------------------------------------

-------------------- WARNING ---------------------

MSG: seq doesn't validate, mismatch is
::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,(0,881935
,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: Attempting to set the sequence to
[Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)Bio::Prim
arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)Bio::PrimarySeq=HAS
H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH(0x881f4a
4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] which
does not look healthy

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359

STACK: Bio::PrimarySeq::seq
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258

STACK: Bio::PrimarySeq::new
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210

STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484

STACK: Bio::SeqFeatureI::spliced_seq
/usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498

STACK: /transfer/testsplice.pl:20

-----------------------------------------------------------
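
The exception text above shows stringified object references ("Bio::PrimarySeq=HASH(0x...)") where a sequence string was expected. The sketch below reproduces only that symptom with a hypothetical stand-in class (ToySeq, not Bio::PrimarySeq). It is not a diagnosis of what changed in 1.5.2. It simply shows how joining objects differs from joining the strings returned by their seq() accessor.

```perl
use strict;
use warnings;

# Stand-in for a sequence object; NOT Bio::PrimarySeq, just a minimal
# hypothetical class with a seq() accessor.
package ToySeq;
sub new { my ($class, $seq) = @_; return bless { seq => $seq }, $class }
sub seq { return $_[0]{seq} }

package main;

my @exons = map { ToySeq->new($_) } qw(ATGGCC AAAGGT TTTTAA);

# Joining the objects themselves stringifies each reference.
my $wrong = join '', @exons;

# Joining the result of seq() on each object gives the spliced string.
my $right = join '', map { $_->seq } @exons;

print "wrong: $wrong\n";
print "right: $right\n";
```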

Allen Gathman

http://cstl-csm.semo.edu/gathman

 
From cjfields at uiuc.edu  Tue May  1 20:27:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 19:27:46 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
In-Reply-To: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
References: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
Message-ID: <9F00B020-AFF0-40DB-9694-6061B5A11A73@uiuc.edu>

Can you file a bug on this?  Attach the script and maybe detail what  
data is loaded into your local MySQL database (if possible).

chris

On May 1, 2007, at 6:10 PM, Gathman, Allen wrote:

> Hi, all --
>
> I've been using BioPerl 1.4 for a while; recently I installed  
> 1.5.2, and
> found that scripts that had been using spliced_seq are now broken.   
> Any
> thoughts on what might be going on?
>
> Here's a sample script:
>
> *********************************************
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::GFF;
>
> my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
>                                -dsn        =>
> 'dbi:mysql:database=cc;host=localhost',
>                                -fasta      => '/gbrowse/databases/cc'
>                                );
> $db->add_aggregator('transcript{CDS/mRNA}');
> my $seg=$db->segment('ccin_Contig120');
> my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
>
> for my $gene (@genes) {
>     my $gid = $gene->display_id;
>
>     print STDERR "Gene is $gid\n";
>     my $splgene = $gene->spliced_seq();
> }
>
> ********************************************
> The line with "spliced_seq" in it crashes the program.  Here's the
> STDERR output:
>
> Gene is Jan06m400_GLEAN_11487
>
> -------------------- WARNING ---------------------
>
> MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
> absolute set to 1 -- be warned you may not be getting things on the
> correct strand
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
>
> MSG: seq doesn't validate, mismatch is
> ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::, 
> (0,881935
> ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)
>
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Attempting to set the sequence to
> [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170) 
> Bio::Prim
> arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308) 
> Bio::PrimarySeq=HAS
> H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH 
> (0x881f4a
> 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)]  
> which
> does not look healthy
>
> STACK: Error::throw
>
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359
>
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258
>
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210
>
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484
>
> STACK: Bio::SeqFeatureI::spliced_seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498
>
> STACK: /transfer/testsplice.pl:20
>
> -----------------------------------------------------------
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From shameer at ncbs.res.in  Tue May  1 23:46:59 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 2 May 2007 09:16:59 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178049042.2644.36.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<1178049042.2644.36.camel@localhost.localdomain>
Message-ID: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>

Dear Scott,

Once again, thanks a lot for your inputs.

I am following the same data format as in
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
The only difference is that instead of hits, I will be using PFAMID/PDBID.
All the blue boxes (features) should be clickable, like hot-spots in an
image map. The purpose is to display these results in a web page.

I am using the program in Stein's Bio::Graphics example
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

I need exactly the same image as in
http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
The only difference is that I need the scale (0.1k - 0.9k) as a simple
range 1-XXX, where XXX depends on the length of the input sequence.

Many thanks for your help,


> Perhaps if you provided some code and sample data we might be able to
> help you better.
>
> Scott
>

-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed May  2 06:02:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 May 2007 06:02:48 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<1178049042.2644.36.camel@localhost.localdomain>
	<59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
Message-ID: <200705020602.48404.sdavis2@mail.nih.gov>

On Tuesday 01 May 2007 23:46, Shameer Khadar wrote:
> Dear Scott,
>
> Once thanks a lot for your inputs.
>
> I am following same  data formats as in
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
> Only difference is instead of Hits, I will be using PFAMID/PDBID. Allt he
> blue boxes (feature) should be clickable like a hot-spot/imagesmap images.
> The purpose is to display these results in a web page.

Do you have your data loaded into bioperl objects?  What code did you use for 
that (post that code)?

> I am using the program in Stein's Bio::Graphics example
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

Does this example run on your computer?  Have you been able to use the bioperl 
objects you created in the first step in the creation of a graphic?  If not, 
what have you tried (post the code) and any error messages.

> I need exactly same image as in
> http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
> only difference is I need the scale (0.1k - 0.9k) in a range of simple
> 1-XXX , here XXX depends on the length of the sequence input.

Again, what have you tried?  Posting code is helpful here, also.  

I'm not an expert in bioperl graphics, but it does really help those that know 
to see the code that you have written to know how best to help.  

Sean

From lzlgboy at gmail.com  Wed May  2 09:58:14 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 2 May 2007 21:58:14 +0800
Subject: [Bioperl-l] Extract CDS from CDNA given Protein SEQs
Message-ID: <d78b3d40705020658w1bee4c68s3058a63ef23c62a1@mail.gmail.com>

Hi ,everyone

   I have been given the task of extracting CDS sequences from cDNAs, and I
have the protein sequence for each cDNA. What should I do?
   Should I try a 3-frame translation? If so, how?
   Thanks.
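
One possible approach to the 3-frame idea, sketched in plain Perl under simplifying assumptions: the protein matches the translation exactly, the CDS lies on the forward strand, and the standard genetic code applies. The function names translate_frame and find_cds are hypothetical. BioPerl's own translation methods could replace the hand-built codon table.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my @amino = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = shift @amino;
        }
    }
}

# Translate a DNA string starting at a 0-based offset (0, 1 or 2).
# Codons containing anything other than A, C, G, T become 'X'.
sub translate_frame {
    my ($dna, $offset) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;
    my $protein = '';
    for (my $i = $offset; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Return the CDS (without stop codon) that encodes $protein, or nothing
# if the protein is not found in any of the three forward frames.
sub find_cds {
    my ($cdna, $protein) = @_;
    $protein = uc $protein;
    $protein =~ s/\*$//;    # ignore a trailing stop symbol
    for my $offset (0 .. 2) {
        my $hit = index(translate_frame($cdna, $offset), $protein);
        next if $hit < 0;
        return substr($cdna, $offset + 3 * $hit, 3 * length($protein));
    }
    return;
}

# Toy cDNA: two untranslated bases, then ATG GCC AAA TAG, then two more.
print find_cds('GGATGGCCAAATAGCC', 'MAK'), "\n";
```

Real data usually needs more tolerance than this, for example an alignment step when the protein and the cDNA differ by a few residues.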

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From MEC at stowers-institute.org  Wed May  2 18:38:31 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 2 May 2007 17:38:31 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>

Lincoln, 
 
Here for your comment and review is a heavily reworked version of
Bio::Graphics::FeatureBase->gff3_string.

The main difference is that homogenous children get ALL their
attributes except for start/stop from the parent, including the group.
I also provide an option, called $preserveHomegenousParent, controlling
whether or not to "remove the extraneous level of parentage".

There is an in-line comment and question for you in the code body.

It works well in my hands for my use cases, but I'm not positive it is
in the spirit of your intentions.
 
Cheers,
 
Malcolm
 
 
sub gff3_string {
  my ($self, $recurse, $preserveHomegenousParent,

      # Note: the following parameters, whose names begin with '$_',
      # are intended for recursive calls only.

      $_parent,
      $_self_is_hsf,     # is $self the child in a homogeneous
                         # parent/child relationship?
      $_hsf_parentgroup, # if so, the group (GFF column 9) of the parent
     ) = @_;

  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogenous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segment directive without specifying a
  # distinct -subtype in the call to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e.  Bio::DB::SeqFeature,
  # Bio::Graphics::Feature).  Such homogenous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propagated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THEIR
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).

  # $self's parent, if $self is a homogenous child, otherwise $self.
  my $hparentORself = $_self_is_hsf ? $_parent : $self;

  if ($recurse &&  (my @ssf = $self->sub_SeqFeature)) {
    # TRUE only if all subfeatures are the same type as $self.
    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogenous parent child
      # (will this ever happen in practice???))
      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup :
      $self->format_attributes($_parent);
    return (join("\n",
                 ($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
                 map {$_->gff3_string($recurse, $preserveHomegenousParent,
                                      $hparentORself, $homogenous, $mygroup)}
                     @ssf));
  } else {
    my $name  = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup :
      $self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1];
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand.  In particular, why does add_segment flip
    # the strand when start > stop?  I thought this was not allowed!
    # Lincoln - any ideas?
    my $p      = join("\t",
        $hparentORself->ref||'.',$hparentORself->source||'.',
        $hparentORself->method||'.',
        $self->start||'.',$self->stop||'.',
        defined($hparentORself->score) ? $hparentORself->score : '.',
        $strand||'.',
        defined($hparentORself->phase) ? $hparentORself->phase : '.',
        $group||'');
  }
}
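
For readers who want to try the two idioms used in the routine above in isolation (the homogeneity test via a negated grep, and the strand lookup via a list slice), here is a small stand-alone sketch. Features are plain hashes, not Bio::Graphics::FeatureBase objects, and the helper names and sample values are hypothetical.

```perl
use strict;
use warnings;

# Features are plain hashes here (hypothetical stand-ins for
# Bio::Graphics::FeatureBase objects).
sub is_homogenous {
    my ($parent, @children) = @_;
    # True only when no child has a type different from the parent's.
    return !grep { $_->{type} ne $parent->{type} } @children;
}

# Map strand -1, 0, +1 onto the GFF symbols '-', '.', '+'.
sub strand_symbol {
    my ($strand) = @_;
    return ('-', '.', '+')[ ($strand || 0) + 1 ];
}

# Emit one GFF3 line per segment, copying everything except start/end
# from the parent.
sub segment_lines {
    my ($parent, @children) = @_;
    my @lines;
    for my $child (@children) {
        push @lines, join("\t",
            $parent->{ref}, $parent->{source}, $parent->{type},
            $child->{start}, $child->{end}, '.',
            strand_symbol($parent->{strand}), '.', $parent->{group});
    }
    return @lines;
}

my $parent = { ref => 'chr1', source => 'example', type => 'match',
               strand => -1, group => 'ID=hit1;Name=hit1' };
my @segments = ( { type => 'match', start => 100, end => 200 },
                 { type => 'match', start => 300, end => 400 } );

if (is_homogenous($parent, @segments)) {
    print "$_\n" for segment_lines($parent, @segments);
}
```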
 

________________________________

	From: Cook, Malcolm 
	Sent: Friday, April 27, 2007 1:45 PM
	To: 'lincoln.stein at gmail.com'
	Cc: 'lstein at cshl.org'; 'bioperl list'
	Subject: RE: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Lincoln,
	 
	Cool.
	 
	The principle of what I figured out still holds, I think, but the
implementation is slightly broken.  Improved patch forthcoming next week.
	 

	Malcolm Cook
	Database Applications Manager - Bioinformatics
	Stowers Institute for Medical Research - Kansas City, Missouri
	  

________________________________

		From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
		Sent: Friday, April 27, 2007 12:45 PM
		To: Cook, Malcolm
		Cc: lstein at cshl.org; bioperl list
		Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
		
		
		Hi Malcom,
		
		This is absolutely ok and you can go ahead and commit.
Thanks for figuring this out!
		
		Lincoln
		
		
		On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

			Lincoln, et al,
			
			I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
			from a Bio::DB::SeqFeature::Store that were initially created with
			-segments (i.e. whose location was discontiguous) does not display any
			other attributes in column 9 than "Name".
			
			What do you think of the following patch to Bio::Graphics::FeatureBase,
			whose effect is to "contrive to return (duplicated) common group values"
			(which otherwise get lost when "collapsing" "homogenous" parent/child
			features)?
			
			Another approach would be to copy the attributes from the parent to the
			children when the -segments are first created.
			
			Another approach would be to use Bio::SeqFeature::Generic as the db's
			-seqfeature_class and save with -location being a Bio::Location::Split,
			but this was fraught with other problems.
			
			Any other suggestions?  Do you want me to commit this patch?
			
			Cheers,
			
			Malcolm
			
			Patch follows:
			
			
			Index: FeatureBase.pm
			===================================================================
			RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
			retrieving revision 1.29
			diff -c -r1.29 FeatureBase.pm
			*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
			--- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
			***************
			*** 581,587 ****
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     return join "\n",@children;
			    }
			
			    return join("\n",$p,@children);
			--- 581,589 ----
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     #return join "\n",@children;
			!     # Instead of above, additionally, contrive to return (duplicated) common group values
			!     return(join("$group\n",@children) . $group);
			    }
			
			    return join("\n",$p,@children);
			

		-- 
		Lincoln D. Stein
		Cold Spring Harbor Laboratory
		1 Bungtown Road
		Cold Spring Harbor, NY 11724
		(516) 367-8380 (voice)
		(516) 367-8389 (fax)
		FOR URGENT MESSAGES & SCHEDULING, 
		PLEASE CONTACT MY ASSISTANT, 
		SANDRA MICHELSEN, AT michelse at cshl.edu 


From lstein at cshl.edu  Thu May  3 12:01:38 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 12:01:38 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>

The width of the image is determined by the -width attribute and is given in
pixels. You cannot control the height of the image as it is computed
dynamically based on the number of features and bumping options.
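
To make the arithmetic behind a fixed pixel width concrete, here is a small sketch with made-up numbers. It does not call Bio::Graphics. The helper names and the padding values are hypothetical, and the real panel may round or pad differently. The point is only that with a constant width, a shorter sequence gets more pixels per residue, so the drawn span stays the same size while the scale labels change.

```perl
use strict;
use warnings;

# Hypothetical helper: how many pixels one residue occupies when the
# drawable width is fixed. Padding values are illustrative assumptions.
sub pixels_per_unit {
    my ($width, $length, $pad_left, $pad_right) = @_;
    die "length must be positive\n" unless $length > 0;
    return ($width - $pad_left - $pad_right) / $length;
}

# Map a 1-based sequence position to an x pixel coordinate.
sub position_to_x {
    my ($pos, $width, $length, $pad_left, $pad_right) = @_;
    my $scale = pixels_per_unit($width, $length, $pad_left, $pad_right);
    return int($pad_left + ($pos - 1) * $scale + 0.5);
}

for my $length (70, 200, 1000) {
    printf "length %4d: %.2f px per residue, last residue at x=%d\n",
        $length,
        pixels_per_unit(800, $length, 10, 10),
        position_to_x($length, 800, $length, 10, 10);
}
```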

Lincoln

On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>
> Dear Scot,
>
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
>
> I agreed, but I couldnt the exact information I needed :( (may be I missed
> something important).
>
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
>
> I tried by changing the Lincoln's program eg: blast3.pl
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to my
> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>
> But it had given me a smaller scale of length upto 300. I was looking for
> an option where I need same width and height of given image and a dynamic
> start and end values depending on length of my sequence. Since I couldnt
> accomplish, I thought of getting some help from you guys. I think I need
> to play a little bit with the value for reformat the scale to accomodate
> my hits as well.
>
> Thanks a lot for your inputs,
> --
> Shameer Khadar
> Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bioperlanand at yahoo.com  Thu May  3 16:09:18 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 3 May 2007 13:09:18 -0700 (PDT)
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
Message-ID: <922386.19570.qm@web36808.mail.mud.yahoo.com>

Hi

I am using Bioperl 1.4 and I am trying to obtain protein sequences for specific Uniprot records.

For some records (ROA1_HUMAN), it prints the correct sequence, but  it first prints the warning "Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <STREAM> line 43." 

For other records (BOLA_HAEIN), it prints the correct sequence (without any warnings).

Here is the code:
-------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::DB::SwissProt;

my $sp = new Bio::DB::SwissProt;

#my $seq_object  = $sp->get_Seq_by_id('ROA1_HUMAN');
my $seq_object  = $sp->get_Seq_by_id('BOLA_HAEIN');

my $sequence_as_a_string = $seq_object->seq();
print "$sequence_as_a_string\n";
-------------------------------------------------------------------------------------------

Is there something I need to fix?

Thanks in advance for the help.
 
 Anand

       

From MEC at stowers-institute.org  Thu May  3 16:19:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 3 May 2007 15:19:00 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
	<CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>
	<6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E230A6@exchkc02.stowers-institute.org>

Lincoln,
 
Ah, yes, round-tripping GFF, the holy grail....
 
Unfortunately, I don't really have a baseline to go against for an
example that roundtrips successfully now.  Do you?
 
For example, after loading test data: 
 
> bp_seqfeature_load.PLS  bioperl-live/t/data/biodbgff/test.gff3
 
the Contig1 portion of which looks like this:
 
##gff-version 3
## sequence-region Contig1 1 37450
Contig1 confirmed transcript 1001 2000 42 + . ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . ID=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 ID=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 ID=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 ID=Transcript:trans-1
Contig1 est similarity 1001 1100 96 . . Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Target=EST:CEESC13F 201 250 +
Contig1 tc1 transposon 5001 6000 . + . ID=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . ID=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . ID=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . ID=Transcript:trans-2
 
 
and then generating output with
 
>bp_seqfeature_gff3.PLS --gff=1 -- seq_id Contig1  #  using a script I
just committed - I hope you like it.  Note: gff=1 => recurse 
 
we get output gff with problems such as:
 
    1 IDs get turned into Aliases
    2 the seqid of a Target attribute gets copied into the feature's
      Name attribute
    3 suppression of parents of homogeneous subfeatures doesn't work when
      the parent has other subfeatures than those with its same type (i.e.
      the transcript feature also has exon subfeatures)
 
look:
 
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed transcript 1001 2000 42 + . Parent=2;Alias=Transcript:trans-1;Note=function+unknown;Gene=abc-1,xyz-2
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed transcript 30001 31000 . - . Parent=9;Alias=Transcript:trans-2;Note=Terribly+interesting;Gene=xyz-2
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
With my new version of gff3_string (not yet committed), only the 3rd
problem is addressed, generating
 
bp_seqfeature_gff3.PLS --gff 1  -- seq_id Contig1
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
 
I had to make another change to get this output though, since I had to
change the behaviour to
 
  # provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have at least one subfeature with the same type as the
  # feature itself (thus redefining Lincoln's "homogenous
  # parent/child" case, which previously required all children to have
  # the same type as parent)
 
 
I think you will agree this is the more desirable behaviour.
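
To make the difference between the two rules concrete, here is a small sketch that uses plain type strings as stand-ins for feature objects. It is not the FeatureBase code; the function names are hypothetical, and the child list is only what the first output listing in this message suggests for trans-1.

```perl
use strict;
use warnings;

# Old rule: the parent is "homogenous" only if EVERY subfeature has the
# parent's type.
sub all_children_match {
    my ($parent_type, @child_types) = @_;
    return !grep { $_ ne $parent_type } @child_types;
}

# Proposed rule: it is enough that AT LEAST ONE subfeature has the
# parent's type.
sub any_child_matches {
    my ($parent_type, @child_types) = @_;
    return scalar grep { $_ eq $parent_type } @child_types;
}

# trans-1: a transcript parent with one transcript child plus exon and
# CDS children.
my @children = ('transcript', ('exon') x 3, ('CDS') x 3);
printf "old rule: %s, new rule: %s\n",
    (all_children_match('transcript', @children) ? 'collapse' : 'keep parent'),
    (any_child_matches('transcript', @children)  ? 'collapse' : 'keep parent');
```

Under the old rule the mixed children stop the extra transcript line from being suppressed; under the proposed rule a single same-type child is enough.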
 
I would be happy to test any other GFF you suggest might be (more or
less) roundtripped.
 
What think you?
 
--Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com]
On Behalf Of Lincoln Stein
	Sent: Thursday, May 03, 2007 9:46 AM
	To: Cook, Malcolm
	Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Malcolm,
	
	For me, the major use case is that GFF3 files round-trip
correctly through the database. Do any of your use cases cover that?
	
	Lincoln
	
	
	On 5/2/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		Lincoln, 
		 
		Here for your comment and review is a very reworked
version of Bio::Graphics::FeatureBase->gff3_string.
		 
		The main difference is that homogenous children get
		ALL their attributes except for start/stop from the parent,
		including the group.  I also provide an option, called
		$preserveHomegenousParent, as to whether or not to "remove
		extraneous level of parentage".
		 
		There is an in-line comment and question for you in the
		code body.
		 
		It works well in my hands for my use cases, but I'm not
		positive it is in the spirit of your intentions.
		 
		Cheers,
		 
		Malcolm
		 
		 
		sub gff3_string {
		  my ($self, $recurse, $preserveHomegenousParent,
		
		      # Note: the following parameters, whose name begins with '$_',
		      # are intended for recursive call only.
		
		      $_parent,
		      # is $self the child in a homogeneous parent/child relationship?
		      $_self_is_hsf,
		      # if so, what is the group (GFF column 9) of the parent
		      $_hsf_parentgroup,
		     ) = @_;
		
		  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
		  # $recurse to include GFF for any subfeatures of the feature. If
		  # recursing, provide special handling to "remove an extraneous level
		  # of parentage" (unless $preserveHomegenousParent) for features
		  # which have subfeatures all of whose types are the same as the
		  # feature itself (the "homogenous parent/child" case). This usage is
		  # a convention for representing discontiguous features; they may be
		  # created by using the -segment directive without specifying a
		  # distinct -subtype in the call to `new` when creating a
		  # Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature,
		  # Bio::Graphics::Feature).  Such homogenous subfeatures created in
		  # this fashion DO NOT have the parent (GFF column 9) attributes
		  # propagated to them; so, since they are all part of the same
		  # parent, the ONLY difference relevant to GFF production SHOULD be
		  # the $start and $end coordinates for their segment, and ALL THEIR
		  # OTHER ATTRIBUTES should be copied down from the parent (including:
		  # strand, score, Name, ID, Parent, etc).
		
		  # $self's parent, if it is a homogenous child, otherwise $self.
		  my $hparentORself = $_self_is_hsf ? $_parent : $self;
		
		  if ($recurse &&  (my @ssf = $self->sub_SeqFeature)) {
		    # will be TRUE only if all subfeatures are the same type as $self.
		    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
		    my $mygroup =
		      # compute $self's group if it is needed to be passed down to
		      # subfeatures, unless it is already being passed down (in which
		      # case there are (at least) 3 levels of homogenous parent child
		      # (will this ever happen in practice???))
		      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup
		                    : $self->format_attributes($_parent);
		    return (join("\n",
		                 (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
		                  map {$_->gff3_string($recurse, $preserveHomegenousParent,
		                                       $hparentORself, $homogenous, $mygroup)}
		                      @ssf)));
		  } else {
		    my $name  = $hparentORself->name;
		    my $class = $hparentORself->class;
		    my $group = $_self_is_hsf ? $_hsf_parentgroup
		                              : $self->format_attributes($_parent);
		    my $strand = ('-','.','+')[$self->strand+1];
		    # TODO: understand conditions under which this could be other than
		    # hparentORself->strand.  In particular, why does add_segment flip
		    # the strand when start > stop?  I thought this was not allowed!
		    # Lincoln - any ideas?
		    my $p      = join("\t",
		        $hparentORself->ref||'.',
		        $hparentORself->source||'.',
		        $hparentORself->method||'.',
		        $self->start||'.',$self->stop||'.',
		        defined($hparentORself->score) ? $hparentORself->score : '.',
		        $strand||'.',
		        defined($hparentORself->phase) ? $hparentORself->phase : '.',
		        $group||'');
		  }
		}
		 
		
________________________________

			From: Cook, Malcolm 
			Sent: Friday, April 27, 2007 1:45 PM
			To: 'lincoln.stein at gmail.com'
			Cc: 'lstein at cshl.org'; 'bioperl list'
			Subject: RE: Handling discontiguous feature
locations in Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
			
			
			Hi Lincoln,
			 
			Cool.
			 
			The principle of what I figured out still holds, I
			think, but the implementation is slightly broken.  An
			improved patch is forthcoming next week.
			 

			Malcolm Cook
			Database Applications Manager - Bioinformatics
			Stowers Institute for Medical Research - Kansas
City, Missouri
			  

________________________________

				From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
				Sent: Friday, April 27, 2007 12:45 PM
				To: Cook, Malcolm
				Cc: lstein at cshl.org; bioperl list
				Subject: Re: Handling discontiguous
feature locations in Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
				
				
				Hi Malcolm,
				
				This is absolutely ok and you can go
ahead and commit. Thanks for figuring this out!
				
				Lincoln
				
				
				On 4/26/07, Cook, Malcolm <
MEC at stowers-institute.org <mailto:MEC at stowers-institute.org> > wrote: 

				Lincoln, et al,
				
				I find that the gff3_string for
				Bio::DB::SeqFeature objects retrieved from a
				Bio::DB::SeqFeature::Store that were initially
				created with -segments (i.e. whose location was
				discontiguous) does not display any attributes in
				column 9 other than "Name".
				
				What do you think of the following patch to
				Bio::Graphics::FeatureBase, whose effect is to
				"contrive to return (duplicated) common group
				values" (which otherwise get lost when "collapsing"
				"homogenous" parent/child features)?
				
				Another approach would be to copy the attributes
				from the parent to the children when the -segments
				are first created.
				
				Another approach would be to use
				Bio::SeqFeature::Generic as the db's
				-seqfeature_class and save with -location being a
				Bio::Location::Split, but this was fraught with
				other problems.
				
				Any other suggestions?  Do you want me
to commit this patch?
				
				Cheers,
				
				Malcolm
				
				Patch follows:
				
				
				Index: FeatureBase.pm
				===================================================================
				RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
				retrieving revision 1.29
				diff -c -r1.29 FeatureBase.pm
				*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
				--- FeatureBase.pm       26 Apr 2007 16:30:23 -0000
				***************
				*** 581,587 ****
				      foreach (@children) {
				        s/Parent=/ID=/g;
				      } # replace Parent tag with ID
				!     return join "\n",@children;
				    }
				
				    return join("\n",$p,@children);
				--- 581,589 ----
				      foreach (@children) {
				        s/Parent=/ID=/g;
				      } # replace Parent tag with ID
				!     #return join "\n",@children;
				!     # Instead of above, additionally, contrive to return (duplicated) common group values
				!     return(join("$group\n",@children) . $group);
				    }
				
				    return join("\n",$p,@children);
				

				-- 
				Lincoln D. Stein
				Cold Spring Harbor Laboratory
				1 Bungtown Road
				Cold Spring Harbor, NY 11724
				(516) 367-8380 (voice)
				(516) 367-8389 (fax)
				FOR URGENT MESSAGES & SCHEDULING, 
				PLEASE CONTACT MY ASSISTANT, 
				SANDRA MICHELSEN, AT michelse at cshl.edu 


	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Thu May  3 16:57:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 3 May 2007 15:57:43 -0500
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
In-Reply-To: <922386.19570.qm@web36808.mail.mud.yahoo.com>
References: <922386.19570.qm@web36808.mail.mud.yahoo.com>
Message-ID: <2930F3F1-2BFB-4320-9A2C-50DFE6F808A1@uiuc.edu>

I would update to BioPerl 1.5.2.  v.1.4 is 3 yrs old and there have  
been tons of changes both for sequence retrieval and parsers.

We can't predict when a new 'stable' release will be available but  
1.5.2 works well for most purposes.

chris

On May 3, 2007, at 3:09 PM, Anand Venkatraman wrote:

> Hi
>
> I am using Bioperl 1.4 and I am trying to obtain protein sequences  
> for specific Uniprot records.
> ...
>  Is there something I need to fix.
>
> Thanks in advance for the help.
>
>  Anand
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thiago.venancio at gmail.com  Thu May  3 17:12:35 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 3 May 2007 18:12:35 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
	<54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
Message-ID: <44255ea80705031412n7abef247je70d2681bb3cc7ed@mail.gmail.com>

Hi all,

Just for the record: I am getting good results extracting CDS from protein
x DNA alignments with the following procedure:

- BLASTX to identify the hits for each DNA sequence (if you want to process
the sequences for further multiple sequence alignment, it is important to
record the frames);

- FASTX/Y to refine the alignment between the protein and the DNA. FASTX/Y
is quite good because it copes well with frameshifts and allows better
identification of premature stop codons. In addition, the alignment (and
the CDS prediction) is better.

This is worth noting because it avoids analysing "phantom" mRNAs, i.e.
sequences that contain stops; merely looking at the BLAST output can
sometimes give misleading results.
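
As a rough illustration of the "phantom" check (not a substitute for FASTX/Y), a sketch like the following could flag in-frame stops once the frame from the BLASTX hit has been recorded. It assumes the standard genetic code and a forward-strand frame given as an offset of 0, 1 or 2; the function name internal_stops is hypothetical.

```perl
use strict;
use warnings;

# Return the 1-based codon positions of in-frame stop codons, ignoring
# the final codon (where a stop is expected). $frame is 0, 1 or 2.
sub internal_stops {
    my ($dna, $frame) = @_;
    my @codons = uc(substr($dna, $frame)) =~ /(...)/g;
    pop @codons if @codons;                     # drop the terminal codon
    my %stop = map { $_ => 1 } qw(TAA TAG TGA); # standard code assumed
    return grep { $stop{ $codons[$_ - 1] } } 1 .. @codons;
}

my @hits = internal_stops('ATGAAATAGGGCTTTTAA', 0);
print @hits ? "internal stop at codon(s): @hits\n" : "no internal stops\n";
```

Sequences for which this returns a non-empty list would be candidates for closer inspection rather than for automatic rejection, since frameshifts in ESTs can produce the same signal.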

Best.

Thiago


On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Hi -
> There are some tools that do this for you -- I've listed a few from a
> google search or from what I remember reading.  It would be great if you
> (and others!) were willing to contribute a little info on what works for
> you to the wiki.  A little HOWTO would be cool - here or on
> openwetware.org.
>
> Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
> EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2
>
> Ewan Birney's estwise as part of wise package also can help if you have a
> likely protein from BLAST you want to align to the est - estwise can handle
> frameshifts, but can be too slow for some people.  Exonerate's protein2dna
> model may also work here, but I haven't tried it.
>
> -jason
> On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:
>
> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed
> to
> like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From lstein at cshl.edu  Thu May  3 17:35:57 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 17:35:57 -0400
Subject: [Bioperl-l] CSHL is hiring
Message-ID: <6dce9a0b0705031435r3bc2d2ddlfca5ac02844b4ef0@mail.gmail.com>

Hi Folks,

Sorry for the spam. My group at CSHL is looking for a scientific programmer
with good software development credentials and some experience in
bioinformatics. Experience in object-oriented Perl programming is a strict
requirement.

This is to work on user interface development for several projects
including:

   - BioMart (data warehouse) project (www.biomart.org)
   - GBrowse genome browser (www.gmod.org/GBrowse)
   - Reactome pathways database (www.reactome.org)

I can offer salaries in the 60-80K range, depending on level of experience.
Please reply to lstein at cshl.edu.

Best,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From MEC at stowers-institute.org  Tue May  8 12:59:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 8 May 2007 11:59:10 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start
	and stop coordinates??
Message-ID: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>

Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
coordinates, 

as in:
  ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
&& $start > $stop;

I thought it is not legal for a feature to be so composed.  

Anyone know?

Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue May  8 13:12:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 12:12:45 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>

I believe all seqfeature location coordinates are designed to have  
start < stop for consistency; in cases where the strand matters (CDS,  
gene, etc.) then the strand is set to 1 or -1.  When start > stop,  
the two are reversed and the strand is flipped; at least that's the  
way locations are set up in BioPerl.
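
A small sketch of the convention as described here; it mirrors the description only and is not the library code. The function names are hypothetical.

```perl
use strict;
use warnings;

# Normalise a (start, stop, strand) triple so that start <= stop,
# flipping the strand when the coordinates have to be swapped.
sub normalise_location {
    my ($start, $stop, $strand) = @_;
    $strand = 0 unless defined $strand;          # 0 = unstranded
    if (defined $start && defined $stop && $start > $stop) {
        ($start, $stop) = ($stop, $start);
        $strand = -$strand;                      # 1 <-> -1, 0 stays 0
    }
    return ($start, $stop, $strand);
}

# GFF strand column: -1 => '-', 0 => '.', 1 => '+'
sub strand_symbol { return ('-', '.', '+')[ $_[0] + 1 ] }

my ($s, $e, $str) = normalise_location(2000, 1001, 1);
print join("\t", $s, $e, strand_symbol($str)), "\n";
```

Seen this way, the swap in gff3_string would simply guarantee that start never exceeds stop in the output, whatever order the coordinates were supplied in.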

chris

On May 8, 2007, at 11:59 AM, Cook, Malcolm wrote:

> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Tue May  8 14:37:20 2007
From: juheymann at yahoo.com (Bohr)
Date: Tue, 8 May 2007 11:37:20 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
Message-ID: <10381379.post@talk.nabble.com>


Hi,

I installed bioperl under OSX Tiger via Fink. I tested the installation
using the test tutorial via: perl -w bptutorial.pl 5

The script failed indicating that the file to retrieve was missing. To
identify the problem, I used a script using 'get_sequence' that will
retrieve a file from 'genbank' or 'embl'. Both succeeded. If I replace it
with 'swiss' or 'swissprot' and substitute the ID with the identical ID as
in the tutorial, I am recreating the problem found with bptutorial.pl. Other
ID's do the same.

Any pointers on the origin of this finding would be greatly appreciated.


From cjfields at uiuc.edu  Tue May  8 17:53:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 16:53:04 -0500
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <10381379.post@talk.nabble.com>
References: <10381379.post@talk.nabble.com>
Message-ID: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>

The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
etc) accessed via bioperl.

As a note, the bptutorial.pl has been moved to the bioperl wiki:

http://www.bioperl.org/wiki/Bptutorial

chris

On May 8, 2007, at 1:37 PM, Bohr wrote:

>
> Hi,
>
> I installed bioperl under OSX Tiger via Fink. I tested the  
> installation
> using the test tutorial via: perl -w bptutorial.pl 5
>
> The script failed indicating that the file to retrieve was missing. To
> identify the problem, I used a script using 'get_sequence' that will
> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
> replace it
> with 'swiss' or 'swissprot' and substitute the ID with the  
> identical ID as
> in the tutorial, I am recreating the problem found with  
> bptutorial.pl. Other
> ID's do the same.
>
> Any pointers on the origin of this finding would be greatly  
> appreciated.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Wed May  9 18:17:27 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:17:27 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10403903.post@talk.nabble.com>


Thank you for the feedback and the suggestion.

I installed 1.5.2 via Build.pl and the results were the same, i.e. embl and
genbank worked fine but swissprot failed.

Here is the output:

MSG: acc (CALX_YEAST) does not exist
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not provide a valid Bio::PrimarySeqI object
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SeqIO::fasta::write_seq
/sw/lib/perl5/5.8.6/Bio/SeqIO/fasta.pm:181

Before contemplating too much:
Here is my question: how do I verify the update to 1.5.2? (I ran ./Build test
and that came back positive.) And what else could have gone wrong here?

What might be a clever way to troubleshoot this?
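
One possible check, sketched below, is to ask Perl which file a module was actually loaded from and what version it reports. The helper name module_origin is hypothetical, and File::Spec is used only so that the sketch runs without BioPerl; substituting Bio::Root::Root (the module named in the stack trace) would be the real test. The trace above points into /sw/lib/perl5, which is where Fink installs, so it may be that the older copy is still found first on @INC, but that is a guess.

```perl
use strict;
use warnings;

# Report where a module is loaded from and which version it declares.
# Returns an empty list if the module cannot be loaded.
sub module_origin {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;     # Foo::Bar -> Foo/Bar.pm
    return unless eval { require $file; 1 };
    return ($INC{$file}, $module->VERSION);     # version may be undef
}

my ($path, $version) = module_origin('File::Spec');
print "File::Spec loaded from $path (version ",
      defined $version ? $version : 'unknown', ")\n";
```

If the reported path is the old install, putting the new library directory earlier in PERL5LIB, or removing the old package, would be the things to try.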


---------------------------------------------------------------------------

Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris
> 
> On May 8, 2007, at 1:37 PM, Bohr wrote:
> 
>>
>> Hi,
>>
>> I installed bioperl under OSX Tiger via Fink. I tested the  
>> installation
>> using the test tutorial via: perl -w bptutorial.pl 5
>>
>> The script failed indicating that the file to retrieve was missing. To
>> identify the problem, I used a script using 'get_sequence' that will
>> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
>> replace it
>> with 'swiss' or 'swissprot' and substitute the ID with the  
>> identical ID as
>> in the tutorial, I am recreating the problem found with  
>> bptutorial.pl. Other
>> ID's do the same.
>>
>> Any pointers on the origin of this finding would be greatly  
>> appreciated.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 



From ursula_cox at btinternet.com  Wed May  9 18:12:26 2007
From: ursula_cox at btinternet.com (Ursula at BT)
Date: Wed, 9 May 2007 23:12:26 +0100
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
Message-ID: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>

Dear BioPerl List,

 
I'm new to BioPerl (and Perl for that matter). I have an array of enzyme
names, and a larger collection of enzymes (guaranteed to be a superset by
the way it's constructed). I need to make a new collection containing just
the enzymes corresponding to the names I have in the array.

 
I was hoping that something like:

 
my $all_rebase = Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');
my $all_rebase_collection = $all_rebase->read();

my @enzymes = ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I',
               'AccB1I','AccB7I','AccI');

my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

foreach $enzyme (all_rebase_collection)
            {
            $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes;
            }

 
would work, but I get a syntax error near "$new_collection(".
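
For what it is worth, `$new_collection($enzyme)` is not valid Perl: a scalar followed by parentheses is not a call, and a method call needs `->method(...)`. The constructor line also lacks `->new`, and the loop variable lacks its `$` sigil. The filtering step itself can be sketched without BioPerl as below; MockEnzyme is a hypothetical stand-in that provides only a name method, and select_by_name is likewise an illustrative name.

```perl
use strict;
use warnings;

# Stand-in for an enzyme object, just enough for the sketch.
package MockEnzyme;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }

package main;

# Keep only the objects whose name() appears in the wanted list.
sub select_by_name {
    my ($wanted, @all) = @_;
    my %want = map { $_ => 1 } @$wanted;   # hash lookup instead of a nested grep
    return grep { $want{ $_->name } } @all;
}

my @all    = map { MockEnzyme->new($_) } qw(AasI AatI EcoRI AccI BamHI);
my @subset = select_by_name([qw(AasI AccI)], @all);
print join(',', map { $_->name } @subset), "\n";
```

With the real classes, the list would presumably come from `$all_rebase_collection->each_enzyme` and the result be added with `$new_collection->enzymes(@subset)` after creating the collection via `Bio::Restriction::EnzymeCollection->new(-empty => 1)`; those method names are as I recall them from the module's perldoc and should be checked there.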

 
Any clues much appreciated,

 
Ursula Cox


From juheymann at yahoo.com  Wed May  9 18:38:42 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:38:42 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10404211.post@talk.nabble.com>


Thank you for pointing that out! I installed 1.5.2 via Build.PL. The scripts
work as expected now.


Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris
> 
> On May 8, 2007, at 1:37 PM, Bohr wrote:
> 
>>
>> Hi,
>>
>> I installed bioperl under OSX Tiger via Fink. I tested the  
>> installation
>> using the test tutorial via: perl -w bptutorial.pl 5
>>
>> The script failed indicating that the file to retrieve was missing. To
>> identify the problem, I used a script using 'get_sequence' that will
>> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
>> replace it
>> with 'swiss' or 'swissprot' and substitute the ID with the  
>> identical ID as
>> in the tutorial, I am recreating the problem found with  
>> bptutorial.pl. Other
>> ID's do the same.
>>
>> Any pointers on the origin of this finding would be greatly  
>> appreciated.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10404211
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Wed May  9 19:37:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 May 2007 18:37:33 -0500
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
In-Reply-To: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
References: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
Message-ID: <E4472E55-AADB-4697-8C4D-2EC231923F0B@uiuc.edu>


On May 9, 2007, at 5:12 PM, Ursula at BT wrote:

> Dear BioPerl List,
>
>
>
> I'm new to BioPerl (and Perl for that matter). I have an array of  
> enzyme
> names, and a larger collection of enzymes (guaranteed to be a  
> superset by
> the way it's constructed). I need to make a new collection  
> containing just
> the enzymes corresponding to the names I have in the array.

First, prior to using BioPerl you should really brush up on perl  
itself (Learning Perl, or James Tisdall's Perl for Bioinformatics  
books, the former preferred).  Though there are several scripts  
available to get you started with Bioperl, much of the code is  
written with the expectation that you can write and debug a basic  
perl script (and there is some expectation that you are somewhat  
familiar with OO Perl).

Saying that, let's see what's wrong...

> I was hoping that something like:
>
>
>
> my $all_rebase =
> Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');
>
> my $all_rebase_collection = $all_rebase->read();

The 'bionet' format is not supported; only 'withrefm', 'itype2',  
'bairoch' are (the latter only experimentally).  See 'perldoc  
Bio::Restriction::IO'.

> my @enzymes =
> ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I',
> 'AccB1I','AccB7I','AccI');
>
>
>
> my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

Missing a new() constructor here.

> foreach $enzyme (all_rebase_collection)

Not sure what this is.  No '$' sigil for $all_rebase_collection will  
make the compiler look for (and fail to find) the sub  
all_rebase_collection().

>
>             {
>
>             $new_collection($enzyme) if grep $_ eq $enzyme->name,  
> @enzymes;
>
>             }
>
>
>
> would work, but I get a syntax error near "$new_collection(".

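Yep.  $new_collection($enzyme) isn't valid Perl; $new_collection is an  
object, so you need a method call, e.g. $new_collection->enzymes($enzyme).  
Your grep is legal as written, though the block form, grep { ... } @list,  
is easier to read.  See 'perldoc -f grep'.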
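
A minimal sketch of the filtering step follows. It makes two assumptions:
that a hash lookup on the wanted names is acceptable in place of a nested
grep, and that the enzyme objects expose a name() method. My::Enzyme and
select_by_name() are hypothetical stand-ins so that the sketch runs
without BioPerl; they are not BioPerl code.

```perl
use strict;
use warnings;

# Stand-in for an enzyme object; only the name() accessor is imitated.
{
    package My::Enzyme;
    sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub name { my ($self) = @_; return $self->{name} }
}

# Keep only the objects whose name() appears in @$wanted_names.
# The input order of @$objects is preserved.
sub select_by_name {
    my ($objects, $wanted_names) = @_;
    my %wanted = map { $_ => 1 } @$wanted_names;   # one lookup per object
    return grep { $wanted{ $_->name } } @$objects;
}

my @all     = map { My::Enzyme->new($_) } qw(AasI AatI AccI EcoRI BamHI);
my @enzymes = qw(AasI AccI);
my @subset  = select_by_name(\@all, \@enzymes);
print join(',', map { $_->name } @subset), "\n";
```

With real BioPerl objects the same idea should carry over: build the
empty collection with Bio::Restriction::EnzymeCollection->new(-empty => 1),
take the input list from $all_rebase_collection->each_enzyme, and add the
selected enzymes with $new_collection->enzymes(@subset).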

> Any clues much appreciated,
>
>
>
> Ursula Cox

No prob, but again you might want to brush up on perl.

chris


From darin.london at duke.edu  Thu May 10 12:17:38 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Thu, 10 May 2007 12:17:38 -0400
Subject: [Bioperl-l] BOSC 2007 Second Call For Papers
Message-ID: <200705101617.l4AGHceI002463@tenero.duhs.duke.edu>


The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th, 20th.  The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions.   Thank you, and we hope to see you in Vienna.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From lstein at cshl.edu  Thu May 10 13:13:09 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 10 May 2007 13:12:09 -0401
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0705101013w1923c173l5ec5d9288c67c9a2@mail.gmail.com>

It's a workaround for some broken data sources. It should "never happen."

Lincoln

On 5/8/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Bank.Beszteri at awi.de  Thu May 10 12:13:00 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Thu, 10 May 2007 18:13:00 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
Message-ID: <4643448C.4000807@awi.de>

Dear Bioperl folks,

I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees, 
but in some things it did not behave as I expected it to, so I had to 
look inside a bit.
In particular, I had problems with mixed up bootstrap values after 
re-rooting. After looking into the Bio::Tree::Tree data structures, it 
seems that

a) bootstrap values are stored as attributes of nodes of the tree [to my 
understanding, they should rather be attributes of branches but 
Bio::Tree::Tree apparently tries to simplify away branches]; each node 
stores the bootstrap value belonging to the branch that connects it to 
its ancestor node (I'm reading in trees from Newick strings, and 
bootstrap values arrive in the id fields of internal branches)

b) when re-rooting a tree, bootstrap values stay with the same node 
where they were before. Because the node that used to be the ancestor of 
a particular node in the original tree might have become its descendant 
after re-rooting, the bootstrap values are being mixed up.

Can you confirm my conclusion? Whether yes or no, have you got an easy 
workaround or alternative solution to re-rooting trees (without having 
to touch the reroot method) or any other hints that could be useful for 
me to deal with this issue?

Cheers,

Bank


--
Dr. Bánk Beszteri
Alfred Wegener Institute for Polar and Marine Research

From dmessina at wustl.edu  Thu May 10 16:16:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 10 May 2007 15:16:48 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
Message-ID: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>

Hi everyone,

Shin Leong here at the Wash U GSC has written SearchIO-compliant  
cross_match parsing and result modules. Specifically,  
Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.

To my knowledge this functionality doesn't exist in BioPerl. Any  
comments or objections before I commit these to CVS?

Thanks,
Dave


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From aperezp at uma.es  Thu May 10 13:58:32 2007
From: aperezp at uma.es (=?ISO-8859-1?Q?=22Antonio_J=2E_P=E9rez=22?=)
Date: Thu, 10 May 2007 19:58:32 +0200
Subject: [Bioperl-l] Get Swiss Entry
Message-ID: <46435D48.4020309@uma.es>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/ca4e893e/attachment.html 

From jason at bioperl.org  Thu May 10 16:53:28 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 10 May 2007 13:53:28 -0700
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <FDBE1855-6252-4902-B32B-E984EC6B22E9@bioperl.org>

Awesome!
On May 10, 2007, at 1:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From cjfields at uiuc.edu  Fri May 11 00:55:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 10 May 2007 23:55:05 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>

Sounds good to me!  Any tests to be added?

chris

On May 10, 2007, at 3:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Fri May 11 01:42:53 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 11 May 2007 00:42:53 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
Message-ID: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>

> Sounds good to me!  Any tests to be added?

No tests right now as far as I can tell. I'm swamped personally, but  
perhaps I can persuade Mark Johnson over here to crank out a few.


From cjfields at uiuc.edu  Fri May 11 11:25:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 10:25:34 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
	<57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
Message-ID: <B654B314-FE39-4DB2-9B2F-5C812CF3E257@uiuc.edu>

Thanks Mark!  I don't think you'll need to add a ton of tests; just  
enough to demo anything that you feel is necessary or specific to the  
parser.  These could go into SearchIO.t or their own test suite.

chris

On May 11, 2007, at 10:14 AM, Mark Johnson wrote:

>>> Sounds good to me!  Any tests to be added?
>>
>> No tests right now as far as I can tell. I'm swamped personally, but
>> perhaps I can persuade Mark Johnson over here to crank out a few.
>
> I'll see what I can do.  I just had to open my mouth about getting  
> this
> contributed back after I noticed it, so I suppose this is appropriate
> retribution.  8)
>
>


From mjohnson at watson.wustl.edu  Fri May 11 11:14:56 2007
From: mjohnson at watson.wustl.edu (Mark Johnson)
Date: Fri, 11 May 2007 10:14:56 -0500 (CDT)
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
Message-ID: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>

>> Sounds good to me!  Any tests to be added?
>
> No tests right now as far as I can tell. I'm swamped personally, but
> perhaps I can persuade Mark Johnson over here to crank out a few.

I'll see what I can do.  I just had to open my mouth about getting this
contributed back after I noticed it, so I suppose this is appropriate
retribution.  8)


From golharam at umdnj.edu  Fri May 11 16:20:41 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 16:20:41 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
Message-ID: <000501c79409$d8c03480$f6028a0a@PICO>

I'm running a large series of clustalw alignments.  After a large number of
alignments, my perl script would die indicating too many links were open.  I
checked my /tmp directory (while the script is running) and noticed that the
temp directories created for ClustalW are not removed until after the script
exits.
How can I force the cleanup of these directories after I am done with the
alignment?

My code is essentially this;

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
$aa_aln = $aln_factory->align(\@aa_seqs);
open(STDOUT, ">&OLDOUT");
$dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);


Ryan


From jason at bioperl.org  Fri May 11 16:53:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 11 May 2007 13:53:19 -0700
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>

Did you try adding this after your calls getting the CDS aln?

$aln_factory->cleanup();


-jason
On May 11, 2007, at 1:20 PM, Ryan Golhar wrote:

> I'm running a large series of clustalw alignments.  After a large  
> number of
> alignments, my perl script would die indicating too many links were  
> open.  I
> checked my /tmp directory (while the script is running) and noticed  
> that the
> temp directories created for ClustalW are not removed until after the  
> script
> exits.
> How can I force the cleanup of these directories after I am done  
> with the
> alignment?
>
> My code is essentially this;
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> $aa_aln = $aln_factory->align(\@aa_seqs);
> open(STDOUT, ">&OLDOUT");
> $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);
>
>
> Ryan
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri May 11 16:57:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 15:57:23 -0500
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <41E91E58-48A5-4E29-B6BA-E9417BF17513@uiuc.edu>

cleanup() is supposed to clean up temp directory stuff; it's  
inherited from Bio::Tools::Run::WrapperBase.

chris

On May 11, 2007, at 3:20 PM, Ryan Golhar wrote:

> I'm running a large series of clustalw alignments.  After a large  
> number of
> alignments, my perl script would die indicating too many links were  
> open.  I
> checked my /tmp directory (while the script is running) and noticed  
> that the
> temp directories created for ClustalW are not removed until after the  
> script
> exits.
> How can I force the cleanup of these directories after I am done  
> with the
> alignment?
>
> My code is essentially this;
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> $aa_aln = $aln_factory->align(\@aa_seqs);
> open(STDOUT, ">&OLDOUT");
> $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);
>
>
> Ryan
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From golharam at umdnj.edu  Fri May 11 18:11:47 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 18:11:47 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
 after itself
In-Reply-To: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>
Message-ID: <001301c79419$5e794e90$f6028a0a@PICO>

No, I didn't, but I will now.  Thanks.  Interestingly enough ClustalW
removes the files from within the temp directory, but not the temp directory
itself.
 
 


From goshng at gmail.com  Sat May 12 11:21:59 2007
From: goshng at gmail.com (Sang Chul Choi)
Date: Sat, 12 May 2007 11:21:59 -0400
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
Message-ID: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>

Hi,

One Bio::Seq's sequence is "ACGT" and I want this object to have
"ACGA" by changing the fourth letter from T to A. I thought I could do
this by reading sequence string through the method of seq(), changing
the string by perl's general function, and generating another Bio::Seq
object with the new string. This seems to be silly, a little bit.

Is there any simple way to do this? Or, is there any method of
Bio::Seq to do this: to change one letter at a particular position, or
additionally to change letters with some range?

Thank you,

Sang Chul

From jason at bioperl.org  Sat May 12 12:50:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 09:50:10 -0700
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
In-Reply-To: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
References: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
Message-ID: <22C99635-C22D-4F51-AADD-5CCF595222DF@bioperl.org>

You can get/set the seq data via the seq() method.

use Bio::Seq;
my $seq = Bio::Seq->new(-seq => 'ACGT');

my $str = $seq->seq;
print $str, "\n";

substr($str,3,1,'A');
$seq->seq($str);
print $seq->seq, "\n";
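
The same get/modify/set pattern extends to a range of letters. Below is a
minimal sketch on a plain string, assuming 1-based inclusive coordinates;
replace_range() is a hypothetical helper, not a Bio::Seq method.

```perl
use strict;
use warnings;

# Replace the letters from $start to $end (1-based, inclusive) with $new
# and return the modified copy. The caller's string is left untouched.
sub replace_range {
    my ($str, $start, $end, $new) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length($str);
    # 4-argument substr takes a 0-based offset and a length.
    substr($str, $start - 1, $end - $start + 1, $new);
    return $str;
}

print replace_range('ACGT', 4, 4, 'A'), "\n";        # single letter
print replace_range('ACGTACGT', 3, 5, 'NNN'), "\n";  # a range
```

On a sequence object this would be used as
$seq->seq(replace_range($seq->seq, 4, 4, 'A')); the object itself is
reused, only its string is replaced.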

On May 12, 2007, at 8:21 AM, Sang Chul Choi wrote:

> Hi,
>
> One Bio::Seq's sequence is "ACGT" and I want this object to have
> "ACGA" by changing the fourth letter from T to A. I thought I could do
> this by reading sequence string through the method of seq(), changing
> the string by perl's general function, and generating another Bio::Seq
> object with the new string. This seems to be silly, a little bit.
>
> Is there any simple way to do this? Or, is there any method of
> Bio::Seq to do this: to change one letter at a particular position, or
> additionally to change letters with some range?
>
> Thank you,
>
> Sang Chul

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Sat May 12 18:12:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 15:12:56 -0700
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <4643448C.4000807@awi.de>
References: <4643448C.4000807@awi.de>
Message-ID: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>


On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:

> Dear Bioperl folks,
>
> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees,
> but in some things it did not behave as I expected it to, so I had to
> look inside a bit.
> In particular, I had problems with mixed up bootstrap values after
> re-rooting. After looking into the Bio::Tree::Tree data structures, it
> seems that
>
> a) bootstrap values are stored as attributes of nodes of the tree  
> [to my
> understanding, they should rather be attributes of branches but
> Bio::Tree::Tree apparently tries to simplify away branches]; each node
> stores the bootstrap value belonging to the branch that connects it to
> its ancestor node (I'm reading in trees from Newick strings, and
> bootstrap values arrive in the id fields of internal branches)

Please feel free to suggest an alternative implementation if you  
don't agree with the object model.    It has worked quite well in our  
hands so I'd be all ears for someone wanting to get in and do some  
more work on it.

We have answered the question as to why bootstrap values are internal  
ids many times on this list and I believe on the wiki -- the parser  
can't tell the difference between a node id and a bootstrap value  
because Newick uses the same slot for both.  If you know you have  
bootstrap values in the internal nodes it is trivial to process your  
tree and copy the values over.


for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) {
  $node->bootstrap($node->id);
  $node->id('');
}

I just added this as a method (move_id_to_bootstrap) to  
Bio::Tree::TreeFunctionsI so that it can be easily called now to help  
satisfy everyone who hopes that the toolkit can guess whether the  
internal nodes are bootstraps or identifiers.


>
> b) when re-rooting a tree, bootstrap values stay with the same node
> where they were before. Because the node that used to be the  
> ancestor of
> a particular node in the original tree might have become its  
> descendant
> after re-rooting, the bootstrap values are being mixed up.
>
> Can you confirm my conclusion? Whether yes or no, have you got an easy
> workaround or alternative solution to re-rooting trees (without having
> to touch the reroot method) or any other hints that could be useful  
> for
> me to deal with this issue?
>

I think you are right, but I am not clear what the value should be for  
the internal node attached to the root now.

Note that it is always helpful to provide example code illustrating your  
problem.  Here is an example which I think illustrates your problem.

use strict;
use warnings;
use Bio::TreeIO;

# read the Newick tree below, print it, reroot on leaf A, print it again
my $in = Bio::TreeIO->new(-format => 'newick',
                          -fh     => \*DATA);
my $out = Bio::TreeIO->new(-format => 'newick');
while( my $t = $in->next_tree ){
     my ($node_a) = $t->find_node(-id => "A");
     $out->write_tree($t);
     $t->reroot($node_a);
     $out->write_tree($t);
}
__DATA__
(((A:5,B:5)90:2,C:4)25:3,D:10);


> Cheers,
>
> Bank
>
>
>
> --
> Dr. Bánk Beszteri
> Alfred Wegener Institute for Polar and Marine Research

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From darin.london at duke.edu  Mon May 14 10:44:56 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Mon, 14 May 2007 10:44:56 -0400
Subject: [Bioperl-l] BOSC 2007 Abstract Submission Deadline Extended
Message-ID: <200705141444.l4EEium2026969@tenero.duhs.duke.edu>


Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st.  The announcement day will remain the same so that it remains before the Early Discount Date.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From thiago.venancio at gmail.com  Mon May 14 14:54:44 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 14 May 2007 15:54:44 -0300
Subject: [Bioperl-l] get regions
Message-ID: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>

Hi all,

Using Bio::Seq, is there any easy way to get the coordinates where a
regular expression matches or should I build a sliding window?

For example, looking for a given promoter region in a FASTA file. If
the region is found, I would like to recover exactly the coordinates
where it matches.

Thanks in advance.

Thiago
-- 
"Doubt is not a pleasant condition, but certainty is absurd."
            Voltaire

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From jason at bioperl.org  Mon May 14 15:06:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 12:06:11 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>

I assume you are doing the matches on the string with =~ so Bio::Seq  
doesn't really help you here I don't think.
See the $` variable in Perl for how to capture the position of where  
a regexp matches.

-jason
On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:

> Hi all,
>
> Using Bio::Seq, is there any easy way to get the coordinates where a
> regular expression matches or should I build a sliding window?
>
> For example, looking for a given promoter region in a FASTA file. If
> the region is found, I would like to recover exactly the coordinates
> where it matches.
>
> Thanks in advance.
>
> Thiago
> -- 
> "Doubt is not a pleasant condition, but certainty is absurd."
>             Voltaire
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Mon May 14 15:15:09 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 14 May 2007 12:15:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>

I do this in perl with the pos() function.  This requires the match
operator (m) with the /g modifier, like

if ($gene =~ m/$pattern/gi)
{
	$start = pos($gene) - length($pattern) + 1;
}

pos() returns the location of the pointer where the regex left off after
finding a match.  I remove the length of my pattern (which is just a
string with a few placeholder (.) wildcards, so I know how long the
match will always be).
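
When the match length is not fixed (quantifiers, alternation), subtracting
length($pattern) no longer gives the start. Perl's built-in @- and @+
arrays report the offsets of the last successful match directly. Here is a
minimal sketch that collects every non-overlapping match as 1-based
coordinates; match_coords() is a hypothetical helper, and the sequence and
motif are made-up test data.

```perl
use strict;
use warnings;

# Return a list of [start, end] pairs (1-based, inclusive) for every
# non-overlapping, case-insensitive match of $regex in $string.
sub match_coords {
    my ($string, $regex) = @_;
    my @hits;
    while ($string =~ /$regex/gi) {
        # $-[0] is the 0-based offset of the match start; $+[0] is the
        # 0-based offset just past the match, i.e. the 1-based end.
        push @hits, [ $-[0] + 1, $+[0] ];
    }
    return @hits;
}

my $gene = 'GGTATAAAGGCCTATATAGG';
for my $hit (match_coords($gene, 'TATA[AT]A')) {
    printf "match at %d..%d\n", @$hit;
}
```

For a Bio::Seq object the string would come from $seq->seq. Overlapping
matches are skipped, because each search resumes after the previous match.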

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Monday, May 14, 2007 12:06 PM
> To: Thiago Venancio
> Cc: bioperl-l list
> Subject: Re: [Bioperl-l] get regions
> 
> I assume you are doing the matches on the string with =~ so 
> Bio::Seq doesn't really help you here I don't think.
> See the $` variable in Perl for how to capture the position 
> of where a regexp matches.
> 
> -jason
> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> 
> > Hi all,
> >
> > Using Bio::Seq, is there any easy way to get the 
> coordinates where a 
> > regular expression matches or should I build a sliding window?
> >
> > For example, looking for a given promoter region in a FASTA 
> file. If 
> > the region is found, I would like to recover exactly the 
> coordinates 
> > where it matches.
> >
> > Thanks in advance.
> >
> > Thiago
> > --
> > "Doubt is not a pleasant condition, but certainty is absurd."
> >             Voltaire
> >
> > ========================
> > Thiago Motta Venancio, MSc
> > PhD student in Bioinformatics
> > University of Sao Paulo
> > ========================
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> 
> 
> 


From Bank.Beszteri at awi.de  Mon May 14 09:20:07 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Mon, 14 May 2007 15:20:07 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
References: <4643448C.4000807@awi.de>
	<1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
Message-ID: <46486207.60304@awi.de>

Dear Jason,

thanks for your answer! Sorry about having been ambiguous - it is clear 
that bootstrap values are parsed as ids from newick files, I had no 
problem with that, it was only the first step of the explanation of my 
problem, which was the rerooting issue.

Thanks for your example code as well, it is indeed really useful to 
illustrate the problem. I modified the original tree a bit to make my 
point clearer:

In your example, there are two internal node ids in a four-taxon tree. 
This is not a realistic situation for bootstrap values, because 
bootstrap values are attached to bipartitions of terminal nodes, i.e., 
edges / branches of a tree (in what proportion of the bootstrap 
replicates was a particular bipartition recovered - an alternative 
representation of bootstraps, like produced e.g. by PAUP, is indeed a 
"taxon bipartition table"). This means that in a four taxon tree, we can 
have at most one bootstrap value - corresponding to the single 
non-trivial bipartition (all other bipartitions are trivial, i.e., they 
separate a terminal node from the rest).

So here is an example 4-taxon tree with a bootstrap value:

(A:52,(B:46,C:50)68:11,D:70);

After rerooting at node B (using your example code) it looks like

((B:46,C:50,(A:52,D:70):11)68);

Now there are two problems:
    1) this seems to be a small problem with TreeIO rather than with 
rerooting: there is an extra pair of parentheses around the whole tree;

but more importantly: 
    2) the bootstrap value appears at the root node, which is not 
sensible according to the convention that "each node stores the 
bootstrap value belonging to the branch linking it to its ancestor". You 
would like the bootstrap value to appear at the node connecting A & D in 
this situation, which would look like

(B:46,C:50,(A:52,D:70)68:11);

because in  this new situation, this position would correspond to the 
same bipartition as in the original tree [which is (A,D)(B,C)].

In the meantime, I got a mail showing me the solution (thanks, Daniel!), 
which is in fact pretty simple: all that has to be done is to go through 
the nodes on the path from the old to the new root after rerooting and, 
for each node, take the bootstrap value from its ancestor (and remove 
it from the ancestor). This leaves the root node without a bootstrap 
value, which is exactly what you want (because it has no branch 
connecting it to an ancestor, there is no sensible bootstrap value 
to attach to a root node).

So this exercise tells me that bootstraps and "real" node ids should be 
handled in different manners when rerooting: real ids should of course 
stay with the nodes, whereas bootstrap values on the path between the 
new and old root should move over to the other end of the corresponding 
branch.
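
The procedure can be sketched in plain Perl, independent of Bio::Tree.
This is only an illustration under assumptions: nodes are simple hashes
with hypothetical names, and the path from the new root to the old root
is already known. With real tree objects, the same walk would follow each
node's ancestor after rerooting.

```perl
use strict;
use warnings;

# Path listed from the new root to the old root, with each node still
# carrying the value it held before rerooting (hypothetical node names).
# In the 4-taxon example, X is the (B,C) node and R is the original root.
my @path = (
    { name => 'X', bootstrap => 68 },      # new root
    { name => 'R', bootstrap => undef },   # old root
);

# Walk from the old root towards the new root; each node takes the value
# held by its new ancestor, which is the previous element in @path.
for my $i (reverse 1 .. $#path) {
    $path[$i]{bootstrap} = $path[$i - 1]{bootstrap};
}
# The root has no branch above it, so it keeps no bootstrap value.
$path[0]{bootstrap} = undef;

for my $node (@path) {
    my $bs = defined $node->{bootstrap} ? $node->{bootstrap} : 'none';
    print "$node->{name}: $bs\n";
}
```

Here the value 68 ends up on R, the node joining A and D, which matches
the tree (B:46,C:50,(A:52,D:70)68:11); given above.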

Best wishes,

Bank

Jason Stajich wrote:
>
> On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:
>
>> Dear Bioperl folks,
>>
>> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees, 
>> but in some things it did not behave as I expected it to, so I had to 
>> look inside a bit.
>> In particular, I had problems with mixed up bootstrap values after 
>> re-rooting. After looking into the Bio::Tree::Tree data structures, it 
>> seems that
>>
>> a) bootstrap values are stored as attributes of nodes of the tree [to my 
>> understanding, they should rather be attributes of branches but 
>> Bio::Tree::Tree apparently tries to simplify away branches]; each node 
>> stores the bootstrap value belonging to the branch that connects it to 
>> its ancestor node (I'm reading in trees from Newick strings, and 
>> bootstrap values arrive in the id fields of internal branches)
>
> Please feel free to suggest an alternative implementation if you don't 
> agree with the object model.    It has worked quite well in our hands 
> so I'd be all ears for someone wanting to get in and do some more work 
> on it.
>
> We have answered the question as to why bootstrap values are internal 
> ids many times on this list and I believe on the wiki -- the parser 
> can't tell the difference between a node id and a bootstrap value 
> because nexus uses the same slot for both.  if you know you have 
> bootstrap values in the internal node it is trivial to process your 
> tree and copy the values over.  
>
>
> for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) {
>  $node->bootstrap($node->id); 
>  $node->id('');
> }
>
> I just added this as a method to TreeFunctionI so that it can be 
> easily called now to help satisfy everyone who hopes that the toolkit 
> can guess whether the internal nodes are bootstraps or identifiers.
>
>
>>
>> b) when re-rooting a tree, bootstrap values stay with the same node 
>> where they were before. Because the node that used to be the ancestor of 
>> a particular node in the original tree might have become its descendant 
>> after re-rooting, the bootstrap values are being mixed up.
>>
>> Can you confirm my conclusion? Whether yes or no, have you got an easy 
>> workaround or alternative solution to re-rooting trees (without having 
>> to touch the reroot method) or any other hints that could be useful for 
>> me to deal with this issue?
>>
>
> I think you are right, but I am not clear what the value should be for 
> the internal node attached to the root now.
>
> Note that it is always helpful to provide example code illustrating your 
> problem.  Here is an example which I think illustrates your problem.
>
> use Bio::TreeIO;
>
> my $in = Bio::TreeIO->new(-format => 'newick',
>   -fh => \*DATA);
> my $out = Bio::TreeIO->new(-format => 'newick');
> while( my $t = $in->next_tree ){
>     my ($a) = $t->find_node(-id =>"A");
>     $out->write_tree($t);
>     $t->reroot($a);
>     $out->write_tree($t);
> }
> __DATA__
> (((A:5,B:5)90:2,C:4)25:3,D:10);
>
>
>> Cheers,
>>
>> Bank
>>
>>
>>
>> --
>> Dr. Bánk Beszteri
>> Alfred Wegener Institute for Polar and Marine Research
>
> --
> Jason Stajich
> jason at bioperl.org <mailto:jason at bioperl.org>
> http://jason.open-bio.org/
>
>


From basu at pharm.sunysb.edu  Mon May 14 15:10:33 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Mon, 14 May 2007 15:10:33 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <4648B429.2030907@pharm.sunysb.edu>

Thiago Venancio wrote:
> Hi all,
> 
> Using Bio::Seq, is there any easy way to get the coordinates where a
> regular expression matches or should I build a sliding window?
The perl core function "pos" should help you in this case. Do a 'perldoc
-f pos' for details.

-sidd


> 
> For example, looking for a given promoter region in a FASTA file. If
> the region is found, I would like to recover exactly the coordinates
> where it matches.
> 
> Thanks in advance.
> 
> Thiago


From cjfields at uiuc.edu  Mon May 14 16:48:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 May 2007 15:48:36 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>

I use pos() with m{}g; the quoted globals tend to slow things down  
for me.

Ah, see Kevin's answered that...

chris

On May 14, 2007, at 2:06 PM, Jason Stajich wrote:

> I assume you are doing the matches on the string with =~ so Bio::Seq
> doesn't really help you here I don't think.
> See the $` variable in Perl for how to capture the position of where
> a regexp matches.
>
> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon May 14 17:50:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 14:50:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
Message-ID: <A5BECADC-6516-41FF-A5DB-EE865AD63842@bioperl.org>

Yep, you are right: pos() is much better and faster for getting the position.

-j
On May 14, 2007, at 1:48 PM, Chris Fields wrote:

> I use pos() with m{}g; the quoted globals tend to slow things down  
> for me.
>
> Ah, see Kevin's answered that...
>
> chris
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From sac at bioperl.org  Mon May 14 21:46:55 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 14 May 2007 18:46:55 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
Message-ID: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>

On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> I do this in perl with the pos() function.  This requires the use of the
> match operator (m) like
>
> if ($gene =~ m/$pattern/gi)
> {
>         $start = pos($gene) - length($pattern) + 1;
> }
>
> pos() returns the location of the pointer where the regex left off after
> finding a match.

Cool. I hadn't known that was possible.

> I remove the length of my pattern (which is just a
> string with a few placeholder (.) wildcards, so I know how long the
> match will always be).

To generalize your code so that it will work for any pattern, such as
one that can match strings of variable length like "A{5,10}", just
subtract the length of the actual string that was matched:

if ($gene =~ m/$pattern/gi)
{
    $start = pos($gene) - length($&) + 1;
 }
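
As a hedged alternative sketch (not something proposed in the thread so
far): Perl's built-in arrays @- and @+ hold the start and end offsets of
the last successful match, so variable-length hits can also be located
without touching $&. The sequence below is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sequence: a run of 7 A's and a run of 5 A's.
my $gene    = 'gg' . ('A' x 7) . 'cc' . ('A' x 5) . 'tt';
my $pattern = 'A{5,10}';    # variable-length pattern

my @hits;
while ($gene =~ m/$pattern/gi) {
    # $-[0] is the 0-based start of the whole match; $+[0] is the offset
    # just past its end, which is also the 1-based end coordinate.
    push @hits, [ $-[0] + 1, $+[0] ];
}
printf "%d-%d\n", @$_ for @hits;
```

Each entry of @hits is a [start, end] pair in 1-based coordinates.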

Steve


From shameer at ncbs.res.in  Mon May 14 23:03:57 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 15 May 2007 08:33:57 +0530 (IST)
Subject: [Bioperl-l] How to produce Bio::Graphics images using PROSITE
	output ?
In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
Message-ID: <49697.192.168.1.1.1179198237.squirrel@mail.ncbs.res.in>

Dear All,

Thanks a lot for all your input [Help : Imagemaps using Bio::Graphics].
I am still working on the other part of this project. Now I am sure that
I can implement it using Bio::Graphics. I will come back to imagemaps
within a week or two.

Meanwhile, I need to parse PROSITE output and present it as a
Bio::Graphics image. Has anyone tried using Bio::Graphics to create images
from PROSITE output? I looked in the HOWTO but couldn't find anything
related to PROSITE.

My output looks like this:
    >Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
          75 - 78  NGSM
    >Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
          41 - 43  SpK
    >Sequence : PS00008 MYRISTYL N-myristoylation site.
           6 - 11  GTitNQ
    >Sequence : PS00009 AMIDATION Amidation site.
          78 - 81  mGKR

I need to implement an image like the blast-parser image.
Thanks for any input/pointers.
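
A rough sketch of the parsing step only, in plain Perl. It assumes the
report always follows the two-line layout of the sample output shown
above (a header line starting with '>' followed by one hit line); the
regular expressions and hash keys are illustrative guesses, not part of
any PROSITE or BioPerl API.

```perl
use strict;
use warnings;

# Sample text copied from the output shown above.
my $report = <<'END';
>Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
      75 - 78  NGSM
>Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
     41 - 43  SpK
>Sequence : PS00008 MYRISTYL N-myristoylation site.
       6 - 11  GTitNQ
>Sequence : PS00009 AMIDATION Amidation site.
      78 - 81  mGKR
END

my (@sites, $current);
for my $line (split /\n/, $report) {
    if ($line =~ /^\s*>\S+\s*:\s*(PS\d+)\s+(\S+)\s+(.*)$/) {
        # Header line: accession, short name, description.
        $current = { id => $1, name => $2, desc => $3 };
    }
    elsif ($current && $line =~ /^\s*(\d+)\s*-\s*(\d+)\s+(\S+)/) {
        # Hit line: start, end and matched residues for the current motif.
        push @sites, { %$current, start => $1, end => $2, match => $3 };
    }
}
printf "%s\t%s\t%d\t%d\t%s\n", @{$_}{qw(id name start end match)} for @sites;
```

Each record then carries a start and end that could be turned into a
feature (for example with Bio::SeqFeature::Generic, as in blast3.pl) and
added to a panel track.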

> The width of the image is determined by the -width attribute and is given
> in pixels. You cannot control the height of the image as it is computed
> dynamically based on the number of features and bumping options.
>
> Lincoln
>
> On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>>
>> Dear Scot,
>>
>> > There is a fair amount of documentation in the perldoc for
>> > Bio::Graphics::Panel under the section called 'Creating Imagemaps';
>> > have you read that?
>>
>> I agree, but I couldn't find the exact information I needed :( (maybe I
>> missed something important).
>>
>> >  Also, for changing the scale, that should happen
>> > automatically--have you tried yet?
>>
>> I tried changing Lincoln's program (blast3.pl) from
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>> to
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>>
>> But it gave me a smaller scale, of length up to 300. I was looking
>> for an option that keeps the same width and height as the given image
>> while the start and end values change dynamically with the length of my
>> sequence. Since I couldn't accomplish that, I thought of getting some
>> help from you guys. I think I also need to play a little with the values
>> to reformat the scale to accommodate my hits.
>>
>> Thanks a lot for your inputs,
>> --
>> Shameer Khadar


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From bix at sendu.me.uk  Tue May 15 04:23:52 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 15 May 2007 09:23:52 +0100
Subject: [Bioperl-l] New Blast parser
Message-ID: <46496E18.1000809@sendu.me.uk>

Back in August of last year I introduced Bio::PullParserI, a module that 
aids in the creation of fast SearchIO and Search modules. I've finally 
gotten around to implementing a Blast parser using the interface, which 
I've called Bio::SearchIO::blast_pull.

To use it you say:

my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");

or in the near future (when I've committed StandAloneBlast changes):

my $sab = Bio::Tools::Run::StandAloneBlast->new(-_READMETHOD => 
"blast_pull");


Currently the parser is incomplete: I've only tested it with NCBI BLASTN 
and BLASTP. However, results are promising. In one particular real-world 
usage-case involving running and parsing multiple Blast jobs via 
StandAloneBlast (amongst other things), changing only the _READMETHOD 
from 'blast' to 'blast_pull' in the code dropped run time from 20223s to 
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40% 
less).

Please try it out and report any bugs you discover.


Cheers,
Sendu.

From aaron.j.mackey at gsk.com  Tue May 15 10:30:13 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 15 May 2007 10:30:13 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <OFFA3F7652.5382601A-ON852572DC.004F5B2A-852572DC.004FAF72@gsk.com>

Or, use a zero-width, positive look ahead assertion, and don't incur the 
penalty of either $` or $&:

  if ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
  }
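
A sketch of how the look-ahead form might be used in a loop, assuming
overlapping hits are wanted. The capture group inside the look-ahead is
an addition for illustration, used to recover the hit length without $&;
the sequence and pattern are made up.

```perl
use strict;
use warnings;

my $gene    = 'ATATATGC';   # hypothetical sequence
my $pattern = 'ATAT';

my @hits;
# The look-ahead consumes nothing, so pos() stays at the start of the hit
# and the next iteration can find a hit that overlaps the previous one.
while ($gene =~ m/(?=($pattern))/gi) {
    my $start = pos($gene) + 1;             # 1-based start
    my $end   = $start + length($1) - 1;    # 1-based end, from the capture
    push @hits, "$start-$end";
}
print "@hits\n";
```

With this input the two hits overlap (1-4 and 3-6), which a plain
m/$pattern/g loop would not report.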

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/14/2007 09:46:55 PM:

> On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> > I do this in perl with the pos() function.  This requires the use of
> > the match operator (m) like
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >         $start = pos($gene) - length($pattern) + 1;
> > }
> >
> > pos() returns the location of the pointer where the regex left off
> > after finding a match.
> 
> Cool. I hadn't known that was possible.
> 
> > I remove the length of my pattern (which is just a
> > string with a few placeholder (.) wildcards, so I know how long the
> > match will always be).
> 
> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
> 
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
> 
> Steve
> 


From diogoat at gmail.com  Tue May 15 18:44:59 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 19:44:59 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
Message-ID: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>

Dear All,

I need to download a lot of Leishmania major sequences in GenBank
format.
I can't download them from the NCBI page, because the downloaded files are
corrupted when I use a browser to download these sequences.
So I looked for a script to download these files and found
something like this:


#########################################################
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>teste6.gb');
$out->write_seq($seqio);
#########################################################

And the system returns these errors:
[diogo1 at genome perl]$ perl teste6.pl

-------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.

Any ideas?

Thanks

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From diogoat at gmail.com  Tue May 15 19:27:05 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 20:27:05 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
Message-ID: <638512560705151627t2e25f17cg7f820f3097a67748@mail.gmail.com>

Thanks for your help, Barry!!

It works very well and I'm using the script as you said.
The error was in the write step, is that right?
I needed to use a while loop to write all the sequences.

Thanks a lot

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

2007/5/15, Barry Moore <barry.moore at genetics.utah.edu>:
>
> Diogo-
>
> write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO
> object.  Try this
>
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                  (-query   => 'Leishmania major [Organism]',
>                                   -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                            -file => '>>teste6.gb');
> while (my $seq = $seqio->next_seq) {
>          $out->write_seq($seq);
> }
>
> Barry
>


From barry.moore at genetics.utah.edu  Tue May 15 19:17:39 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 15 May 2007 17:17:39 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>

Diogo-

write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO  
object.  Try this

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                 (-query   => 'Leishmania major [Organism]',
                                  -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                           -file => '>>teste6.gb');
while (my $seq = $seqio->next_seq) {
         $out->write_seq($seq);
}

Barry



From cjfields at uiuc.edu  Tue May 15 22:44:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 May 2007 21:44:43 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>


On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
...

> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
>
> Steve

Right, but $& (as well as $` and $') inflict a significant penalty  
for their use, as Aaron alludes to.  Their use, even indirectly via a  
library module, can cause a significant performance hit.

chris

From sac at bioperl.org  Wed May 16 04:16:38 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 01:16:38 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
Message-ID: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>

On 5/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
> ...
>
> > To generalize your code so that it will work for any pattern, such as
> > one that can match strings of variable length like "A{5,10}", just
> > subtract the length of the actual string that was matched:
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >     $start = pos($gene) - length($&) + 1;
> >  }
> >
> > Steve
>
> Right, but $& (as well as $` and $') inflict a significant penalty
> for their use, as Aaron alludes to.  Their use, even indirectly via a
> library module, can cause a significant performance hit.
>
> chris

Yes. I had forgotten how poisonous $&, $` and $' were to regex
performance. Please forgive me. We might consider regularly auditing
the bioperl module tree for use of these in committed code.

But regarding the use of the look ahead assertion, there's a problem
if you want to find *all* occurrences of the pattern in a target
string and the pattern can have variable length hits: it may report
overlapping hits because it only collects the starting points of the
match, and does not determine how long each match would be. For
example:

$gene = 'TTTAAAAAAAAGG';
$pattern="A{5,10}";
while ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
    print ++$hit, " hit starts at $start\n";
}

Generates:
1 hit starts at 4
2 hit starts at 5
3 hit starts at 6
4 hit starts at 7

You could get around this by imposing a constraint to avoid trivial
overlaps. That is OK if you know the length of the pattern, but not so
good for more complex patterns. If there were a way to get the look ahead
to match the longest string possible for a variable length pattern, then
this approach could work, but I'm not sure if that is possible.

Here's a solution I think does the job of reporting the extent of each
match without a performance hit and works for patterns of any
complexity, taking advantage of the special arrays containing hit
indexes, @- and @+:

$gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
while ($gene =~ m/$pattern/gi){
    $hit++;
    printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
}

Generates:
1 hit at:  4 - 11
2 hit at: 16 - 21

You can also use this approach to report the locations of any internal
capture groups (the parenthesized parts of the pattern that back
references refer to) via $-[1], $+[1], $-[2], $+[2] etc. You'll pay a
performance hit when using such patterns, but patterns not containing
parens won't be penalized.
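
As a rough illustration of the capture-group case, here is a minimal
sketch. The helper name match_extents and the two-group pattern are made
up for this example; only @-, @+ and $#+ are standard Perl.

```perl
use strict;
use warnings;

# Collect 1-based start/end coordinates of the whole match and of each
# capture group, using only @- and @+ (no $&, $` or $').
sub match_extents {
    my ($target, $regex) = @_;
    my @hits;
    while ($target =~ /$regex/g) {
        my @groups;
        # $#+ is the index of the last capture group in the pattern
        for my $i (0 .. $#+) {
            # a group that did not take part in the match has undef offsets
            push @groups, defined $-[$i] ? [ $-[$i] + 1, $+[$i] ] : undef;
        }
        push @hits, \@groups;
    }
    return @hits;
}

my $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
for my $hit (match_extents($gene, qr/(A{5,10})(G+)/i)) {
    my ($whole, $a_run, $g_run) = @$hit;
    printf "match %d-%d  A-run %d-%d  G-run %d-%d\n",
        @$whole, @$a_run, @$g_run;
}
```

Element 0 of each hit is the whole match, and elements 1, 2, ... follow
the order of the opening parentheses in the pattern.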

Steve

From georg.otto at tuebingen.mpg.de  Wed May 16 05:19:06 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 16 May 2007 11:19:06 +0200
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <m17ir9m8hh.fsf@tuebingen.mpg.de>


Dear all,

I have a problem that has to do with downloading data from GenBank as
well, therefore I put it in this thread.

I am trying to get all entries for the organism Danio rerio using
something like this:


use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Danio rerio[ORGN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
					       -query => $query);
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);


while (my $seq_obj = $stream_obj->next_seq) {
  my $out = Bio::SeqIO->new(-format => 'fasta',
			    -file => '>>output.fas');
  $out->write_seq($seq_obj);
}


However, the download process aborts after a few thousand entries. I
do not think that this is due to the request itself or problems with
specific entries, since the number of transferred sequences varies
before the stop. It might rather have to do with GenBank terminating
the connection.

Does anybody have a suggestion for a better strategy to achieve what I
want (e.g. a different kind of query, a method to resume the download at
the point where it terminated etc.)?

Best,

Georg


"Diogo Tschoeke" <diogoat at gmail.com> writes:

> Dear All,
>
> I need to download a lot of Leishmania major sequences in GenBank
> format.
> But I can't download them from the NCBI web page, because the files are
> corrupted when I use a browser to download these sequences.
> So I looked for a script to download these files and found
> something like this:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system returns these errors:
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any idea?
>
> Thanks
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org


From cjfields at uiuc.edu  Wed May 16 09:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 08:05:59 -0500
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <m17ir9m8hh.fsf@tuebingen.mpg.de>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
Message-ID: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>

It's likely from a timeout issue on the remote server.  One thing  
which will speed things up is to retrieve the remote sequences in  
fasta format to begin with (described in the Bio::DB::GenBank POD):

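my $gb_obj = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                                   -format        => 'fasta');
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# open the output stream once, outside the loop
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file   => '>>output.fas');
while (my $seq_obj = $stream_obj->next_seq) {
   $out->write_seq($seq_obj);
}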

I also suggest using the direct ftp downloads if at all possible  
(i.e. you are downloading WGS or contig sequences).  It's much faster.
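
If the remote fetch has to be used, one way to cope with dropped
connections is to work through the IDs in small batches and retry a batch
that fails. The following is only a sketch of that idea in plain Perl:
fetch_in_batches and its arguments are hypothetical names, and the
"download" in the demo is simulated rather than a real NCBI request.

```perl
use strict;
use warnings;

# Work through a long list of IDs in small batches, retrying a batch when
# the fetch callback dies (for example because the server dropped the
# connection). $fetch is a code reference supplied by the caller; it gets
# an array reference of IDs and does the actual download.
# Returns the IDs that still failed after $max_tries attempts.
sub fetch_in_batches {
    my ($ids, $batch_size, $fetch, $max_tries, $pause) = @_;
    $pause ||= 0;                       # seconds to wait before a retry
    my @queue = @$ids;
    my @failed;
    while (my @batch = splice(@queue, 0, $batch_size)) {
        my $ok;
        for my $try (1 .. $max_tries) {
            $ok = eval { $fetch->(\@batch); 1 };
            last if $ok;
            warn "batch starting at $batch[0] failed (try $try): $@";
            sleep $pause if $pause;
        }
        push @failed, @batch unless $ok;
    }
    return @failed;
}

# Demo with a simulated download that fails on its second call.
my (%done, $calls);
my @failed = fetch_in_batches([ 1 .. 7 ], 3, sub {
    my ($batch) = @_;
    die "simulated timeout\n" if ++$calls == 2;
    $done{$_}++ for @$batch;
}, 3);
printf "fetched %d IDs, %d failed\n", scalar(keys %done), scalar(@failed);
```

With BioPerl, the ID list could presumably come from $query_obj->ids and
the callback could call $gb_obj->get_Stream_by_id($batch) and write each
sequence to $out. I have not tested that combination against NCBI. Since
failed IDs are returned, a later run can start from that list instead of
from the beginning.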

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ferraria at gmail.com  Wed May 16 10:38:47 2007
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 16 May 2007 16:38:47 +0200
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
Message-ID: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>

Hi all,

I want to do something relatively simple, and I want to know how far the
BioPerl tools can help me, because I'm having trouble getting there.
Here is the pipeline:

"EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
"GeneStructure"

(*):
From the EntrezGene ID, I want to retrieve the structure of the gene,
which means having the whole genomic sequence and the start and end
positions of each exon, intron, UTR, ...

I thought of two ways to accomplish that:

  -  use 'efetch', get raw XML or ASN.1 and then parse it to obtain the
desired positions.
     This method should work but would take some time to get right.

  -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id" method. I
obtain a Bio::Seq object but I am not able to find any features stored in
it. So it seems that get_Seq_by_id does not retrieve all the information
contained in an EntrezGene entry (?).

Can somebody help me to make the right choice or show me the right way?

I also saw that some packages designed to deal with gene structure exist,
but I haven't managed to work out how to use them properly, or even how to
create one of those objects!
Are those packages currently usable?


Thanks in advance.
Best regards,
tony

From cjfields at uiuc.edu  Wed May 16 12:02:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 11:02:28 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
	<8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
Message-ID: <9C6F4829-4E06-4751-8B10-B2726B5288B9@uiuc.edu>


On May 16, 2007, at 3:16 AM, Steve Chervitz wrote:
...

>>
>> Right, but $& (as well as $` and $') inflict a significant penalty
>> for their use, as Aaron alludes to.  Their use, even indirectly via a
>> library module, can cause a significant performance hit.
>>
>> chris
>
> Yes. I had forgotten how poisonous $&, $` and $' were to regex
> performance. Please forgive me. We might consider regularly auditing
> the bioperl module tree for use of these in committed code.

Already done!  We have run a few audits for gotchas like that:

http://www.bioperl.org/wiki/Auditing

http://www.bioperl.org/wiki/Bioperl_Best_Practices

If there is anything we should be looking for please feel free to add  
as needed.  There shouldn't be any use of the 'naughty' variables in  
CVS, but it might be worth a second look...

> But regarding the use of the look ahead assertion, there's a problem
> if you want to find *all* occurrences of the pattern in a target
> string and the pattern can have variable length hits: it may report
> overlapping hits because it only collects the starting points of the
> match, and does not determine how long each match would be. For
> example:
>
> $gene = 'TTTAAAAAAAAGG';
> $pattern="A{5,10}";
> while ($gene =~ m/(?=$pattern)/gi) {
>     $start = pos($gene) + 1;
>     print ++$hit, " hit starts at $start\n";
> }
>
> Generates:
> 1 hit starts at 4
> 2 hit starts at 5
> 3 hit starts at 6
> 4 hit starts at 7
>
> You could get around this by imposing a constraint to avoid trivial
> overlaps. OK if you know the length of the pattern, but not so good
> for more complex patterns. If there was I way to get the look ahead to
> match the longest string possible for a variable length pattern, then
> this approach could work, but I'm not sure if that is possible.
>
> Here's a solution I think does the job of reporting the extent of each
> match without a performance hit and works for patterns of any
> complexity, taking advantage of the special arrays containing hit
> indexes, @- and @+:
>
> $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
> while ($gene =~ m/$pattern/gi){
>     $hit++;
>     printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
> }
>
> Generates:
> 1 hit at:  4 - 11
> 2 hit at: 16 - 21
>
> You can also use this approach to report the locations of any internal
> back references, if the pattern contains any parentheses, via $-[1],
> $+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
> patterns, but patterns not containing parens won't be penalized.
>
> Steve

Friedl's Regex book has outlined a few ways to get around the  
'naughty' variables $`, $&, and $' using substr() and $-[0], $+[0],  
or both, which makes sense since @+ and @- are arrays of positions  
instead of actual text.

$`  substr(target, 0, $-[0])
$&  substr(target, $-[0], $+[0] - $-[0])
$'  substr(target, $+[0])

Wonderful book!
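
A quick check of the three substr() equivalents, as a minimal sketch
(the test string and the variable names are mine, not from the book):

```perl
use strict;
use warnings;

my $target = 'TTTAAAAAAAAGG';
my ($prematch, $match, $postmatch);

if ($target =~ /A{5,10}/i) {
    $prematch  = substr($target, 0, $-[0]);              # stands in for $`
    $match     = substr($target, $-[0], $+[0] - $-[0]);  # stands in for $&
    $postmatch = substr($target, $+[0]);                 # stands in for $'
    print "pre='$prematch' match='$match' post='$postmatch'\n";
}
```

Because only the offsets in @- and @+ are read, the script never mentions
the three variables that trigger the penalty.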

chris

From benoit at ebi.ac.uk  Wed May 16 12:35:39 2007
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 16 May 2007 17:35:39 +0100
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
Message-ID: <464B32DB.6080607@ebi.ac.uk>

Hi Tony,

I don't know how simple it is in bioperl, but it is quite simple using 
the ensembl perl API.

Have a look here :

API installation:
http://www.ensembl.org/info/software/api_installation.html
API tutorial :
http://www.ensembl.org/info/software/core/core_tutorial.html
API Perl module Documentation :
http://www.ensembl.org/info/software/Pdoc/ensembl/index.html

so you can do something similar to the example below :

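use Bio::EnsEMBL::Registry;

# Assumed setup, following the core API tutorial linked above:
# connect to the public Ensembl database and get a gene adaptor
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(-host => 'ensembldb.ensembl.org',
                                 -user => 'anonymous');
my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');

# Get the 'COG6' gene from human

my $gene = $gene_adaptor->fetch_by_display_label('COG6');

print "GENE ", $gene->stable_id(), "\n";
# here you get gene coordinate

foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
     print "TRANSCRIPT ", $transcript->stable_id(), "\n";
     # print transcript coordinates

     foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
          # print the exon coordinates
          print "EXON ", $exon->start(), "-", $exon->end(), "\n";
     }
}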

Hope this helps

Benoit




From johnsonm at gmail.com  Wed May 16 15:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 16 May 2007 14:11:18 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
Message-ID: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>

On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I believe all seqfeature location coordinates are designed to have
> start < stop for consistency; in cases where the strand matters (CDS,
> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> the two are reversed and the strand is flipped; at least that's the
> way locations are set up in BioPerl.
>
> chris

    Oh yeah?  I always tend to ensure that (start < stop), regardless
of strand, when working with sequence features...the other day, I
caught Glimmer2 emitting a prediction on the plus strand with start >
stop.  I was going to work up a patch for the parser, but I wonder,
should I just force everything to start < stop?  Or only predictions
on the plus strand?  Should all the parsers for all the ab initio
predictors ensure they emit features with coordinates like this?
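
For reference, the convention Chris describes can be written as a small
helper. This is only a sketch of the rule as stated in the quoted message
(swap the coordinates and flip the strand when start > stop). It is not
the code BioPerl itself uses, and the function name is made up for
illustration.

```perl
use strict;
use warnings;

# Return (start, end, strand) with start <= end; when the input has
# start > end, the coordinates are swapped and the strand sign is flipped.
sub normalize_location {
    my ($start, $end, $strand) = @_;
    if ($start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -$strand;
    }
    return ($start, $end, $strand);
}

printf "%d..%d strand %d\n", normalize_location(900, 100, 1);
printf "%d..%d strand %d\n", normalize_location(100, 900, -1);
```

Whether a plus-strand prediction with start > stop should be treated this
way is a separate question that a parser would still have to decide. The
helper only shows the mechanical rule.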

From diogoat at gmail.com  Wed May 16 16:02:44 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Wed, 16 May 2007 17:02:44 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
Message-ID: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>

Dear all,

The script which I wrote with your help is working very well (I paste the
script at the end of this e-mail).
But I have another problem now: every time I run the script, the output
file has a different size.
Any idea what the problem is? My connection? A problem at NCBI? The script,
maybe?

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

#############################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                (-query   => 'Trypanosoma cruzi [Organism]',
                                 -db      => 'nucleotide');
my $gb = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);
# note: '>>' opens the output file in append mode
my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file   => '>>Trypanosoma_cruzi1.gb');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
#########################################################


From barry.moore at genetics.utah.edu  Wed May 16 17:13:27 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 16 May 2007 15:13:27 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
Message-ID: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>

Diogo,

I'd guess that this is a result of NCBI terminating the connection, as
Chris suggested previously.  There are a number of approaches you
could use:

  -  Download only fasta if that's all you need.
  -  Download only IDs, and then use SeqHound, Batch Entrez or BioPerl to
     download those sequences.
  -  Download the GenBank files from the ftp site, as Chris also
     suggested, and then run a BioPerl script on each of those files.

I can see that you are looking at Trypanosomes, so
doing this (on Linux or Mac OS X):

wget ftp://ftp.ncbi.nih.gov/genbank/gbinv*.seq.gz

will get you the 10 files in the invertebrate division from GenBank,  
and you could run a bioperl script  on those 10 files.
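
A script for the downloaded files could look roughly like the sketch
below. It is plain Perl rather than BioPerl: it treats the flat file as
text, splits it into records at the "//" terminator lines and keeps the
records whose ORGANISM line starts with the wanted name. The sub name and
the tiny sample records are made up for the example. With BioPerl
installed, reading the file through Bio::SeqIO with -format => 'genbank'
and checking each sequence's species would be the more robust route.

```perl
use strict;
use warnings;

# Copy the GenBank flat-file records whose ORGANISM line starts with
# $organism from one filehandle to another; returns the number kept.
sub filter_genbank_records {
    my ($in_fh, $out_fh, $organism) = @_;
    local $/ = "\n//\n";     # each record ends with a line holding only "//"
    my $kept = 0;
    while (my $record = <$in_fh>) {
        # drop anything before LOCUS (e.g. the header of a division file);
        # chunks without a LOCUS line are skipped
        $record =~ s/\A.*?^(?=LOCUS)//ms or next;
        next unless $record =~ /^\s+ORGANISM\s+\Q$organism\E/m;
        print {$out_fh} $record;
        $kept++;
    }
    return $kept;
}

# Demo on a made-up, heavily shortened flat file held in memory.
my $sample = join '',
    "Example division header\n\n",
    "LOCUS       EXAMPLE1\n  ORGANISM  Leishmania major\nORIGIN\n//\n",
    "LOCUS       EXAMPLE2\n  ORGANISM  Danio rerio\nORIGIN\n//\n";
my $result = '';
open my $in,  '<', \$sample or die "cannot read sample: $!";
open my $out, '>', \$result or die "cannot open buffer: $!";
my $kept = filter_genbank_records($in, $out, 'Leishmania major');
close $out;
print "kept $kept record(s):\n$result";
```

For the real gbinv*.seq.gz files, the input handle could be opened on a
decompressing pipe, e.g. open my $in, '-|', 'gzip', '-dc', $file, and the
output handle on a file such as '>Leishmania_major.gb'.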

Barry



From sac at bioperl.org  Wed May 16 18:29:16 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 15:29:16 -0700
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <464B32DB.6080607@ebi.ac.uk>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
	<464B32DB.6080607@ebi.ac.uk>
Message-ID: <8f200b4c0705161529h26e7c44fk54082a1156201861@mail.gmail.com>

Another option is to use DAS ( http://biodas.org ), which was designed
precisely to solve this sort of problem.

A DAS genome query is a URL that specifies the genome assembly version
on which the returned coordinates should be based. For example, get
all features and their coordinates associated with the human actin
gene on hg17:

http://das.biopackages.net/das/genome/human/17/feature?name=ACTA1

Ensembl, UCSC, and other sites also provide DAS servers for genomic
features, but these serve up a different XML response format (DAS/1.x)
from what biopackages.net is serving (DAS/2). Here are some links to
these servers, both DAS/1 and DAS/2:

http://www.biodas.org/wiki/DAS/1#Servers
http://www.biodas.org/wiki/DAS/2#Servers

By default, a DAS/2 server will return data in DAS2XML format, but you
can specify alternative formats if a server supports them. This is one
advantage of the DAS/2 retrieval spec, which is stable and is
described here:

http://biodas.org/documents/das2/das2_get.html

You may not be able to use an Entrez gene ID directly in the query.
It depends on whether these IDs are available on the given server.
Accessions and gene names should be OK. You can always map your Entrez
IDs to accessions or gene names using this file:
ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz .
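
A minimal sketch of that mapping step, assuming the file is
tab-delimited with the GeneID in the second column and the RNA accession
in the fourth (please check the header line of the file you download; the
column positions, the sub name and the sample rows below are assumptions
for illustration, not real data):

```perl
use strict;
use warnings;

# Read gene2refseq-style lines from a filehandle and return a hash of
# GeneID => [sorted RNA accessions] for the requested gene IDs.
sub gene_to_rna_accessions {
    my ($fh, @gene_ids) = @_;
    my %wanted = map { ($_ => 1) } @gene_ids;
    my %acc;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                  # header / comment line
        chomp $line;
        # assumed layout: tax_id, GeneID, status, RNA accession, ...
        my (undef, $gene_id, undef, $rna_acc) = split /\t/, $line;
        next unless defined $rna_acc && $wanted{$gene_id};
        next if $rna_acc eq '-';                # no RNA accession on this row
        $acc{$gene_id}{$rna_acc} = 1;
    }
    return map { ($_ => [ sort keys %{ $acc{$_} } ]) } keys %acc;
}

# Demo on made-up rows held in memory.
my $rows = join '',
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version\n",
    "9606\t111\tREVIEWED\tNM_EXAMPLE1.1\n",
    "9606\t111\tREVIEWED\tNM_EXAMPLE2.1\n",
    "9606\t222\tREVIEWED\t-\n",
    "9606\t333\tREVIEWED\tNM_EXAMPLE3.1\n";
open my $fh, '<', \$rows or die "cannot read sample rows: $!";
my %map = gene_to_rna_accessions($fh, 111, 222);
for my $id (sort keys %map) {
    print "$id: @{ $map{$id} }\n";
}
```

The accessions found this way can then be tried as the name in the DAS
query, provided the server in question knows them.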

Steve


From heikki at sanbi.ac.za  Thu May 17 02:46:44 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 17 May 2007 08:46:44 +0200
Subject: [Bioperl-l] Writing OBO fiies
Message-ID: <200705170846.44641.heikki@sanbi.ac.za>


I've started putting together Bio::OntologyIO::obo::write_ontology().
The current parser ignores a number of fields in common obo files.
If anyone knows of any issues with adding more information to the OBO
ontology object, shout now.

I need to start parsing at least "xref_analog" and "subset" to get a 
reasonable roundtrip of obo files representing cell ontology and sequence 
ontology.

I am not aiming at extending the existing ontology interfaces but simply 
patching obo parsing, but I am open to suggestions.

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From bernd.web at gmail.com  Thu May 17 06:48:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 17 May 2007 12:48:07 +0200
Subject: [Bioperl-l] (Simple)Align
Message-ID: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>

Hi,

I am playing with alignment and would like to insert strings at
certain columns (so in all sequences in the alignment). I know about
the slice and remove_columns.
Is there already an insert_columns type of functionality?
Otherwise I'll just iterate over the sequences similar to
remove_columns (and give it a try to implement add_columns like
remove_columns).
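
The "iterate over the sequences" fallback can be sketched on plain
aligned strings as follows. This is not a Bio::SimpleAlign method: the
sub name, the hash-of-strings representation and the sample alignment are
assumptions made for the example.

```perl
use strict;
use warnings;

# Insert the string $insert into every aligned sequence so that it starts
# at 1-based alignment column $column. $aln is a hash reference of
# id => aligned sequence string; a new hash reference is returned.
sub insert_columns {
    my ($aln, $column, $insert) = @_;
    my %new;
    for my $id (keys %$aln) {
        my $seq = $aln->{$id};
        die "column $column is outside sequence $id\n"
            if $column < 1 || $column > length($seq) + 1;
        substr($seq, $column - 1, 0, $insert);   # 4-argument substr inserts
        $new{$id} = $seq;
    }
    return \%new;
}

my %aln = (seq1 => 'ACGT-A', seq2 => 'AC-TTA');
my $new = insert_columns(\%aln, 3, '--');
print "$_ $new->{$_}\n" for sort keys %$new;
```

With a SimpleAlign object, the strings would presumably come from looping
over each_seq and calling seq on each sequence, then setting the modified
string back. I have not tried this, and the end coordinates would need
care if the inserted string contains residues rather than gaps.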


Regards
Bernd

From Kevin.M.Brown at asu.edu  Thu May 17 11:17:04 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 17 May 2007 08:17:04 -0700
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B403284273@EX02.asurite.ad.asu.edu>

> I am playing with alignment and would like to insert strings 
> at certain columns (so in all sequences in the alignment). I 
> know about the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to 
> remove_columns (and give it a try to implement add_columns 
> like remove_columns).

Try the Deobfuscator to see all the methods available to the
SimpleAlign object.
http://bioperl.org/cgi-bin/deob_interface.cgi


From diogoat at gmail.com  Thu May 17 14:14:14 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Thu, 17 May 2007 15:14:14 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
Message-ID: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>

Hi Barry, thanks for all your help.

I chose to download the Invertebrates division of NCBI to my machine,
but I don't have a script to get the sequences from the local file and
I don't know how to write one.
I tried changing, in the script,
-db => 'nucleotide' to -db => 'local-gbdi.gb',
as shown below:

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major',
                                -db     => 'local-gbdi.gb');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

but it didn't work, because Bio::DB::Query::GenBank is a Perl module which
connects to NCBI to run my query, and my database is now local.

I need the genomes of Trypanosoma cruzi, Trypanosoma brucei, Leishmania
major, Entamoeba and Plasmodium falciparum in GenBank format.
Any suggestions? Does somebody have such a script?
Help!
And thanks for the help!

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From barry.moore at genetics.utah.edu  Thu May 17 14:19:46 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 17 May 2007 12:19:46 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
	<638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
Message-ID: <F5104D8D-030D-4F01-884C-623B5F2D63CC@genetics.utah.edu>

Diogo-

Look at the bioperl documentation - there you will find a HowTo on  
SeqIO.  This will help you learn how to write scripts to load genbank  
flat files and you can then iterate over those files and check the  
organism to see if it's one that you want.  You should be able to  
find everything that you need in the documentation.
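
A minimal sketch of that approach, assuming BioPerl is installed and the
downloaded division is an uncompressed GenBank flat file. The file names,
the is_wanted and filter_genbank helpers, and the choice to match on the
start of the organism name are illustrative assumptions, not something
taken from the HOWTO:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Organisms to keep (the list from the question above).
my @wanted = (
    'Trypanosoma cruzi', 'Trypanosoma brucei', 'Leishmania major',
    'Entamoeba', 'Plasmodium falciparum',
);

# Returns the number of wanted names that the organism name starts with
# (case-insensitive); 0 means "not wanted".
sub is_wanted {
    my ($name) = @_;
    return 0 unless defined $name;
    return scalar grep { index(lc $name, lc $_) == 0 } @wanted;
}

# Copy the matching records from one GenBank flat file to another.
sub filter_genbank {
    my ($in_file, $out_file) = @_;
    require Bio::SeqIO;    # loaded lazily; needs BioPerl
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');
    my $kept = 0;
    while (my $seq = $in->next_seq) {
        my $species = $seq->species or next;    # record has no organism
        next unless is_wanted($species->binomial);
        $out->write_seq($seq);
        $kept++;
    }
    return $kept;
}

# Usage: perl filter_gb.pl local-gbdi.gb wanted.gb
if (@ARGV == 2) {
    my $n = filter_genbank(@ARGV);
    print "kept $n records\n";
}
```

Nothing here talks to NCBI: Bio::SeqIO only reads the local file, so the
Bio::DB::Query::GenBank and Bio::DB::GenBank calls are not needed at all.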

B



From torsten.seemann at infotech.monash.edu.au  Fri May 18 04:13:38 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 18 May 2007 18:13:38 +1000
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <46496E18.1000809@sendu.me.uk>
References: <46496E18.1000809@sendu.me.uk>
Message-ID: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>

Sendu,

> Back in August of last year I introduced Bio::PullParserI, a module that
> aids in the creation of fast SearchIO and Search modules. I've finally
> gotten around to implementing a Blast parser using the interface, which
> I've called Bio::SearchIO::blast_pull.
> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");
> Please try it out and feed-back any bugs you discover.

This is very cool!
Here's hoping NCBI don't change the default output format too much.

You should be able to add "rpsblast -p T" support as this is identical
to "blastall -p blastp" except for first line:
BLASTP 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]

The only problem is the (rarely used) "rpsblast -p F" mode, which
looks and behaves like "blastall -p tblastn", i.e. it has hit summaries
with "Frame":

 Score = 29.6 bits (65), Expect = 0.26
 Identities = 10/26 (38%), Positives = 12/26 (46%)
 Frame = -1

BUT has the same header line, so you can't know -p F was used until
you see a "Frame = ??" in a hit (what were they thinking???).

TBLASTN 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]    # should be RPS-TBLASTN perhaps...
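
The check described above could be sketched roughly as follows. This is a
stand-alone illustration in plain Perl, not code from blast_pull; the
function name and the 'rpsblast' / 'rpstblastn' labels are made up for the
example, and it only looks at the header line and at "Frame =" lines:

```perl
use strict;
use warnings;

# Guess which BLAST flavour produced a plain-text report.
# Returns the lower-cased program name from the header, except that
# RPS-BLAST reports are split into 'rpsblast' (no Frame lines seen)
# and 'rpstblastn' (at least one "Frame = N" line seen).
sub guess_blast_program {
    my ($report) = @_;
    my ($header) = $report =~ /^\s*(\S+) \d+\.\d+\.\d+/m
        or return;                          # no recognisable header line
    my $program   = lc $header;
    my $has_frame = $report =~ /^\s*Frame = [+-]?\d/m;
    if ($program eq 'rps-blast') {
        return $has_frame ? 'rpstblastn' : 'rpsblast';
    }
    return $program;
}

my $sample = <<'END';
RPS-BLAST 2.2.16 [Mar-25-2007]

 Score = 29.6 bits (65), Expect = 0.26
 Identities = 10/26 (38%), Positives = 12/26 (46%)
 Frame = -1
END
print guess_blast_program($sample), "\n";    # prints "rpstblastn"
```

As noted above, a report with no hits gives no "Frame" line, so in that
case the two RPS-BLAST modes cannot be told apart this way.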

Thanks for the good work. Shame I converted most of our systems to blastxml :-(

--Torsten

From cjfields at uiuc.edu  Fri May 18 09:39:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 08:39:05 -0500
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
References: <46496E18.1000809@sendu.me.uk>
	<a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
Message-ID: <2219EED8-F721-4586-B029-EF6CD9C32246@uiuc.edu>

I'll be looking at cleaning up SearchIO::blastxml soon myself.  It  
needs to be more memory-friendly with large XML files and PSI-BLAST  
iterations need to be addressed (nope, I haven't forgotten about that!).

There is a XML::LibXML pull parser interface (XML::LibXML::Reader) we  
could look into...

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri May 18 10:00:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 09:00:38 -0500
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <239FDEF1-38D4-47B8-AC71-514B61BDF9E0@uiuc.edu>

Sounds great to me!  Sohel Merchant might have some ideas...

chris

On May 17, 2007, at 1:46 AM, Heikki Lehvaslaiho wrote:

>
> I've started putting together Bio::OntologyIO::obo::write_ontology().
> The current parser ignores a number of fields in common obo files.
> If anyone knows any issues regarding adding more information into  
> obo ontology
> object, shout now.
>
> I need to start parsing at least "xref_analog" and "subset" to get a
> reasonable roundtrip of obo files representing cell ontology and  
> sequence
> ontology.
>
> I am not aiming at extending the existing ontology interfaces but  
> simply
> patching obo parsing, but I am open to suggestions.
>
> 	-Heikki
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat May 19 20:54:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 20:54:11 -0400
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <221DB1CF-2F4E-47D4-80A8-D8D8BD777423@gmx.net>

Sounds great to me! -hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat May 19 21:36:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 21:36:49 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
Message-ID: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>

FYI. Is it worth thinking about implementing a remote access  
interface to the CIPRES tree inference tools, similar to what we have  
for RemoteBlast?

	-hilmar

Begin forwarded message:

From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
Date: May 16, 2007 6:48:49 AM EDT
Subject: FW: release of cipres portal for tree inference

The CIPRES Central Resource team is pleased to announce the first public
release of the CIPRES portal for Tree Inference.

The portal is based on capabilities exposed by the Cipres software
libraries, which were constructed as a Joint Effort between Mark Holder
at Florida State University and the SDSC SW engineering team led by
Terri Liebowitz.

It currently presents Parsimony (PAUP) and Likelihood (GARLI and RAxML)
tools with or without boosting from RecIDCM3 created by Usman Roshan and
co-workers. Nexus and Phylip files are currently supported.

The site is available to all, and is underwritten by the CIPRES cluster
at SDSC.

The portal is fully supported by the SDSC team, with contributions and
new features introduced by the team in collaboration with Mark Holder
and Rutger Vos. At present weekly releases are made with improvements
and new features.

You can visit the portal at the Cipres Web Site.

http://www.phylo.org/sub_sections/portal.htm

Please forward this information to anyone you feel may find the
portal useful.

On behalf of the whole CIPRES team,

Mark

Mark A. Miller, PhD
Principal Investigator, Biology
San Diego Supercomputer Center
University of California, San Diego
La Jolla, CA, 92093-0505
Tel: 858-822-0866
Fax: 858-822-3610

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat May 19 22:10:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 19 May 2007 21:10:53 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
Message-ID: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>

I think it would be worthwhile.  Would we place it in bioperl-run?

chris

On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:

> FYI. Is it worth thinking about implementing a remote access
> interface to the CIPRES tree inference tools, similar to what we have
> for RemoteBlast?
>
> 	-hilmar
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat May 19 22:19:47 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 22:19:47 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
Message-ID: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>

I guess so. That's where RemoteBlast is too, if I'm not mistaken?

What sucks about the UI from a programming perspective is that it  
goes through multiple screens. There may be a lot of screen-scraping.

	-hilmar

On May 19, 2007, at 10:10 PM, Chris Fields wrote:

> I think it would be worthwhile.  Would we place it in bioperl-run?
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sun May 20 01:06:53 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 19 May 2007 22:06:53 -0700
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
Message-ID: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>

Technically RemoteBlast is in bioperl-live, for historical and  
ease-of-install reasons: so many people want to use BLAST out of the  
box that we kept it in bioperl-live rather than force them to install  
bioperl-run.

I think it would be great to have the interface - can we do it all  
via HTTP or will it require some installation of client software  
and/or CORBA?

-jason
On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:

> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>
> What sucks about the UI from a programming perspective is that it
> goes through multiple screens. There may be a lot of screen-scraping.
>
> 	-hilmar
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From bernd.web at gmail.com  Sun May 20 10:56:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 20 May 2007 16:56:07 +0200
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
	<C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
Message-ID: <716af09c0705200756h46bf2134x3d6841d2a98744c0@mail.gmail.com>

Hi

I have made a simple add_columns function in SimpleAlign along the
lines of remove_columns. I only need to insert characters that are the
same for all sequences:

=head2 add_columns

 Title     : add_columns
 Usage     : $aln2 = $aln->add_columns([0, 10, '.'], [12, 15])
 Function  : Creates an alignment with columns added by specifying
             the columns by number and, optionally, the character to
             insert in all sequences. The default character is gap_char.
 Returns   : Bio::SimpleAlign object
 Args      : Array refs, each containing a pair of integers that
             specify a column range and, optionally, the character
             to insert. The first column is 0.

=cut

The functionality could be extended:
- possibility to supply a string to insert (for all sequences)
- possibility to define the string to insert on a per sequence basis
(although this may be more transparent to do outside SimpleAlign).

After some final checks I could supply it (e.g. via bugzilla).
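
To make the intended behaviour concrete, here is a rough stand-alone
sketch that works on plain aligned strings rather than on a
Bio::SimpleAlign object. It is not the patch itself. It assumes that
ranges are 0-based and inclusive, that they refer to column positions in
the resulting alignment, and that '-' is the default gap character:

```perl
use strict;
use warnings;

# Insert a run of $char into every aligned string so that it occupies
# columns $start..$end (0-based, inclusive) of the result.
# Each range is [start, end] or [start, end, char].
sub add_columns {
    my ($seqs, @ranges) = @_;
    my @out = @$seqs;                        # work on a copy
    # Apply ranges left to right; positions refer to the growing alignment.
    for my $range (sort { $a->[0] <=> $b->[0] } @ranges) {
        my ($start, $end, $char) = @$range;
        $char = '-' unless defined $char;    # assumed default gap character
        die "bad range $start..$end\n" if $start < 0 || $end < $start;
        my $insert = $char x ($end - $start + 1);
        for my $s (@out) {
            die "column $start is past the end of the alignment\n"
                if $start > length($s);
            substr($s, $start, 0, $insert);  # zero-length replace = insert
        }
    }
    return \@out;
}

my $aln = add_columns([ 'ACGTACGT', 'AC-TACGA' ], [ 2, 3, '.' ], [ 8, 8 ]);
print "$_\n" for @$aln;
# prints:
# AC..GTAC-GT
# AC..-TAC-GA
```

Because every string receives the same insert at the same column, the
sequences stay aligned; a per-sequence insert would need the strings to
be padded to equal length afterwards.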


Regards,
Bernd


On 5/17/07, Jason Stajich <jason at bioperl.org> wrote:
> not yet - when I did this to insert intron positions I just manipulated the
> sequence strings outside of SimpleAlign, but I think it would be nice to
> have an insert function.
>
> -jason
>

From hlapp at gmx.net  Sun May 20 11:59:03 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 20 May 2007 11:59:03 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
Message-ID: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>

Just HTTP, no CORBA or other stuff needed client-side.

Ultimately it would of course be nice if they offered a more SOA  
compliant interface too, to obviate the screen-scraping need.  
However, if I understand the UI correctly the screen scraping is - if  
at all - only needed for walking through the steps, and for  
extracting the location of the result. The result itself is in NEXUS  
format, as a separate file.

	-hilmar

On May 20, 2007, at 1:06 AM, Jason Stajich wrote:

> technically remoteblast is in bioperl-live, but for historical/ease  
> of user-install purposes (i.e. so many people want to use blast out  
> of the box, we kept it in bioperl-live to not force them to install  
> bioperl-run).
>
> I think it would be great to have the interface - can we do it all  
> via HTTP or will it require some installation of client software  
> and/or CORBA?
>
> -jason

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Mon May 21 11:19:56 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 10:19:56 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
Message-ID: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>

Sounds like time to bust out WWW::Mechanize.  I didn't step through
the whole process, but the first screen/step looks okay.  Plain HTML
form with plain buttons.  Looks like the Javascript is only getting
involved for client-side sanity checking.  Should be easy to automate
(Don't look at me, I've bitten off a bit too much as it is).

On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> Just HTTP, no CORBA or other stuff needed client-side.
>
> Ultimately it would of course be nice if they offered a more SOA
> compliant interface too, to obviate the screen-scraping need.
> However, if I understand the UI correctly the screen scraping is - if
> at all - only needed for walking through the steps, and for
> extracting the location of the result. The result itself is in NEXUS
> format, as a separate file.
>
>         -hilmar

From cjfields at uiuc.edu  Mon May 21 16:11:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:11:36 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
	<ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
Message-ID: <61E0D74B-77F7-499B-A0B7-B1E5106964E6@uiuc.edu>

It would be nice to have a generalized interface (SOAP, CGI,  
anything), as Hilmar states.  I agree WWW::Mechanize is prob. the way  
to go for now.  Don't know who wants to take it up...

chris

On May 21, 2007, at 10:19 AM, Mark Johnson wrote:

> Sounds like time to bust out WWW::Mechanize.  I didn't step through
> the whole process, but the first screen/step looks okay.  Plain HTML
> form with plain buttons.  Looks like the Javascript is only getting
> involved for client-side sanity checking.  Should be easy to automate
> (Don't look at me, I've bitten off a bit too much as it is).
>
> On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> Just HTTP, no CORBA or other stuff needed client-side.
>>
>> Ultimately it would of course be nice if they offered a more SOA
>> compliant interface too, to obviate the screen-scraping need.
>> However, if I understand the UI correctly the screen scraping is - if
>> at all - only needed for walking through the steps, and for
>> extracting the location of the result. The result itself is in NEXUS
>> format, as a separate file.
>>
>>         -hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 16:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:35:41 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
Message-ID: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>

On May 16, 2007, at 2:11 PM, Mark Johnson wrote:

> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> I believe all seqfeature location coordinates are designed to have
>> start < stop for consistency; in cases where the strand matters (CDS,
>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>> the two are reversed and the strand is flipped; at least that's the
>> way locations are set up in BioPerl.
>>
>> chris
>
>     Oh yeah?  I always tend to ensure that (start < stop), regardless
> of strand, when working with sequence features...the other day, I
> caught Glimmer2 emitting a prediction on the plus strand with start >
> stop.  I was going to work up a patch for the parser, but I wonder,
> should I just force everything to start < stop?  Or only predictions
> on the plus strand?  Should all the parsers for all the ab initio
> predictors ensure they emit features with coordinates like this?

Odd that it would predict a start > stop on the plus strand, though  
it may be corrected in Glimmer3.  Does the same prediction show up in  
Glimmer3?

chris

From johnsonm at gmail.com  Mon May 21 16:48:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 15:48:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>

Check the test data for Glimmer2 and Glimmer3.  They both predict one
large gene, I'd guess covering most of the sequence, in frame +1.
That's probably a bogus prediction, but that's not up to the parser to
decide.  I hadn't noticed it until recently.

I sent a patch via bugzilla to swap the coordinates if start > end and
strand > 0.

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
> > On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> I believe all seqfeature location coordinates are designed to have
> >> start < stop for consistency; in cases where the strand matters (CDS,
> >> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> >> the two are reversed and the strand is flipped; at least that's the
> >> way locations are set up in BioPerl.
> >>
> >> chris
> >
> >     Oh yeah?  I always tend to ensure that (start < stop), regardless
> > of strand, when working with sequence features...the other day, I
> > caught Glimmer2 emitting a prediction on the plus strand with start >
> > stop.  I was going to work up a patch for the parser, but I wonder,
> > should I just force everything to start < stop?  Or only predictions
> > on the plus strand?  Should all the parsers for all the ab initio
> > predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris
>

From cjfields at uiuc.edu  Mon May 21 16:56:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:56:50 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <6186D928-A47E-4EED-B06A-50E25A4893CC@uiuc.edu>

On May 21, 2007, at 3:35 PM, Chris Fields wrote:

> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
>> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> I believe all seqfeature location coordinates are designed to have
>>> start < stop for consistency; in cases where the strand matters  
>>> (CDS,
>>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>>> the two are reversed and the strand is flipped; at least that's the
>>> way locations are set up in BioPerl.
>>>
>>> chris
>>
>>     Oh yeah?  I always tend to ensure that (start < stop), regardless
>> of strand, when working with sequence features...the other day, I
>> caught Glimmer2 emitting a prediction on the plus strand with start >
>> stop.  I was going to work up a patch for the parser, but I wonder,
>> should I just force everything to start < stop?  Or only predictions
>> on the plus strand?  Should all the parsers for all the ab initio
>> predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris

... and I see that it does (per your bug report).  The next thing to  
ask is how often these odd Glimmer hits occur and whether others have  
seen the same thing.  Maybe there's an explanation (bug, etc) but I  
can't immediately think of anything that makes sense unless it's  
running the reverse of the + strand as a control for some reason.

chris

From cjfields at uiuc.edu  Mon May 21 17:17:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 16:17:37 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
Message-ID: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>


On May 21, 2007, at 3:48 PM, Mark Johnson wrote:

> Check the test data for Glimmer2 and Glimmer3.  They both predict one
> large gene, I'd guess covering most of the sequence, in frame +1.
> That's probably a bogus prediction, but that's not up to the parser to
> decide.  I hadn't noticed it until recently.
>
> I sent a patch via bugzilla to swap the coordinates if start > end and
> strand > 0.

I think I know what it is.  If you mean these predictions:

Glimmer2:

    27    29263        6  [+1 L= 684 r=-1.187]

Glimmer3:

orf00001    29263        9  +1     9.60

Glimmer2/3 are predicting a gene for a circular chromosome that
starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
the stop codon).  Note that in the Glimmer2 detailed output the end is
29946 and the length of the sequence is 29940, so Glimmer2 artificially
extends the end of the sequence with part of the start.

This is handled as a split location in bioperl and in most GenBank
files; the above would be a location string like
'join(29263..29940,1..9)'.  If you switched the start and stop, the
location would be '9..29263', which wouldn't be correct (and would be
a huge gene).
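
A minimal sketch of that conversion in plain Perl, assuming the sequence
length is known.  The helper name wraparound_location is made up for
illustration and is not part of BioPerl; inside BioPerl the same result
would presumably be built as a Bio::Location::Split object instead of a
string.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): turn a plus-strand
# prediction into a GenBank-style location string.
sub wraparound_location {
    my ($start, $end, $seq_length) = @_;

    # Ordinary case: the gene does not cross the origin.
    return "$start..$end" if $start <= $end;

    # start > end on a circular sequence: the gene runs off the end
    # of the sequence and continues from position 1.
    return "join($start..$seq_length,1..$end)";
}

# Glimmer3 prediction quoted above, sequence length 29940.
print wraparound_location(29263, 9, 29940), "\n";
```

For the Glimmer3 hit this prints join(29263..29940,1..9), i.e. 678 + 9 =
687 bases, which agrees with the Glimmer2 length of 684 plus the stop
codon.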

chris

From johnsonm at gmail.com  Mon May 21 17:21:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 16:21:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
Message-ID: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>

    That makes sense.  Is that behavior documented anywhere?  I'll
feel like less of an idiot if it's not.  8)  Either way, if you're
sure that's what's going on, I'll fix up the parser to handle that as a
split location.

> I think I know what it is.  If you mean these predictions:
>
> Glimmer2:
>
>     27    29263        6  [+1 L= 684 r=-1.187]
>
> Glimmer3:
>
> orf00001    29263        9  +1     9.60
>
> Glimmer2/3 are predicting a gene for a circular chromosome that
> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> and the length of the sequence is 29940, so Glimmer2 artificially
> extends the end of the sequence with part of the start.
>
> This is handled as a split location in bioperl and in most GenBank
> files; the above would be a location string like 'join
> (29263..29940,1..9)'.  If you switched the start and stop the
> location would be '9..29263' which wouldn't be correct (and would be
> a huge gene).
>
> chris
>

From cjfields at uiuc.edu  Mon May 21 19:13:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 18:13:24 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
Message-ID: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>

Glimmer2/3 both assume the genome is circular by default (I assume
because they are used for bacterial genomes).  According to the
Glimmer3 release notes, the detail file has this information in its
header; from the Glimmer3 data used for tests:

Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA  
Glimmer3.icm Glimmer3

Sequence file = ../BCTDNA
ICM model file = Glimmer3.icm
Excluded regions file = none
List of orfs file = none
Truncated orfs = false
Circular genome = true
...

There are options available for glimmer3 (-L, -X) that specify a  
linear sequence or allow ORFs to extend past the end of the sequence  
analyzed (the latter assumes a linear sequence).

chris

On May 21, 2007, at 4:21 PM, Mark Johnson wrote:

>     That makes sense.  Is that behavior documented anywhere?  I'll
> feel like less of an idiot if it's not.  8)  Either way, if you're
> sure that's whats going on, I'll fix up the parser to handle that as a
> split location.
>
>> I think I know what it is.  If you mean these predictions:
>>
>> Glimmer2:
>>
>>     27    29263        6  [+1 L= 684 r=-1.187]
>>
>> Glimmer3:
>>
>> orf00001    29263        9  +1     9.60
>>
>> Glimmer2/3 are predicting a gene for a circular chromosome that
>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>> and the length of the sequence is 29940, so Glimmer2 artificially
>> extends the end of the sequence with part of the start.
>>
>> This is handled as a split location in bioperl and in most GenBank
>> files; the above would be a location string like 'join
>> (29263..29940,1..9)'.  If you switched the start and stop the
>> location would be '9..29263' which wouldn't be correct (and would be
>> a huge gene).
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnsonm at gmail.com  Mon May 21 19:57:03 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 18:57:03 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>

    Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
this for a fix?  For plus strand predictions with start > end, use a
split location.  For minus strand predictions with start < end, use a
split location.  Without knowing the length of the sequence, that's
the best that can be done, I think.
    Unless there are objections, I'll go code that up.  Close that bug
out as 'requester is an idiot'.  8)
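
A rough sketch of that rule in plain Perl.  The helper name wraps_origin
is hypothetical, and the rule assumes that Glimmer reports "start" as
the 5' end of the gene, so that an ordinary minus strand prediction
already has start > end.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a Glimmer prediction wraps around
# the origin, using only the reported coordinates and the strand.
# Assumption: a normal plus-strand hit has start < end, and a normal
# minus-strand hit has start > end.
sub wraps_origin {
    my ($start, $end, $strand) = @_;
    return $strand > 0 ? $start > $end : $start < $end;
}

# The Glimmer3 hit from the test data: plus strand, 29263 -> 9.
print wraps_origin(29263, 9, 1) ? "split location\n" : "simple location\n";
```

No sequence length is needed to make this decision; the length only
matters later, when the two sub-locations are written out.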

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:
>
> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3
>
> Sequence file = ../BCTDNA
> ICM model file = Glimmer3.icm
> Excluded regions file = none
> List of orfs file = none
> Truncated orfs = false
> Circular genome = true
> ...
>
> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).
>
> chris
>
> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>
> >     That makes sense.  Is that behavior documented anywhere?  I'll
> > feel like less of an idiot if it's not.  8)  Either way, if you're
> > sure that's whats going on, I'll fix up the parser to handle that as a
> > split location.
> >
> >> I think I know what it is.  If you mean these predictions:
> >>
> >> Glimmer2:
> >>
> >>     27    29263        6  [+1 L= 684 r=-1.187]
> >>
> >> Glimmer3:
> >>
> >> orf00001    29263        9  +1     9.60
> >>
> >> Glimmer2/3 are predicting a gene for a circular chromosome that
> >> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> >> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> >> and the length of the sequence is 29940, so Glimmer2 artificially
> >> extends the end of the sequence with part of the start.
> >>
> >> This is handled as a split location in bioperl and in most GenBank
> >> files; the above would be a location string like 'join
> >> (29263..29940,1..9)'.  If you switched the start and stop the
> >> location would be '9..29263' which wouldn't be correct (and would be
> >> a huge gene).
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From torsten.seemann at infotech.monash.edu.au  Mon May 21 20:29:47 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 22 May 2007 10:29:47 +1000
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>

> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:

You beat me to the reply Chris - yes, Glimmer2/3 assume circular
chromosome by default. I had forgotten about this in earlier
discussions of the new Glimmer parsers as I normally run it in
--linear / -L mode (even if I know it is circular) because it is
easier to handle, and our sequencer/assembler team usually gets the
origin of replication right.

> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3

I did a double-take here - that's the path to my Glimmer3
installation! It took me a couple of minutes to realise that you got
it from the bioperl test data I created. D'oh! :-)

> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).

If the -L mode should produce Bio::Location::Split objects, I guess if
-X is used it should produce Bio::Location::Fuzzy objects too...

--Torsten

From cjfields at uiuc.edu  Mon May 21 20:59:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 19:59:20 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
Message-ID: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>

You can add the necessary patch to the bug report when it's ready; no  
need to close it out.

The most complete file format to parse seems to be the details file;  
it contains the sequence length:

 >BCTDNA
Sequence length = 29940

which can be used for the split location.  As Torsten points out, use  
of -X could also potentially produce fuzzy locations.

Since the parser currently only parses predict files, you could  
optionally supply the parser with the seq length and emit a warning  
if seqfeatures requiring it are produced, such as the sporadic ones  
which wrap around.  If one were using the bioperl-run module this  
could be automated a bit by passing the seq length in to the parser  
object by adding the seq length to the constructor argument list.
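
As an illustration only, the length could be pulled out of the detail
file header with something like the following.  The helper name is
hypothetical, and the pattern assumes the header line looks exactly
like the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the sequence length from the text of a
# Glimmer3 detail file.  Returns undef if no such line is present.
sub sequence_length_from_detail {
    my ($text) = @_;
    return $text =~ /^Sequence length\s*=\s*(\d+)/m ? $1 : undef;
}

my $detail = ">BCTDNA\nSequence length = 29940\n";
my $length = sequence_length_from_detail($detail);

if (defined $length) {
    print "Sequence length: $length\n";
}
else {
    # Without the length, wraparound features cannot be completed.
    warn "No sequence length found; cannot build split locations\n";
}
```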

chris

On May 21, 2007, at 6:57 PM, Mark Johnson wrote:

>     Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
> this for a fix?  For plus strand predictions with start > end, use a
> split location.  For minus strand predictions with start < end, use a
> split location.  Without knowing the length of the sequence, that's
> the best that can be done, I think.
>     Unless there are objections, I'll go code that up.  Close that bug
> out as 'requester is an idiot'.  8)
>
> On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>>
>> Sequence file = ../BCTDNA
>> ICM model file = Glimmer3.icm
>> Excluded regions file = none
>> List of orfs file = none
>> Truncated orfs = false
>> Circular genome = true
>> ...
>>
>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>>
>> chris
>>
>> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>>
>>>     That makes sense.  Is that behavior documented anywhere?  I'll
>>> feel like less of an idiot if it's not.  8)  Either way, if you're
>>> sure that's whats going on, I'll fix up the parser to handle that  
>>> as a
>>> split location.
>>>
>>>> I think I know what it is.  If you mean these predictions:
>>>>
>>>> Glimmer2:
>>>>
>>>>     27    29263        6  [+1 L= 684 r=-1.187]
>>>>
>>>> Glimmer3:
>>>>
>>>> orf00001    29263        9  +1     9.60
>>>>
>>>> Glimmer2/3 are predicting a gene for a circular chromosome that
>>>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>>>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>>>> and the length of the sequence is 29940, so Glimmer2 artificially
>>>> extends the end of the sequence with part of the start.
>>>>
>>>> This is handled as a split location in bioperl and in most GenBank
>>>> files; the above would be a location string like 'join
>>>> (29263..29940,1..9)'.  If you switched the start and stop the
>>>> location would be '9..29263' which wouldn't be correct (and  
>>>> would be
>>>> a huge gene).
>>>>
>>>> chris
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 21:00:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 20:00:58 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
Message-ID: <E22A8442-E00D-4732-9D80-EE61C75732B7@uiuc.edu>


On May 21, 2007, at 7:29 PM, Torsten Seemann wrote:

>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>
> You beat me to the reply Chris - yes, Glimmer2/3 assume circular
> chromosome by default. I had forgotten about this in earlier
> discussions of the new Glimmer parsers as I normally run it in
> --linear / -L mode (even if I know it is circular) because it is
> easier to handle, and our sequencer/assembler team usually gets the
> origin of replication right.
>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>
> I did a double-take here - that's the path to my Glimmer3
> installation! It took me a couple of minutes to realise that you got
> it from the bioperl test data I created. D'oh! :-)

Yep, I forgot about that!

>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>
> If the -L mode should produce Bio::Location::Split objects, I guess if
> -X is used
> it should produce Bio::Location::Fuzzy objects too...
>
> --Torsten

True, didn't think about that one.  Def. something to consider adding  
in.

chris


From johnsonm at gmail.com  Tue May 22 14:04:31 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 22 May 2007 13:04:31 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
	<A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
Message-ID: <ebf5eb170705221104s486ff488u1d8c0b87dd193861@mail.gmail.com>

Yes, Glimmer3 outputs the length of the input sequence.  I don't
believe Glimmer2 does.

> The most complete file format to parse seems to be the details file;
> it contains the sequence length:
>
>  >BCTDNA
> Sequence length = 29940

> Since the parser currently only parses predict files, you could
> optionally supply the parser with the seq length and emit a warning
> if seqfeatures requiring it are produced, such as the sporadic ones
> which wrap around.  If one were using the bioperl-run module this
> could be automated a bit by passing the seq length in to the parser
> object by adding the seq length to the constructor argument list.

I think we can spot wrap-around genes easily enough without knowing
the length of the input sequence.  Having it just means we can perform
a sanity check or two, such as making sure 'wraparound' genes are
within N bases of the end of the input sequence.  Any suggestions on a
good default value for N?
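
Something along these lines is what I have in mind for the check, for a
plus strand hit with start > end.  The function name is made up, and
the 1000 used in the example is an arbitrary placeholder, not a
proposed default for N.

```perl
use strict;
use warnings;

# Hypothetical sanity check for a plus-strand wraparound prediction
# (start > end): both ends must lie inside the sequence, and the start
# must fall within $max_dist bases of the end of the sequence.
sub plausible_wraparound {
    my ($start, $end, $seq_length, $max_dist) = @_;
    return 0 if $start > $seq_length || $end < 1;
    return ($seq_length - $start) <= $max_dist ? 1 : 0;
}

# Glimmer3 test hit: the start is 677 bases from the end of the sequence.
print plausible_wraparound(29263, 9, 29940, 1000), "\n";
```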

Parsing both output files for glimmer3 will be a little tricky.  The
constructor for Bio::Tools::Glimmer calls $class->SUPER::new(@args);,
which hits the constructor for Bio::Tools::AnalysisResult, which does
the same thing.  It all ends up in Bio::Root::IO::_initialize_io,
which grabs the -file arg and opens it.  So, either let Bio::Root::IO
handle -file and have Bio::Tools::Glimmer handle, say, -detail file, or
have Bio::Tools::Glimmer just implement its own _initialize_io() and
hopefully that will fly.
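
The first option would look roughly like this.  It is a sketch of the
pattern only: the two packages are stand-ins so the example runs on its
own, not the real Bio::Tools::AnalysisResult or Bio::Tools::Glimmer,
and the -detail argument name is an assumption.

```perl
use strict;
use warnings;

# Stand-in for the parent class, which consumes -file.
package My::ParentParser;
sub new {
    my ($class, %args) = @_;
    return bless { file => $args{-file} }, $class;
}

# Stand-in for the Glimmer parser.
package My::GlimmerParser;
our @ISA = ('My::ParentParser');
sub new {
    my ($class, %args) = @_;
    # Remove the extra argument before the parent sees it ...
    my $detail = delete $args{-detail};
    my $self = $class->SUPER::new(%args);
    # ... and keep it for the subclass to open and parse later.
    $self->{detail} = $detail;
    return $self;
}

package main;
my $parser = My::GlimmerParser->new(
    -file   => 'Glimmer3.predict',
    -detail => 'Glimmer3.detail',
);
print "$parser->{file} $parser->{detail}\n";
```

The parent never sees the second file name, so nothing in the
Bio::Root::IO chain would need to change under this approach.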

From ClarkeW at AGR.GC.CA  Tue May 22 17:10:08 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Tue, 22 May 2007 15:10:08 -0600
Subject: [Bioperl-l] TextResultWriter
Message-ID: <C278B850.1002%ClarkeW@AGR.GC.CA>

Hi, 

I am interested in becoming a bioperl developer as I have recently found a
bug in TextResultWriter. I know that I should submit bug fixes using the
protocol outlined in the How To, but I haven't been able to log in to CVS
anonymously to check out the code. However, I have checked that the bug
still exists in the most recent version of the code using the web interface
to the CVS repositories. The bug is between lines 433 and 443, and deals
with the reporting of the number of letters in the database and the number
of entries in the database: the two values are passed in the wrong order.
My fix would be to change the existing code block:

from:

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_entries()),
        &_numwithcommas($result->database_letters()),
        $result->get_parameter('matrix') || '');

to: 

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_letters()),
        &_numwithcommas($result->database_entries()),
        $result->get_parameter('matrix') || '');

I believe that this is a simple enough modification that it does not require
any new test cases.
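
To make the effect of the swap visible in isolation, here is a small sketch
with made-up counts. The numwithcommas helper below is a simplified stand-in
written for this example, not the module's own _numwithcommas, and the
template is cut down to the two affected lines.

```perl
use strict;
use warnings;

# Made-up counts, used only to make the argument order visible.
my %db = (letters => 1_234_567, entries => 4_321);

# Simplified stand-in for the module's _numwithcommas helper.
sub numwithcommas {
    my $n = reverse shift;
    $n =~ s/(\d{3})(?=\d)/$1,/g;
    return scalar reverse $n;
}

my $template = "Number of letters in database: %s\n"
             . "Number of sequences in database: %s\n";

# Arguments must follow the order of the %s placeholders:
# letters first, then entries.
my $report = sprintf($template,
                     numwithcommas($db{letters}),
                     numwithcommas($db{entries}));
print $report;
```

With the two arguments in the other order, each count would be printed next
to the wrong label, which is the behaviour described above.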

Cheers, Wayne


From dmessina at wustl.edu  Wed May 23 02:06:52 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 23 May 2007 01:06:52 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C278B850.1002%ClarkeW@AGR.GC.CA>
References: <C278B850.1002%ClarkeW@AGR.GC.CA>
Message-ID: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>

Hi Wayne,

I submitted the bug report on your behalf

	http://bugzilla.open-bio.org/show_bug.cgi?id=2300

and committed your patch. Thanks for reporting this, and thanks even  
more for including a patch!

Regarding your trouble checking out the repository via anonymous CVS,  
could you post the transcript of your attempt so we can get a better  
look at what's going wrong?

Dave


From ClarkeW at AGR.GC.CA  Wed May 23 10:39:17 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Wed, 23 May 2007 08:39:17 -0600
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>
Message-ID: <C279AE35.1008%ClarkeW@AGR.GC.CA>

With regards to not being able to connect, I have discovered that the reason
I cannot connect is that our firewall is blocking my access. It appears that
I am not the first person to have this problem but that the people in charge
are firm in their position to block the anonymous access port. However, if I
obtain a developer account I will be able to access the CVS.

Cheers, Wayne




From cjfields at uiuc.edu  Wed May 23 12:16:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 23 May 2007 11:16:32 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C279AE35.1008%ClarkeW@AGR.GC.CA>
References: <C279AE35.1008%ClarkeW@AGR.GC.CA>
Message-ID: <7077B4AB-A3B5-4EAE-9994-0EF629D2DE2B@uiuc.edu>

You can always use the browsable CVS link to download a tarball if  
that works for you.

http://www.bioperl.org/wiki/Using_CVS
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/?cvsroot=bioperl

The link to download is at the bottom of the page.

chris

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Xianjun.Dong at bccs.uib.no  Tue May 29 07:57:39 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 13:57:39 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
Message-ID: <465C1533.6070900@ii.uib.no>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kaks_methods.pl
Type: application/x-perl
Size: 2732 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment.bin 

From avilella at gmail.com  Tue May 29 09:02:44 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:02:44 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>

codeml in PAML can give different results in cases where the optimization
reaches different local maxima depending on the different starting points of
each run (seed values). So, at least for some methods and options, this
instability is inherent to the underlying algorithm.

Moreover, for some methods and options, the PAML documentation even
recommends running the same data more than once to see whether the results
agree, which would be a good indication that the model is robust given
the data.

Maybe PAML's author can give a more specific answer for your data at:

http://www.rannala.org/gsf/viewforum.php?f=1

Cheers,

    Albert.

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  HI, dear all, //sorry for duplicated msg for *Jason* and *Neil*
>
> I'm bothered by two problems when I use the PAML module to calculate Ka/Ks
> for my sequences. Could you help me?
>
> 1.  Codeml can produce a different Ka/Ks value each time I run it. I checked
> this both on the command line and with the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml.
>
> The input sequences are:
> >seq1
> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
> >seq2
>
> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
>
> For the command-line program, I used Codeml in PAML 3.14, with these settings
> in codeml.ctl: runmode = -2, seqtype = 1. I ran the program four times.
> The results (from the output file) are below. They differ from each other,
> although I would expect them to be the same or only slightly different.
> Right? But they are NOT.  Weird!
>
> ----------------------------------------------------------------------------------------------------------------------------------
> t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
> t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
> t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
> t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>
> ----------------------------------------------------------------------------------------------------------------------------------
> I found the same problem when I use the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml (I attached my Perl script here; it is
> similar to the one in the BioPerl HOWTO).
>
> 2. Another strange thing is that if I switch to the YN00 program in the
> PAML package, the output is stable. However, it is very different from the
> Codeml output (see below).
>
> ----------------------------------------------------------------------------------------------------------------------------------
> seq. seq.     S       N        t   kappa    omega      dN +- SE        dS +- SE
>    2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265 2.1300 +- 1.2272
>
> ----------------------------------------------------------------------------------------------------------------------------------
> Why is this? Which one should I believe?
>
>
> Would anyone kindly run the Perl script for me (twice, to check whether the
> results differ), or run codeml on the command line?
> I don't know whether anyone has noticed this before, or whether it is
> caused by the wrong version of PAML.
>
> Regards,
>
> Xianjun
>
>
>
> Himanshu Ardawatia wrote:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
>
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::Tools::Run::Alignment::Clustalw;
>
> # for projecting alignments from protein to R/DNA space
> use Bio::Align::Utilities qw(aa_to_dna_aln);
>
> # for input of the sequence data
> use Bio::SeqIO;
> use Bio::AlignIO;
>
> my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>
> #my $seqdata = 'chuck.fa';
> my $seqdata = 'xianjun.fa';
>
> my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>                            -format => 'fasta');
> my %seqs;
> my @prots;
>
> my $output;
> # process each sequence
> while( my $seq = $seqIO->next_seq ) {
>     $seqs{$seq->display_id} = $seq;
>     # translate them into protein
>     my $protein = $seq->translate();
>     my $pseq = $protein->seq();
>     if( $pseq =~ /\*/ &&
>     $pseq !~ /\*$/ ) {
>     warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>     exit(0);
>     }
>     # Tcoffee can't handle '*' even if it is trailing
>     $pseq =~ s/\*//g;
>     $protein->seq($pseq);
>     push @prots, $protein;
> }
>
> if( @prots < 2 ) {
>     warn("Need at least 2 cDNA sequences to proceed");
>     exit(0);
> }
>
> open(OUT, ">align_output.txt") ||
>       die("cannot open output align_output.txt for writing: $!");
> # Align the sequences with clustalw
>
> my $aa_aln = $aln_factory->align(\@prots);
>
> # project the protein alignment back to cDNA coordinates
> my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>
> my @each = $dna_aln->each_seq();
>
> my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>                   ( -params => { 'runmode' => -2,
>                          'seqtype' => 1,
>                  'model' => 1,
>                 }
>               );
>
> # set the alignment object
> $kaks_factory->alignment($dna_aln);
>
> # run the KaKs analysis
> my ($rc,$parser) = $kaks_factory->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
>
> my @otus = $result->get_seqs();
> # this gives us a mapping from the PAML order of sequences back to
> # the input order (since names get truncated)
> my @pos = map {
>     my $c= 1;
>     foreach my $s ( @each ) {
>     last if( $s->display_id eq $_->display_id );
>     $c++;
>     }
>     $c;
> } @otus;
>
> print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
> CDNA_PERCENTID)), "\n";
> for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>     for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>     print OUT join("\t",
>                $otus[$i]->display_id,
>                $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>                $MLmatrix->[$i]->[$j]->{'dS'},
>                $MLmatrix->[$i]->[$j]->{'omega'},
>                sprintf("%.2f",$sub_aa_aln->percentage_identity),
>                sprintf("%.2f",$sub_dna_aln->percentage_identity),
>                ), "\n";
>     }
> }
>
>
> On 5/29/07, Himanshu Ardawatia <himanshu.ardawatia at bccs.uib.no > wrote:
> >
> > Hi Xianjun,
> >
> > I recognize this script. But it was a bit cumbersome to use, as many
> > things are done in the script (like multiple alignment, aa to dna alignment
> > and ka/ks calculation), so one does not have real control over these
> > different aspects.
> > I do not remember getting different Ka/Ks in different runs though. But I
> > remember that once I ran the script with different versions of clustalw and
> > it REALLY gave different results !! So please make sure that the clustalw
> > versions are the same in all your runs. Best is to use the latest version.
> >
> > Finally I wrote my own simple script which would generate a codeml.ctl file
> > for each set of sequences, run codeml based on that, and then move on.
> > A disadvantage of this can be that some files keep getting overwritten (like
> > the ones which have their names hard-coded in the codeml program), and if one
> > needs those files as well then one needs to run the codeml cycles for each
> > set of sequences in different directories.
> >
> > One advantage of this kind of script is that you can use whichever
> > alignment program you want to use and so on.... But then it also means the
> > extra steps of doing the multiple alignment and aa to dna alignment yourself.
> >
> > Does it make sense? If you still get different outputs with the same version
> > of clustalw then I can sit with you and look at things together. Or else try
> > the script method which I mentioned.
> >
> > Cheers  and Fu
> > Himanshu
> > \\
> > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote:
> > >
> > > HI, Himanshu
> > >
> > > I am sure you did some work on Ka/Ks calculation. I have a question
> > > that is bothering me: the output of Bio::Tools::Run::Phylo::PAML::Codeml
> > > is not stable (different for each run), and it is also different from
> > > the output of the Bio::Tools::Run::Phylo::PAML::Yn00 module.
> > >
> > > I have attached the script here. Could you have a look and try to run
> > > it? How do you calculate the Ka/Ks ratio?
> > >
> > > Thanks
> > >
> > > --
> > > ---------------------------
> > > Sterding (Xianjun) Dong
> > > PhD student, Boris Lenhard's group
> > > Bergen Center of Computational Science
> > > Bergen University, Norway
> > > Mobile: 0047-47361688
> > > Telephone: 0047-55276381
> > > Skype: xianjun.dong
> > >
> > >
> > >
> >
>
> --
> ---------------------------
> Sterding (Xianjun) Dong
> PhD student, Boris Lenhard's group
> Bergen Center of Computational Science
> Bergen University, Norway
> Mobile: 0047-47361688
> Telephone: 0047-55276381
> Skype: xianjun.dong
>
>
>
>

From Xianjun.Dong at bccs.uib.no  Tue May 29 09:30:09 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 15:30:09 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>	
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
Message-ID: <465C2AE1.30101@ii.uib.no>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/532a333d/attachment.html 

From avilella at gmail.com  Tue May 29 09:45:28 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:45:28 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2AE1.30101@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no>
Message-ID: <358f4d650705290645s65f596cbp37715f12064a5ced@mail.gmail.com>

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  Thanks for the information, Albert.
>
> But I still have two questions:
> Albert Vilella wrote:
>
> codeml in PAML can give different results in cases where the optimization
> reaches different local maxima depending on the different starting points of
> each run (seed values). So, at least for some methods and options, this
> instability is inherent to the underlying algorithm.
>
> 1. How should I set the initial value in order to get a reasonable estimate?
> Do you have any experience with that?
>

People usually change the initial omega in the control file (codeml.ctl).
For example, 3 runs with initial omega values of 0.001, 1 and 5.
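
As a rough sketch of that idea, the snippet below builds the text of three
control files that differ only in the starting omega. The file names are
hypothetical, and only a handful of codeml.ctl variables are shown (runmode
and seqtype as in the thread, plus fix_omega and omega); a real run would
probably need the remaining settings from your existing codeml.ctl.

```perl
use strict;
use warnings;

# Starting values for omega, taken from the suggestion above.
my @initial_omegas = (0.001, 1, 5);

# Build the text of one control file. File names are hypothetical.
sub control_file_text {
    my ($omega, $outfile) = @_;
    my %opt = (
        seqfile   => 'pair.nuc',    # hypothetical alignment file
        outfile   => $outfile,
        runmode   => -2,            # pairwise comparison, as in the thread
        seqtype   => 1,             # codon sequences, as in the thread
        fix_omega => 0,             # estimate omega rather than fixing it
        omega     => $omega,        # initial value for the optimization
    );
    return join '',
           map { sprintf("%10s = %s\n", $_, $opt{$_}) } sort keys %opt;
}

# One control file per starting value, keyed by its (hypothetical) name.
my %ctl;
for my $i (0 .. $#initial_omegas) {
    my $name = "codeml_start$i.ctl";
    $ctl{$name} = control_file_text($initial_omegas[$i],
                                    "codeml_start$i.out");
}

print "== $_ ==\n$ctl{$_}" for sort keys %ctl;
```

Writing each control file into its own directory before running codeml would
avoid the overwritten-output problem mentioned earlier in the thread; that
step is left out here.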

> Moreover, for some methods and options, the PAML documentation even
> recommends running the same data more than once to see whether the results
> agree, which would be a good indication that the model is robust given
> the data.
>
> 2. Is there a recommended way to test the significance if the results are
> different? For example, in my case, dS ranged from 10.1852 to 14.9961
> across the four runs. If that means the model is not robust (how to check
> this?), should I switch to another model?
>

I would prefer PAML's author to answer this question :)

> How can YN00 reach a stable result? (Is it because YN00 does not require an
> initial value for optimization?) Why does YN00 produce such a different
> result from Codeml? (For YN00, dS=2.1300 with SE=1.2272; for Codeml,
> dS=10.1852-14.9961.)
>

I think Yn00 is less prone to give different local maxima than some codeml
models, but then, codeml is better in giving true positives in cases where
yn00 will give false negatives...

> Maybe PAML's author can give a more specific answer for your data at:
> http://www.rannala.org/gsf/viewforum.php?f=1
>
>
> Actually I already posted my question on the author's forum. Let's wait and
> see.
>

Yes, I would wait for his answers, which should be way more reliable than
mine :)

> Cheers,
>
>     Albert.
>

From roy at colibase.bham.ac.uk  Tue May 29 10:05:12 2007
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Tue, 29 May 2007 15:05:12 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <465C3318.5080201@colibase.bham.ac.uk>

Hi Xianjun,

I'm not sure if it is the cause of your problem, but your sequences seem
to be quite short. This paper:
http://mbe.oxfordjournals.org/cgi/content/full/21/12/2290

suggests that the codeml method of calculating Ka and Ks may be
unreliable for sequences shorter than 300 codons.
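
A quick way to check a sequence against that figure, as a sketch. The
300-codon threshold is simply the number quoted above and is treated here as
a rule of thumb; the example sequence is made up and only has the same length
(165 bases) that S + N adds up to in the codeml output quoted earlier in the
thread (42.4 + 122.6).

```perl
use strict;
use warnings;

# Threshold quoted in the message above; treat it as a rule of thumb.
my $MIN_CODONS = 300;

# Number of complete codons in a coding sequence.
sub codon_count {
    my ($dna) = @_;
    (my $clean = $dna) =~ s/[\s-]//g;    # drop whitespace and gap characters
    return int(length($clean) / 3);
}

# True (1) if the sequence reaches the threshold, false (0) otherwise.
sub long_enough {
    my ($dna) = @_;
    return codon_count($dna) >= $MIN_CODONS ? 1 : 0;
}

# A made-up 165-base sequence, used only for its length.
my $example = 'ATG' x 55;
printf "%d codons, long enough: %s\n",
    codon_count($example), long_enough($example) ? 'yes' : 'no';
```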

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.


From gbr0wn at comcast.net  Wed May 30 11:44:13 2007
From: gbr0wn at comcast.net (gbr0wn at comcast.net)
Date: Wed, 30 May 2007 15:44:13 +0000
Subject: [Bioperl-l] getting started in windows
Message-ID: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070530/2f640e16/attachment.pl 

From golharam at umdnj.edu  Wed May 30 11:40:28 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 30 May 2007 11:40:28 -0400
Subject: [Bioperl-l] ClustalW Score?
Message-ID: <00c201c7a2d0$d971f550$2d01a8c0@PICO>

How do I get the clustalw score from a clustalw alignment?  I'm using the
following code to align my sequences:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();

$seq[0] = ...
$seq[1] = ...
$seq[2] = ...
$seq[3] = ...

$aln = $aln_factory->align(\@seq);

I can get the percentage identity from the Bio::SimpleAlign object, but
there is no score.  I looked into it further and it doesn't look like the
score is being captured anywhere.  So, how does one get the score from
ClustalW using this method?

Ryan


From barry.moore at genetics.utah.edu  Wed May 30 12:21:16 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 30 May 2007 10:21:16 -0600
Subject: [Bioperl-l] getting started in windows
In-Reply-To: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
References: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
Message-ID: <CA090066-0624-4C52-8306-E783278484B0@genetics.utah.edu>

Try opening up a terminal window (I think you'll find that under
accessories).  Change to the directory where your code is and run it
from the command line.

B

On May 30, 2007, at 9:44 AM, gbr0wn at comcast.net wrote:

> I am a perl novice trying to run perl 5.8.8 on windows xp system.   
> I have used 'wordpad' to paste tutorial code into an executable  
> file and when I double click the icon for the file a window opens  
> up briefly with output and/or error message but closes too fast for  
> me to read.  Any idea why this might be happening?
> Thanks, Greg Brown - gbr0wn at comcast.net


From Kevin.M.Brown at asu.edu  Wed May 30 13:16:49 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 10:16:49 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
References: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
Message-ID: <1A4207F8295607498283FE9E93B775B403349DAB@EX02.asurite.ad.asu.edu>

> How do I get the clustalw score from a clustalw alignment?  
> I'm using the following code to align my sequences:
> 
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> 
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
> 
> $aln = $aln_factory->align(\@seq);
> 
> I can get the percentage identity from the Bio::SimpleAlign 
> object, but there is no score.  I looked into it further and 
> it doesn't look like the score is being captured anywhere.  
> So, how does one get the score from ClustalW using this method?


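        # Save the real STDOUT, then point STDOUT at a log file so that
        # ClustalW's console output (which includes the score) is captured.
        open(OUTCOPY, ">&STDOUT")  or die "Couldn't dup STDOUT: $!";
        open(STDOUT,  ">log.test") or die "Couldn't open log.test: $!";
        my $aln = $factory->align(\@seq);
        push @aln, $aln;
        close STDOUT;
        # Restore the original STDOUT before printing anything.
        open(STDOUT, ">&OUTCOPY")  or die "Couldn't restore STDOUT: $!";
        open(TEMP,   "log.test")   or die "Couldn't read log.test: $!";
        while (<TEMP>)
        {
                if ($_ =~ /Score:(\d+)/)
                {
                        $aln->score($1);
                        print "Found score of $1\n";
                }
        }
        close TEMP;
        unlink("log.test");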


From jason at bioperl.org  Wed May 30 14:54:20 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 May 2007 11:54:20 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
Message-ID: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>

You can do it without redirecting STDOUT or creating a new file; just
change the system call in the module.

Here is the code that runs the program, in _run in the module:
    my $commandstring = $self->executable."$instring"."$param_string";
    $self->debug( "clustal command = $commandstring");
    my $status = system($commandstring);
    unless( $status == 0 ) {
        $self->warn( "Clustalw call ($commandstring) crashed: $? \n");
        return undef;
    }

Do something like:

my $fh;
open($fh, "$commandstring |");
my $score;
while(<$fh>) {
   $score = $1 if ($_ =~ /Score:(\d+)/);
}
close($fh);

... then at the bottom after the alignment is created do:

$aln->score($score);


There may be some more debugging b/c if you invoke the quiet => 1  
parameter there may be an automatic ">& /dev/null" appended to the  
end of the parameter string that you'll need to figure out how to  
override.
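
To show the read-pipe idea outside the module, here is a minimal,
self-contained sketch. It is not the module's code: the sub name is made
up, a perl one-liner stands in for the clustalw executable, and the score
lines it prints are invented. I also allow optional whitespace after
"Score:", which is an assumption about the real output.

```perl
use strict;
use warnings;

# Run a command through a read pipe and return the last "Score:" value
# found in its STDOUT, or undef if no such line was printed.
sub last_score_from_command {
    my (@command) = @_;

    # List-form pipe open: no shell is involved, so no quoting problems.
    open(my $fh, '-|', @command) or die "Couldn't run @command: $!";
    my $score;
    while (my $line = <$fh>) {
        $score = $1 if $line =~ /Score:\s*(\d+)/;
    }
    close($fh) or warn "Command exited with status $?\n";
    return $score;
}

# Stand-in for ClustalW: a perl one-liner printing two made-up score lines.
my @fake_clustalw = ($^X, '-e', 'print "Score:12\n", "Score:34\n"');
print "Last score: ", last_score_from_command(@fake_clustalw), "\n";
```

Anything that sends the program's STDOUT to /dev/null would leave the
pipe empty, which is why the quiet setting matters here. As far as I
know, the list form of the pipe open is not available on older Windows
perls, so the string form shown above may be needed there.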

Sorry I don't have more time to help; I hope this gets you started.

-jason
On May 30, 2007, at 10:18 AM, Ryan Golhar wrote:

> Did you see Kevin's response?  That's one possible solution that could be
> implemented...
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Wednesday, May 30, 2007 12:05 PM
> To: golharam at umdnj.edu
> Subject: Re: [Bioperl-l] ClustalW Score?
>
>
> Nope, it isn't parsed, since it is part of the STDOUT from the program,
> not the alignment.  If you want to add parsing of the STDOUT from Clustalw,
> someone will need to refactor how the program is run and capture and parse
> the STDOUT. The score can be added to the score field of the SimpleAlign
> object, but again, since there is nowhere for it to be stored in a clustalw
> alignment file, it won't be round-tripped anywhere. I think stockholm will
> manage it for you though.
>
> Do you know what the score represents - can it be computed from the
> alignment itself?
>
> -jason
>
> On May 30, 2007, at 8:40 AM, Ryan Golhar wrote:
>
>
> How do I get the clustalw score from a clustalw alignment?  I'm  
> using the
> following code to align my sequences:
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
>
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
>
> $aln = $aln_factory->align(\@seq);
>
> I can get the percentage identity from the Bio::SimpleAlign object,  
> but
> there is no score.  I looked into it further and it doesn't look  
> like the
> score is being captured anywhere.  So, how does one get the score from
> ClustalW using this method?
>
> Ryan
>
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Wed May 30 15:52:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 12:52:01 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403349E4D@EX02.asurite.ad.asu.edu>

> You can do it without redirecting STDOUT or creating a new 
> file, just change the system call to:
> 
> Here is the code for running in _run in the module:
>     my $commandstring = $self->executable."$instring"."$param_string";
>     $self->debug( "clustal command = $commandstring");
>     my $status = system($commandstring);
>     unless( $status == 0 ) {
>         $self->warn( "Clustalw call ($commandstring) crashed: $? \n");
>         return undef;
>     }
> 
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet 
> => 1 parameter there may be an automatic ">& /dev/null" 
> appended to the end of the parameter string that you'll need 
> to figure out how to override.
> 
> Sorry I don't have more time to help; I hope this gets you started.

I did it my way as I was doing it without modifying the Bioperl code (in
case I later updated to a new version and forgot about the changes I had
put into it).  So that code just sits in my perl script where it calls
the Bioperl module to create the Clustal alignment object.
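
For anyone who wants the same in-script approach in a reusable form, here
is a rough sketch. The sub name capture_stdout is my own invention, a perl
one-liner stands in for the $factory->align(\@seq) call, and the score
line it prints is made up. It uses a temporary file from File::Temp
instead of a fixed log.test, which is a substitution on my part.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run $code with STDOUT redirected to a temporary file, restore STDOUT,
# and return everything written in the meantime (including output from
# child processes started with system()).
sub capture_stdout {
    my ($code) = @_;

    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open(my $saved, '>&', \*STDOUT) or die "Couldn't dup STDOUT: $!";
    open(STDOUT, '>', $tmp_name)    or die "Couldn't redirect STDOUT: $!";
    eval { $code->() };
    my $error = $@;
    close STDOUT;
    open(STDOUT, '>&', $saved)      or die "Couldn't restore STDOUT: $!";
    close $saved;
    die $error if $error;

    open(my $in, '<', $tmp_name) or die "Couldn't read $tmp_name: $!";
    local $/;                       # slurp the whole file
    my $text = <$in>;
    close $in;
    return defined $text ? $text : '';
}

# Stand-in for the alignment call: a child process that prints a made-up
# score line the way an external aligner would.
my $log = capture_stdout(sub {
    system($^X, '-e', 'print "Score:42\n"');
});
my ($score) = $log =~ /Score:\s*(\d+)/;
print "Found score of $score\n";
```

The eval is there so that STDOUT is restored even if the wrapped code
dies; the error is re-thrown afterwards.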


From Xianjun.Dong at bccs.uib.no  Tue May 29 11:02:21 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 17:02:21 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2F8E.2070309@ed.ac.uk>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>		<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>		<465C1533.6070900@ii.uib.no>	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no> <465C2F8E.2070309@ed.ac.uk>
Message-ID: <465C407D.608@ii.uib.no>

Hi Darren,

The sequences are from human and zebrafish. I currently use two
sequences; I just want to see what the substitution pattern is.
But your comment reminds me that I should perhaps get other species
involved, like mouse and chicken.

BTW, what do you mean by 'per codon, not per site'? Do you mean that the
Ds (Ks) of Codeml is per codon, and that of YN00 is per site?
I think there should be a reasonable way to calculate the synonymous
substitutions even if the divergence is large. If Codeml is not a
good solution for that case, do you have a better suggestion?
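
To make the run-to-run spread explicit, here is a small sketch that pulls
dS out of the four codeml summary lines quoted further down and reports
the minimum, maximum and mean. The sub name and the regex are my own
assumptions about the line layout. It only describes the spread; it says
nothing about which estimate is right.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Pull the dS value out of each codeml pairwise summary line.  The
# leading \s keeps the pattern from matching the "dS=" inside "dN/dS=".
sub ds_values {
    my (@lines) = @_;
    return map { /\sdS=\s*([\d.]+)/ ? $1 : () } @lines;
}

# The four runs reported in this thread.
my @runs = (
    't=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339',
    't= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349',
    't=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961',
    't= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852',
);

my @ds   = ds_values(@runs);
my $mean = sum(@ds) / @ds;
printf "dS over %d runs: min %.4f, max %.4f, mean %.4f\n",
    scalar @ds, min(@ds), max(@ds), $mean;
```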

Thanks

Xianjun

Darren Obbard wrote:
> Out of interest, what are the species, and how much sequence are you 
> using?
>
> - Estimating Ds when it is >>1 is very hard anyway, since the 
> substitutions are saturated. i.e. Regardless of the method, there will 
> be some level of divergence for which Ds can no longer be estimated. A 
> Ds of ~14 (for PAML I think this is per codon, not per site) sounds 
> very high to me - higher than I would want to try to estimate Ds.
>
> Dong Xianjun wrote:
>> Thanks for information, Albert.
>>
>> But still in two questions:
>> Albert Vilella wrote:
>>> codeml in PAML can give different results in cases where the 
>>> optimization reaches different local maxima depending on the 
>>> different starting points of each run (seed values). So, at least 
>>> for some methods and options, this instability is inherent to the 
>>> underlying algorithm.
>> 1. How should I set the initial values in order to get a reasonable 
>> estimate? Do you have any experience with that?
>>> Even more, for some methods and options, it is even recommended in 
>>> PAML documentation to run the same data more than once, to see if 
>>> the results are the same, which would be a good indication that the 
>>> model is robust given the data.
>> 2. Is there a recommended way to test the significance if the results 
>> are different? For example, in my case, dS ranged from 10.1852 
>> to 14.9961 over the four runs. If that means the model is not 
>> robust (how do I check this?), should I switch to another model?
>>
>> How can YN00 reach a stable result? (Is it because YN00 does not 
>> require initial values for optimization?) Why does YN00 produce such a 
>> different result from Codeml? (for YN00, dS=2.1300 with SE=1.2272; 
>> for Codeml, dS=10.1852-14.9961)
>>> Maybe PAML's author can give a more specific answer for your data at:
>>> http://www.rannala.org/gsf/viewforum.php?f=1
>>
>> Actually I have already posted my question in the author's forum. Let's 
>> wait and see.
>>>
>>> Cheers,
>>>
>>>     Albert.
>>>
>>> On 5/29/07, *Dong Xianjun* <Xianjun.Dong at bccs.uib.no 
>>> <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>
>>>     Hi all, (sorry for the duplicate message, Jason and Neil)
>>>
>>>     I'm bothered by two problems when I use the PAML module to calculate
>>>     Ka/Ks for my sequences. Could you help me?
>>>
>>>     1.  Codeml can produce a different Ka/Ks value if I run it twice.
>>>     I checked this both on the command line and with the Perl wrapper
>>>     Bio::Tools::Run::Phylo::PAML::Codeml.
>>>
>>>     The input sequences are:
>>>     >seq1
>>>     
>>> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG 
>>>
>>>     >seq2
>>>     
>>> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG 
>>>
>>>
>>>     For the command-line program, I used Codeml in PAML3.14, with these
>>>     settings in codeml.ctl: runmode = -2, seqtype = 1. I ran the program
>>>     four times.  The results (from the output file) are shown below.
>>>     They differ from each other, although they should be the same or
>>>     only slightly different, right? Weird!
>>>     ------------------------------------------------------------------------
>>>     t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
>>>     t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
>>>     t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
>>>     t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>>>     ------------------------------------------------------------------------
>>>
>>>     I found the same problem when I used the Perl wrapper
>>>     Bio::Tools::Run::Phylo::PAML::Codeml (I attached my Perl script
>>>     here; it is similar to the one in the BioPerl HOWTO).
>>>
>>>     2. Another strange thing is that if I switch to the program YN00 in
>>>     the PAML package, the output is stable. However, it is very
>>>     different from that of Codeml (see below).
>>>     ------------------------------------------------------------------------
>>>     seq. seq.     S       N        t   kappa    omega      dN +- SE          dS +- SE
>>>        2    1    40.4   124.6   1.7452  1.3163  0.0378  0.0804 +- 0.0265  2.1300 +- 1.2272
>>>     ------------------------------------------------------------------------
>>>
>>>     Why is this? Which one should I believe?
>>>
>>>
>>>     Would anyone kindly run the Perl script for me (twice, to check
>>>     whether the results differ), or run codeml on the command line?
>>>     I don't know whether anyone has noticed this before, or whether it
>>>     is because of the wrong version of PAML.
>>>
>>>     Regards,
>>>
>>>     Xianjun
>>>
>>>
>>>
>>>     Himanshu Ardawatia wrote:
>>>>     #!/usr/bin/perl
>>>>
>>>>     use strict;
>>>>     use warnings;
>>>>
>>>>
>>>>     use Bio::Tools::Run::Phylo::PAML::Codeml;
>>>>     use Bio::Tools::Run::Alignment::Clustalw;
>>>>
>>>>     # for projecting alignments from protein to R/DNA space
>>>>     use Bio::Align::Utilities qw(aa_to_dna_aln);
>>>>
>>>>     # for input of the sequence data
>>>>     use Bio::SeqIO;
>>>>     use Bio::AlignIO;
>>>>
>>>>     my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>>>>
>>>>     #my $seqdata = 'chuck.fa';
>>>>     my $seqdata = 'xianjun.fa';
>>>>
>>>>     my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>>>>                                -format => 'fasta');
>>>>     my %seqs;
>>>>     my @prots;
>>>>
>>>>     my $output;
>>>>     # process each sequence
>>>>     while( my $seq = $seqIO->next_seq ) {
>>>>         $seqs{$seq->display_id} = $seq;
>>>>         # translate them into protein
>>>>         my $protein = $seq->translate();
>>>>         my $pseq = $protein->seq();
>>>>         if( $pseq =~ /\*/ &&
>>>>             $pseq !~ /\*$/ ) {
>>>>             warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>>>>             exit(0);
>>>>         }
>>>>         # Tcoffee can't handle '*' even if it is trailing
>>>>         $pseq =~ s/\*//g;
>>>>         $protein->seq($pseq);
>>>>         push @prots, $protein;
>>>>     }
>>>>
>>>>     if( @prots < 2 ) {
>>>>         warn("Need at least 2 cDNA sequences to proceed");
>>>>         exit(0);
>>>>     }
>>>>
>>>>     open(OUT, ">align_output.txt") ||
>>>>           die("cannot open align_output.txt for writing: $!");
>>>>     # Align the sequences with clustalw
>>>>
>>>>     my $aa_aln = $aln_factory->align(\@prots);
>>>>
>>>>     # project the protein alignment back to cDNA coordinates
>>>>     my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>>>>
>>>>     my @each = $dna_aln->each_seq();
>>>>
>>>>     my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>>>>                       ( -params => { 'runmode' => -2,
>>>>                              'seqtype' => 1,
>>>>                      'model' => 1,
>>>>                     }
>>>>                   );
>>>>
>>>>     # set the alignment object
>>>>     $kaks_factory->alignment($dna_aln);
>>>>
>>>>     # run the KaKs analysis
>>>>     my ($rc,$parser) = $kaks_factory->run();
>>>>     my $result = $parser->next_result;
>>>>     my $MLmatrix = $result->get_MLmatrix();
>>>>
>>>>     my @otus = $result->get_seqs();
>>>>     # this gives us a mapping from the PAML order of sequences back to
>>>>     # the input order (since names get truncated)
>>>>     my @pos = map {
>>>>         my $c= 1;
>>>>         foreach my $s ( @each ) {
>>>>         last if( $s->display_id eq $_->display_id );
>>>>         $c++;
>>>>         }
>>>>         $c;
>>>>     } @otus;
>>>>
>>>>     print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
>>>>     CDNA_PERCENTID)), "\n";
>>>>     for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>>>>         for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>>>>         my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         print OUT join("\t",
>>>>                    $otus[$i]->display_id,
>>>>                    $otus[$j]->display_id,
>>>>                    $MLmatrix->[$i]->[$j]->{'dN'},
>>>>                    $MLmatrix->[$i]->[$j]->{'dS'},
>>>>                    $MLmatrix->[$i]->[$j]->{'omega'},
>>>>                    sprintf("%.2f",$sub_aa_aln->percentage_identity),
>>>>                    sprintf("%.2f",$sub_dna_aln->percentage_identity),
>>>>                    ), "\n";
>>>>         }
>>>>     }
>>>>
>>>>
>>>>     On 5/29/07, *Himanshu Ardawatia* <himanshu.ardawatia at bccs.uib.no
>>>>     <mailto:himanshu.ardawatia at bccs.uib.no>> wrote:
>>>>
>>>>         Hi Xianjun,
>>>>
>>>>         I recognize this script. But it was a bit cumbersome to use,
>>>>         as many things are done in the script (multiple alignment,
>>>>         aa-to-dna alignment and Ka/Ks calculation), so one does not
>>>>         have real control over these different aspects.
>>>>         I do not remember getting different Ka/Ks in different runs,
>>>>         though. But I remember that once I ran the script with
>>>>         different versions of clustalw and it REALLY gave different
>>>>         results!! So please make sure the clustalw versions are
>>>>         the same in all your runs. Best is to use the latest version.
>>>>
>>>>         In the end I wrote my own simple script, which generates a
>>>>         codeml.ctl file for each set of sequences, runs codeml
>>>>         based on that, and so on. A disadvantage of this is
>>>>         that some files keep getting overwritten (the ones
>>>>         whose names are hard-coded in the codeml program), so if
>>>>         one needs those files as well, one needs to run the
>>>>         codeml cycle for each set of sequences in a different
>>>>         directory.
>>>>
>>>>         One advantage of this kind of script is that you can use
>>>>         whichever alignment program you want, and so on. But
>>>>         then there are the extra steps of doing the multiple
>>>>         alignment and aa-to-dna alignment yourself.
>>>>
>>>>         Does it make sense? If you still get different outputs with
>>>>         the same version of clustalw, then I can sit with you and look
>>>>         at things together. Or else try the script method which I
>>>>         mentioned.
>>>>         Cheers  and Fu
>>>>         Himanshu
>>>>         \\
>>>>
>>>>         On 5/28/07, *Dong Xianjun* < Xianjun.Dong at bccs.uib.no
>>>>         <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>>
>>>>             Hi Himanshu,
>>>>
>>>>             I am sure you did some work on Ka/Ks calculation. I have a
>>>>             question that is bothering me: the output of
>>>>             Bio::Tools::Run::Phylo::PAML::Codeml is not
>>>>             stable (different on each run), and is also different
>>>>             from the output of the
>>>>             Bio::Tools::Run::Phylo::PAML::Yn00 module.
>>>>
>>>>             I have attached the script. Could you have a
>>>>             look and try to run it?
>>>>             How do you calculate the Ka/Ks ratio?
>>>>
>>>>             Thanks
>>>>
>>>>             --
>>>>             ---------------------------
>>>>             Sterding (Xianjun) Dong
>>>>             PhD student, Boris Lenhard's group
>>>>             Bergen Center of Computational Science
>>>>             Bergen University, Norway
>>>>             Mobile: 0047-47361688
>>>>             Telephone: 0047-55276381
>>>>             Skype: xianjun.dong
>>>>
>>>>
>>>>
>>>>
>>>
>>>     --     ---------------------------
>>>     Sterding (Xianjun) Dong
>>>     PhD student, Boris Lenhard's group
>>>     Bergen Center of Computational Science
>>>     Bergen University, Norway
>>>     Mobile: 0047-47361688
>>>     Telephone: 0047-55276381
>>>
>>>     Skype: xianjun.dong
>>>        
>>>
>>>
>>>
>>
>> -- 
>> ---------------------------
>> Sterding (Xianjun) Dong
>> PhD student, Boris Lenhard's group
>> Bergen Center of Computational Science
>> Bergen University, Norway
>> Mobile: 0047-47361688
>> Telephone: 0047-55276381
>> Skype: xianjun.dong
>>   
>> ------------------------------------------------------------------------
>>
>

-- 
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong


From bix at sendu.me.uk  Thu May 31 04:34:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 09:34:38 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E889E.3090304@sendu.me.uk>

Jason Stajich wrote:
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet => 1  
> parameter there may be an automatic ">& /dev/null" appended to the  
> end of the parameter string that you'll need to figure out how to  
> override.

Is there any particular reason for not having something along these 
lines committed to the module? Shall I go ahead and implement?

From bix at sendu.me.uk  Thu May 31 05:54:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 10:54:32 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E9B58.1020403@sendu.me.uk>

Jason Stajich wrote:
>    $score = $1 if ($_ =~ /Score:(\d+)/);

I see that there are lots of lines in the output that match the above 
regex, but there is also a single /Alignment Score (\d+)/ line printed 
at the end. Isn't that the score that should get stored in $aln->score()?
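
To make the distinction concrete, here is a rough sketch that keeps the
per-pair "Score:" values separate from the single "Alignment Score"
line. The sub name is hypothetical and the sample output is made up for
illustration; the exact wording and spacing of real ClustalW output is
an assumption here.

```perl
use strict;
use warnings;

# Split ClustalW console output into the per-pair scores and the single
# overall alignment score.  The two line shapes matched here follow the
# patterns quoted in this thread; the exact wording is an assumption.
sub parse_clustalw_scores {
    my ($text) = @_;
    my (@pair_scores, $alignment_score);
    for my $line (split /\n/, $text) {
        if ($line =~ /Alignment Score\s+(-?\d+)/) {
            $alignment_score = $1;
        }
        elsif ($line =~ /Score:\s*(\d+)/) {
            push @pair_scores, $1;
        }
    }
    return (\@pair_scores, $alignment_score);
}

# Made-up sample output, for illustration only.
my $sample = <<'END';
Sequences (1:2) Aligned. Score:  88
Sequences (1:3) Aligned. Score:  75
Sequences (2:3) Aligned. Score:  91
Alignment Score 1234
END

my ($pairs, $total) = parse_clustalw_scores($sample);
print "Pairwise scores: @$pairs\n";
print "Alignment score: $total\n";
```

With the two kept apart, the overall value could go into $aln->score()
while the pairwise ones stay available to whoever needs them.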


From jason at bioperl.org  Thu May 31 14:08:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 31 May 2007 11:08:19 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <465E9B58.1020403@sendu.me.uk>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
	<465E9B58.1020403@sendu.me.uk>
Message-ID: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>

you're right --- it is not really my code, I was just elaborating  
Kevin's example --- it would probably need to be more specific or  
perhaps the last Score seen is sufficient for what one is trying to  
capture?

-j
On May 31, 2007, at 2:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>    $score = $1 if ($_ =~ /Score:(\d+)/);
>
> I see that there are lots of lines in the output that match the  
> above regex, but there is also a single /Alignment Score (\d+)/  
> line printed at the end. Isn't that the score that should get  
> stored in $aln->score()?
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Thu May 31 14:15:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 31 May 2007 11:15:38 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO><DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org><465E9B58.1020403@sendu.me.uk>
	<49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu>

> you're right --- it is not really my code, I was just 
> elaborating Kevin's example --- it would probably need to be 
> more specific or perhaps the last Score seen is sufficient 
> for what one is trying to capture?

I took that code from a pairwise clustal alignment script that I wrote
to deal with aligning a bunch of short sequences against a long one to
see where they line up at.  When all of them were fed to Clustal the
short sequences all ended up aligned to each other and not well aligned
to the longer sequence.  I only saw one score in the output from the
pairwise, so that is what I used to find a reasonable value.


From shameer at ncbs.res.in  Tue May  1 07:36:31 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 17:06:31 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
Message-ID: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>

Dear All,

I am trying to implement a BioPerl-based program to generate a dynamic,
clickable image. Dr. Lincoln Stein's code provided in example 3 at this URL:
http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
be perfect for my purpose.

I need to make a few modifications to the image. I referred to the
Bio::Graphics HOWTO, the Creating_Imagemaps documents and old bioperl
list mails (maybe I am missing something important?) but I couldn't find
a quick solution, so I thought I would ask the experts for some tips and
tricks.

This is what I am looking for:

1. I need an image of exactly the same size, with the scale (0.1k .. 0.9k)
changed according to the length of the sequence. My sequence length is
usually in the range 70 - 200.

2. I also need to make the image interactive / clickable, with each of the
blue bars being a different hyperlink to NCBI / PDB based on an ID (these
IDs will be used instead of the names of the blast hits).
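
To make the two requirements concrete, here is a rough, library-free
sketch of the arithmetic involved: a fixed pixel width divided by the
sequence length gives the scale, and each hit becomes one <area> tag.
This is not Bio::Graphics code (as I understand it, Bio::Graphics::Panel
takes -length and -width options and does this kind of mapping itself).
The sub name, the hash keys and the example.org URL template are all
hypothetical.

```perl
use strict;
use warnings;

# Map hit coordinates (1-based, inclusive) onto a fixed-width image and
# build one HTML <area> tag per hit, stacking the hits in rows.
sub imagemap_areas {
    my (%arg) = @_;
    my $scale = $arg{width} / $arg{length};    # pixels per residue
    my @areas;
    my $row = 0;
    for my $hit (@{ $arg{hits} }) {
        my $x1  = int(($hit->{start} - 1) * $scale);
        my $x2  = int($hit->{end} * $scale);
        my $y1  = $row * $arg{row_height};
        my $y2  = $y1 + $arg{row_height} - 1;
        my $url = sprintf($arg{url_template}, $hit->{id});
        push @areas,
            qq(<area shape="rect" coords="$x1,$y1,$x2,$y2" href="$url" alt="$hit->{id}" />);
        $row++;
    }
    return @areas;
}

# Same 800-pixel image whatever the sequence length; only the scale changes.
my @areas = imagemap_areas(
    width        => 800,
    length       => 200,
    row_height   => 10,
    url_template => 'http://example.org/lookup?id=%s',   # placeholder URL
    hits         => [
        { id => 'HIT1', start => 1,  end => 100 },
        { id => 'HIT2', start => 51, end => 200 },
    ],
);
print "$_\n" for @areas;
```

A real page would also need the IDs escaped before they go into the URL
and the HTML attributes.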


Many thanks in advance for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From shameer at ncbs.res.in  Tue May  1 12:04:11 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 21:34:11 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
Message-ID: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>

Dear Scott,

> There is a fair amount of documentation in the perldoc for
> Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> you read that?

I agree, but I couldn't find the exact information I needed :( (maybe I
missed something important).

>  Also, for changing the scale, that should happen
> automatically--have you tried yet?

I tried changing Lincoln's program (blast3.pl) from
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
to
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);

But that gave me a smaller scale, of length up to 300. I am looking for
an option that keeps the same width and height as the given image, with
dynamic start and end values depending on the length of my sequence. Since
I couldn't accomplish this, I thought of getting some help from you guys. I
think I need to play a little with the values to reformat the scale to
accommodate my hits as well.

Thanks a lot for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From cain at cshl.edu  Tue May  1 10:04:09 2007
From: cain at cshl.edu (Scott Cain)
Date: Tue, 01 May 2007 10:04:09 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
Message-ID: <1178028249.2644.13.camel@localhost.localdomain>

Hi Shameer,

There is a fair amount of documentation in the perldoc for
Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
you read that?  Also, for changing the scale, that should happen
automatically--have you tried yet?

Scott


On Tue, 2007-05-01 at 17:06 +0530, Shameer Khadar wrote:
> Dear All,
> 
> I am trying to implement a BioPerl-based program to generate a dynamic,
> clickable image. Dr. Lincoln Stein's code provided in example 3 at this URL:
> http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
> be perfect for my purpose.
> 
> I need to make a few modifications to the image. I referred to the
> Bio::Graphics HOWTO, the Creating_Imagemaps documents and old bioperl
> list mails (maybe I am missing something important?) but I couldn't find
> a quick solution, so I thought I would ask the experts for some tips and
> tricks.
> 
> This is what I am looking for:
> 
> 1. I need an image of exactly the same size, with the scale (0.1k .. 0.9k)
> changed according to the length of the sequence. My sequence length is
> usually in the range 70 - 200.
> 
> 2. I also need to make the image interactive / clickable, with each of the
> blue bars being a different hyperlink to NCBI / PDB based on an ID (these
> IDs will be used instead of the names of the blast hits).
> 
> 
> Many thanks in advance for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Tue May  1 13:10:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 12:10:10 -0500
Subject: [Bioperl-l] Pb makefile
In-Reply-To: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
References: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
Message-ID: <D975B11D-1303-4CF4-AE3B-878881964DB9@uiuc.edu>

Is there any reason you want to install bioperl 1.4 (which is over 3  
yrs old)?  The latest is v.1.5.2 (Dec. 2006); man page generation has  
been fixed for that version, which uses Module::Build.

The man page generation was turned off prior to 1.4, though I may be
wrong.  Based on the ExtUtils::MakeMaker FAQ you should be able to
prevent man page generation this way:

perl Makefile.PL INSTALLMAN1DIR=none INSTALLMAN3DIR=none

chris

On Apr 30, 2007, at 5:35 AM, Francoise.LECOMTE at biogemma.com wrote:

> Hi
> I am trying to install bioperl 1.4 on a Tru64 platform and I get the message
> "make:line too long" when I run the command make install.
> How can I solve it? How can I disable man page installation in
> Makefile.PL, if that would solve this problem?
>
> Best regards
>
> Françoise Lecomte


From cain.cshl at gmail.com  Tue May  1 15:50:42 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 May 2007 15:50:42 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <1178049042.2644.36.camel@localhost.localdomain>

Perhaps if you provided some code and sample data we might be able to
help you better.

Scott


On Tue, 2007-05-01 at 21:34 +0530, Shameer Khadar wrote:
> Dear Scott,
> 
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
> 
> I agree, but I couldn't find the exact information I needed :( (maybe I
> missed something important).
> 
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
> 
> I tried changing Lincoln's program (blast3.pl) from
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
> 
> But that gave me a smaller scale, running only up to 300. I am looking for
> an option that keeps the same width and height as the given image, with
> dynamic start and end values that depend on the length of my sequence.
> Since I couldn't accomplish this, I thought of getting some help from you
> guys. I think I also need to play a little with the values to reformat the
> scale so that it accommodates my hits.
> 
> Thanks a lot for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From agathman at semo.edu  Tue May  1 19:10:20 2007
From: agathman at semo.edu (Gathman, Allen)
Date: Tue, 1 May 2007 18:10:20 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
Message-ID: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>

Hi, all --
 
I've been using BioPerl 1.4 for a while; recently I installed 1.5.2, and
found that scripts that had been using spliced_seq are now broken.  Any
thoughts on what might be going on? 
 
Here's a sample script:
 
*********************************************
 
#!/usr/bin/perl -w
 
use strict;
use Bio::DB::GFF;

my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
                               -dsn        =>
'dbi:mysql:database=cc;host=localhost',
                               -fasta      => '/gbrowse/databases/cc'
                               );
$db->add_aggregator('transcript{CDS/mRNA}');
my $seg=$db->segment('ccin_Contig120');
my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
 
for my $gene (@genes) {
    my $gid = $gene->display_id;
 
    print STDERR "Gene is $gid\n";
    my $splgene = $gene->spliced_seq();
}

********************************************
The line with "spliced_seq" in it crashes the program.  Here's the
STDERR output:
 
Gene is Jan06m400_GLEAN_11487

-------------------- WARNING ---------------------

MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
absolute set to 1 -- be warned you may not be getting things on the
correct strand

---------------------------------------------------

-------------------- WARNING ---------------------

MSG: seq doesn't validate, mismatch is
::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,(0,881935
,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: Attempting to set the sequence to
[Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)Bio::Prim
arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)Bio::PrimarySeq=HAS
H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH(0x881f4a
4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] which
does not look healthy

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359

STACK: Bio::PrimarySeq::seq
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258

STACK: Bio::PrimarySeq::new
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210

STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484

STACK: Bio::SeqFeatureI::spliced_seq
/usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498

STACK: /transfer/testsplice.pl:20

-----------------------------------------------------------

Allen Gathman

http://cstl-csm.semo.edu/gathman

 
From cjfields at uiuc.edu  Tue May  1 20:27:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 19:27:46 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
In-Reply-To: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
References: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
Message-ID: <9F00B020-AFF0-40DB-9694-6061B5A11A73@uiuc.edu>

Can you file a bug on this?  Attach the script and maybe detail what  
data is loaded into your local MySQL database (if possible).

chris

On May 1, 2007, at 6:10 PM, Gathman, Allen wrote:

> Hi, all --
>
> I've been using BioPerl 1.4 for a while; recently I installed  
> 1.5.2, and
> found that scripts that had been using spliced_seq are now broken.   
> Any
> thoughts on what might be going on?
>
> Here's a sample script:
>
> *********************************************
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::GFF;
>
> my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
>                                -dsn        =>
> 'dbi:mysql:database=cc;host=localhost',
>                                -fasta      => '/gbrowse/databases/cc'
>                                );
> $db->add_aggregator('transcript{CDS/mRNA}');
> my $seg=$db->segment('ccin_Contig120');
> my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
>
> for my $gene (@genes) {
>     my $gid = $gene->display_id;
>
>     print STDERR "Gene is $gid\n";
>     my $splgene = $gene->spliced_seq();
> }
>
> ********************************************
> The line with "spliced_seq" in it crashes the program.  Here's the
> STDERR output:
>
> Gene is Jan06m400_GLEAN_11487
>
> -------------------- WARNING ---------------------
>
> MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
> absolute set to 1 -- be warned you may not be getting things on the
> correct strand
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
>
> MSG: seq doesn't validate, mismatch is
> ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::, 
> (0,881935
> ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)
>
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Attempting to set the sequence to
> [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170) 
> Bio::Prim
> arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308) 
> Bio::PrimarySeq=HAS
> H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH 
> (0x881f4a
> 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)]  
> which
> does not look healthy
>
> STACK: Error::throw
>
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359
>
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258
>
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210
>
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484
>
> STACK: Bio::SeqFeatureI::spliced_seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498
>
> STACK: /transfer/testsplice.pl:20
>
> -----------------------------------------------------------
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From shameer at ncbs.res.in  Tue May  1 23:46:59 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 2 May 2007 09:16:59 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178049042.2644.36.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<1178049042.2644.36.camel@localhost.localdomain>
Message-ID: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>

Dear Scott,

Once again, thanks a lot for your inputs.

I am following the same data format as in
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
The only difference is that instead of hit names I will be using PFAM/PDB IDs.
All the blue boxes (features) should be clickable, like image-map hot spots.
The purpose is to display these results in a web page.
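
For readers following the thread, here is a minimal sketch of how clickable boxes could be turned into an HTML image map. It is plain Perl and does not call Bio::Graphics itself; it assumes each box has the shape [feature, x1, y1, x2, y2], which is the shape documented for Bio::Graphics::Panel->boxes. The helper names (escape_html, boxes_to_map), the sample boxes, the IDs and the example.org URL are all hypothetical placeholders, not part of anyone's posted code.

```perl
use strict;
use warnings;

# Escape the few characters that are unsafe inside an HTML attribute.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build a client-side image map from a list of boxes.
# Each box is [ $feature, $x1, $y1, $x2, $y2 ]; $link_for is a callback
# that returns (url, title) for a feature, or nothing to skip it.
sub boxes_to_map {
    my ($map_name, $boxes, $link_for) = @_;
    my @areas;
    for my $box (@$boxes) {
        my ($feature, $x1, $y1, $x2, $y2) = @$box;
        my ($url, $title) = $link_for->($feature);
        next unless defined $url;    # e.g. the scale arrow gets no link
        push @areas, sprintf(
            '<area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s" />',
            $x1, $y1, $x2, $y2, escape_html($url), escape_html($title),
        );
    }
    return join("\n", qq(<map name="$map_name">), @areas, '</map>') . "\n";
}

# Hypothetical boxes; in a real script these would come from the panel.
my @boxes = (
    [ { id => '1ABC' },    60,  40, 310, 50 ],
    [ { id => 'PF00069' }, 120, 60, 500, 70 ],
    [ { id => undef },     10,  10, 790, 20 ],    # stands in for the scale
);

my $map = boxes_to_map('hits', \@boxes, sub {
    my ($feature) = @_;
    my $id = $feature->{id};
    return unless defined $id;
    return ("https://example.org/lookup?id=$id", $id);    # placeholder URL
});

print $map;
```

The resulting map would be referenced from the page with an img tag whose usemap attribute is "#hits". Keeping the URL rule in a callback means the same function can link some IDs to NCBI and others to PDB.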

I am using the program in Stein's Bio::Graphics example
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

I need exactly the same image as in
http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
The only difference is that I need the scale (0.1k - 0.9k) to run simply from
1 to XXX, where XXX depends on the length of the input sequence.

Many thanks for your help,


> Perhaps if you provided some code and sample data we might be able to
> help you better.
>
> Scott
>

-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed May  2 06:02:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 May 2007 06:02:48 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<1178049042.2644.36.camel@localhost.localdomain>
	<59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
Message-ID: <200705020602.48404.sdavis2@mail.nih.gov>

On Tuesday 01 May 2007 23:46, Shameer Khadar wrote:
> Dear Scott,
>
> Once again, thanks a lot for your inputs.
>
> I am following the same data format as in
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
> The only difference is that instead of hit names I will be using PFAM/PDB IDs.
> All the blue boxes (features) should be clickable, like image-map hot spots.
> The purpose is to display these results in a web page.

Do you have your data loaded into bioperl objects?  What code did you use for 
that (post that code)?

> I am using the program in Stein's Bio::Graphics example
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

Does this example run on your computer?  Have you been able to use the bioperl 
objects you created in the first step in the creation of a graphic?  If not, 
what have you tried (post the code) and any error messages.

> I need exactly the same image as in
> http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
> The only difference is that I need the scale (0.1k - 0.9k) to run simply from
> 1 to XXX, where XXX depends on the length of the input sequence.

Again, what have you tried?  Posting code is helpful here, also.  

I'm not an expert in bioperl graphics, but it does really help those that know 
to see the code that you have written to know how best to help.  

Sean


From lzlgboy at gmail.com  Wed May  2 09:58:14 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 2 May 2007 21:58:14 +0800
Subject: [Bioperl-l] Extract CDS from CDNA given Protein SEQs
Message-ID: <d78b3d40705020658w1bee4c68s3058a63ef23c62a1@mail.gmail.com>

Hi, everyone

   I have a task to extract CDS sequences from cDNAs, and I have the protein
sequence for each cDNA. What should I do?
   Should I try a three-frame translation? If so, how?
   Thanks.
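
One way to read the three-frame idea, as a rough sketch only: translate the cDNA in each of the three forward frames, look for the known protein as an exact substring, and convert the hit back to nucleotide coordinates. The code below is plain Perl rather than a BioPerl API; the function names and the toy sequence are made up for illustration, and it assumes the standard genetic code, the forward strand, and a protein that matches the translation exactly (no mismatches or gaps).

```perl
use strict;
use warnings;

# Standard genetic code, with codons ordered T, C, A, G at each position.
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr($amino, $i++, 1);
        }
    }
}

# Translate a DNA string starting at a 0-based frame offset (0, 1 or 2).
# Codons containing anything other than A, C, G, T become 'X'.
sub translate_frame {
    my ($dna, $frame) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;
    my $protein = '';
    for (my $pos = $frame; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Return (start, end, cds) in 1-based cDNA coordinates, or an empty list
# when the protein is not found exactly in any forward frame.
sub find_cds {
    my ($cdna, $protein) = @_;
    $protein = uc $protein;
    $protein =~ s/\*$//;                       # ignore a trailing stop symbol
    for my $frame (0 .. 2) {
        my $hit = index(translate_frame($cdna, $frame), $protein);
        next if $hit < 0;
        my $offset = $frame + 3 * $hit;        # 0-based start in the cDNA
        my $length = 3 * length($protein);
        return ($offset + 1, $offset + $length, substr(uc($cdna), $offset, $length));
    }
    return;
}

# Toy example: the protein MAK is encoded in the third forward frame.
my ($start, $end, $cds) = find_cds('CCATGGCCAAGTGATT', 'MAK');
print "CDS $start..$end: $cds\n" if defined $cds;
```

The returned range excludes the stop codon. Real data with sequencing errors or frameshifts would need an alignment-based approach instead of an exact substring search.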

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From MEC at stowers-institute.org  Wed May  2 18:38:31 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 2 May 2007 17:38:31 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>

Lincoln, 
 
Here for your comment and review is a heavily reworked version of
Bio::Graphics::FeatureBase->gff3_string.
 
The main difference is that homogenous children get ALL their
attributes except for start/stop from the parent, including the group.
I also provide an option, called $preserveHomegenousParent, that controls
whether or not to "remove an extraneous level of parentage".
 
There is an in-line comment and question for you in the code body.
 
It works well in my hands for my use cases, but I'm not positive it is
in the spirit of your intentions.
 
Cheers,
 
Malcolm
 
 
sub gff3_string {
  my ($self, $recurse, $preserveHomegenousParent,

      # Note: the following parameters, whose names begin with '$_',
      # are intended for recursive calls only.

      $_parent,
      $_self_is_hsf,     # is $self the child in a homogeneous parent/child relationship?
      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent
     ) = @_;

  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogenous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segment directive without specifying a
  # distinct -subtype to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e.  Bio::DB::SeqFeature,
  # Bio::Graphics::Feature).  Such homogenous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propagated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THEIR
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).

  # $self's parent, if it is a homogenous child, otherwise $self.
  my $hparentORself = $_self_is_hsf ? $_parent : $self;

  if ($recurse && (my @ssf = $self->sub_SeqFeature)) {
    # will be TRUE only if all subfeatures are the same type as $self.
    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogenous parent child
      # (will this ever happen in practice???))
      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
    return (join("\n", (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
                        map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homogenous,$mygroup)} @ssf)));
  } else {
    my $name  = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1];
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand.  In particular, why does add_segment flip
    # the strand when start > stop?  I thought this was not allowed!
    # Lincoln - any ideas?
    my $p      = join("\t",
        $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->method||'.',
        $self->start||'.',$self->stop||'.',
        defined($hparentORself->score) ? $hparentORself->score : '.',
        $strand||'.',
        defined($hparentORself->phase) ? $hparentORself->phase : '.',
        $group||'');
    return $p;
  }
}
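
To make the "homogenous parent/child" convention easier to follow, here is a small stand-alone sketch of the idea. It is not the FeatureBase code: features are plain hash refs (a hypothetical stand-in for BioPerl objects), the function names are invented, and it takes the strand from the parent, which is the stated intent in the PURPOSE comment but differs from the posted code, which still reads the strand from the segment itself.

```perl
use strict;
use warnings;

# Map a numeric strand (-1, 0, 1) to its GFF symbol.
sub strand_symbol {
    my ($strand) = @_;
    return ('-', '.', '+')[ ($strand || 0) + 1 ];
}

# True only when the feature has segments and all of them share its type.
sub is_homogeneous {
    my ($feature) = @_;
    my @kids = @{ $feature->{segments} || [] };
    return 0 unless @kids;
    my $mismatches = grep { $_->{type} ne $feature->{type} } @kids;
    return $mismatches == 0 ? 1 : 0;
}

# Format one GFF3 line: coordinates come from $part, everything else
# (including column 9) comes from $owner.
sub gff3_line {
    my ($owner, $part) = @_;
    return join "\t",
        $owner->{ref}, $owner->{source}, $owner->{type},
        $part->{start}, $part->{stop},
        defined $owner->{score} ? $owner->{score} : '.',
        strand_symbol($owner->{strand}),
        defined $owner->{phase} ? $owner->{phase} : '.',
        $owner->{group};
}

# A homogeneous parent collapses into one line per segment, each carrying
# the parent's attributes; any other feature yields a single line.
sub gff3_lines {
    my ($feature) = @_;
    return map { gff3_line($feature, $_) } @{ $feature->{segments} }
        if is_homogeneous($feature);
    return gff3_line($feature, $feature);
}

# Hypothetical discontiguous match with three same-type segments.
my $match = {
    ref => 'Contig1', source => 'est', type => 'similarity',
    start => 1001, stop => 1450, score => 96, strand => 0, phase => undef,
    group => 'ID=match1;Name=EST:CEESC13F',
    segments => [
        { type => 'similarity', start => 1001, stop => 1100 },
        { type => 'similarity', start => 1201, stop => 1300 },
        { type => 'similarity', start => 1401, stop => 1450 },
    ],
};

my @lines = gff3_lines($match);
print "$_\n" for @lines;
```

Each output line differs only in its start and stop columns, which is the behaviour the PURPOSE comment asks for.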
 

________________________________

	From: Cook, Malcolm 
	Sent: Friday, April 27, 2007 1:45 PM
	To: 'lincoln.stein at gmail.com'
	Cc: 'lstein at cshl.org'; 'bioperl list'
	Subject: RE: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Lincoln,
	 
	Cool.
	 
	The principle of what I figured out still holds, I think, but the
implementation is slightly broken.  Improved patch forthcoming next week.
	 

	Malcolm Cook
	Database Applications Manager - Bioinformatics
	Stowers Institute for Medical Research - Kansas City, Missouri
	  

________________________________

		From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
		Sent: Friday, April 27, 2007 12:45 PM
		To: Cook, Malcolm
		Cc: lstein at cshl.org; bioperl list
		Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
		
		
		Hi Malcolm,
		
		This is absolutely ok and you can go ahead and commit.
Thanks for figuring this out!
		
		Lincoln
		
		
		On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

			Lincoln, et al,
			
			I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
			from a Bio::DB::SeqFeature::Store that were initially created with
			-segments (i.e. whose location was discontiguous) does not display any
			other attributes in column 9 than "Name".
			
			What do you think of the following patch to Bio::Graphics::FeatureBase,
			whose effect is to "contrive to return (duplicated) common group values"
			(which otherwise get lost when "collapsing" "homogenous" parent/child
			features)?
			
			Another approach would be to copy the attributes from the parent to the
			children when the -segments are first created.
			
			Another approach would be to use Bio::SeqFeature::Generic as the db's
			-seqfeature_class and save with -location being a Bio::Location::Split,
			but this was fraught with other problems.
			
			Any other suggestions?  Do you want me to commit this patch?
			
			Cheers,
			
			Malcolm
			
			Patch follows:
			
			
			Index: FeatureBase.pm
			===================================================================
			RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
			retrieving revision 1.29
			diff -c -r1.29 FeatureBase.pm
			*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
			--- FeatureBase.pm       26 Apr 2007 16:30:23 -0000
			***************
			*** 581,587 ****
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     return join "\n",@children;
			    }
			
			    return join("\n",$p,@children);
			--- 581,589 ----
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     #return join "\n",@children;
			!     # Instead of above, additionally, contrive to return (duplicated) common group values
			!     return(join("$group\n",@children) . $group);
			    }
			
			    return join("\n",$p,@children);
			

		-- 
		Lincoln D. Stein
		Cold Spring Harbor Laboratory
		1 Bungtown Road
		Cold Spring Harbor, NY 11724
		(516) 367-8380 (voice)
		(516) 367-8389 (fax)
		FOR URGENT MESSAGES & SCHEDULING, 
		PLEASE CONTACT MY ASSISTANT, 
		SANDRA MICHELSEN, AT michelse at cshl.edu 


From lstein at cshl.edu  Thu May  3 12:01:38 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 12:01:38 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>

The width of the image is determined by the -width attribute and is given in
pixels. You cannot control the height of the image as it is computed
dynamically based on the number of features and bumping options.
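
As a rough illustration of why a fixed pixel width still gives a scale that follows the sequence length, the arithmetic can be sketched in plain Perl. This is a simplified model, not the Bio::Graphics internals: it ignores padding, the function names are invented, and the 800-pixel width and the lengths are example values only.

```perl
use strict;
use warnings;

# Horizontal scale for a fixed drawable width: pixels per residue.
sub pixels_per_residue {
    my ($width_px, $seq_length) = @_;
    die "sequence length must be positive\n" unless $seq_length > 0;
    return $width_px / $seq_length;
}

# Map a 1-based sequence position to the x offset of its left edge.
sub position_to_x {
    my ($pos, $width_px, $seq_length) = @_;
    die "sequence length must be positive\n" unless $seq_length > 0;
    return int( ($pos - 1) * $width_px / $seq_length );
}

# The same 800-pixel image drawn for three different sequence lengths.
for my $length (70, 200, 1000) {
    printf "length %4d: %.2f px per residue, position 50 starts at x=%d\n",
        $length, pixels_per_residue(800, $length), position_to_x(50, 800, $length);
}
```

In other words, keeping the width constant and passing the actual sequence length as the panel length (the HOWTO example uses a -length option next to -width) should stretch a 70-200 residue sequence across the full image, with the tick labels following the shorter range.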

Lincoln

On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>
> Dear Scott,
>
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
>
> I agree, but I couldn't find the exact information I needed :( (maybe I
> missed something important).
>
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
>
> I tried changing Lincoln's program (blast3.pl) from
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>
> But that gave me a smaller scale, running only up to 300. I am looking for
> an option that keeps the same width and height as the given image, with
> dynamic start and end values that depend on the length of my sequence.
> Since I couldn't accomplish this, I thought of getting some help from you
> guys. I think I also need to play a little with the values to reformat the
> scale so that it accommodates my hits.
>
> Thanks a lot for your inputs,
> --
> Shameer Khadar
> Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bioperlanand at yahoo.com  Thu May  3 16:09:18 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 3 May 2007 13:09:18 -0700 (PDT)
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
Message-ID: <922386.19570.qm@web36808.mail.mud.yahoo.com>

Hi

I am using Bioperl 1.4 and I am trying to obtain protein sequences for specific Uniprot records.

For some records (ROA1_HUMAN), it prints the correct sequence, but  it first prints the warning "Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <STREAM> line 43." 

For other records (BOLA_HAEIN), it prints the correct sequence (without any warnings).

Here is the code:
-------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::DB::SwissProt;

my $sp = new Bio::DB::SwissProt;

#my $seq_object  = $sp->get_Seq_by_id('ROA1_HUMAN');
my $seq_object  = $sp->get_Seq_by_id('BOLA_HAEIN');

my $sequence_as_a_string = $seq_object->seq();
print "$sequence_as_a_string\n";
-------------------------------------------------------------------------------------------

Is there something I need to fix?

Thanks in advance for the help.
 
 Anand



From MEC at stowers-institute.org  Thu May  3 16:19:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 3 May 2007 15:19:00 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
	<CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>
	<6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E230A6@exchkc02.stowers-institute.org>

Lincoln,
 
Ah, yes, round-tripping GFF, the holy grail....
 
Unfortunately, I don't really have a baseline to go against for an
example that roundtrips successfully now.  Do you?
 
For example, after loading test data: 
 
> bp_seqfeature_load.PLS  bioperl-live/t/data/biodbgff/test.gff3
 
the Contig1 portion of which looks like this:
 
##gff-version 3
## sequence-region Contig1 1 37450
Contig1 confirmed transcript 1001 2000 42 + . ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . ID=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 ID=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 ID=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 ID=Transcript:trans-1
Contig1 est similarity 1001 1100 96 . . Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Target=EST:CEESC13F 201 250 +
Contig1 tc1 transposon 5001 6000 . + . ID=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . ID=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . ID=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . ID=Transcript:trans-2
 
 
and then generating output with
 
>bp_seqfeature_gff3.PLS --gff=1 -- seq_id Contig1  #  using a script I
just committed - I hope you like it.  Note: gff=1 => recurse 
 
we get output GFF with problems such as:
 
    1 IDs get turned into Aliases
    2 the seqid of a Target attribute gets copied into the feature's
Name attribute
    3 suppression of parents of homogeneous subfeatures doesn't work when
the parent has other subfeatures than those of its own type (i.e. the
transcript feature also has exon subfeatures)
 
look:
 
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed transcript 1001 2000 42 + . Parent=2;Alias=Transcript:trans-1;Note=function+unknown;Gene=abc-1,xyz-2
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed transcript 30001 31000 . - . Parent=9;Alias=Transcript:trans-2;Note=Terribly+interesting;Gene=xyz-2
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
With my new version of gff3_string (not yet committed), only the 3rd
problem is addressed, generating
 
bp_seqfeature_gff3.PLS --gff 1  -- seq_id Contig1
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
 
I had to make another change to get this output though, since I had to
change the behaviour to
 
  # provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have at least one subfeature with the same type as the
  # feature itself (thus redefining Lincoln's "homogenous
  # parent/child" case, which previously required all children to have
  # the same type as parent)
 
 
I think you will agree this is the more desirable behaviour.
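
The difference between the earlier rule and the revised one can be shown with a tiny stand-alone sketch. It works on type names only; the function names and the list of child types are hypothetical, chosen to mirror the transcript in the test data, which has a same-type child alongside exon and CDS children.

```perl
use strict;
use warnings;

# Count the subfeatures that share the parent's type.
sub same_type_children {
    my ($parent_type, @child_types) = @_;
    return scalar grep { $_ eq $parent_type } @child_types;
}

# Earlier rule: every child has the parent's type.
sub all_children_match {
    my ($parent_type, @child_types) = @_;
    return 0 unless @child_types;
    return same_type_children($parent_type, @child_types) == @child_types ? 1 : 0;
}

# Revised rule: at least one child has the parent's type.
sub any_child_matches {
    my ($parent_type, @child_types) = @_;
    return same_type_children($parent_type, @child_types) > 0 ? 1 : 0;
}

# A transcript with one same-type segment plus exon and CDS children.
my @kids = qw(transcript exon exon exon CDS CDS CDS);
printf "all match: %d, any match: %d\n",
    all_children_match('transcript', @kids),
    any_child_matches('transcript', @kids);
```

Under the earlier rule this transcript is not treated as homogeneous, so the duplicate transcript line survives; under the revised rule it is, which matches the second output listing.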
 
I would be happy to test any other GFF you suggest might be (more or
less) roundtripped.
 
What think you?
 
--Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com]
On Behalf Of Lincoln Stein
	Sent: Thursday, May 03, 2007 9:46 AM
	To: Cook, Malcolm
	Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Malcolm,
	
	For me, the major use case is that GFF3 files round-trip
correctly through the database. Do any of your use cases cover that?
	
	Lincoln
	
	
	On 5/2/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		Lincoln, 
		 
		Here for your comment and review is a very reworked
version of Bio::Graphics::FeatureBase->gff3_string.
		 
		The main difference is that homogenous children get
ALL their attributes except for start/stop from the parent, including
the group.  I also provide an option, $preserveHomegenousParent, as to
whether or not to "remove extraneous level of parentage".
		 
		There is an in-line comment and question for you in the
code body.
		 
		It works well in my hands for my use cases, but I'm not
positive it is in the spirit of your intentions.
		 
		Cheers,
		 
		Malcolm
		 
		 
		sub gff3_string {
		  my ($self, $recurse, $preserveHomegenousParent,

		      # Note: the following parameters, whose names begin with '$_',
		      # are intended for recursive calls only.

		      $_parent,
		      $_self_is_hsf,     # is $self the child in a homogeneous parent/child relationship?
		      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent
		     ) = @_;

		  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
		  # $recurse to include GFF for any subfeatures of the feature.  If
		  # recursing, provide special handling to "remove an extraneous level
		  # of parentage" (unless $preserveHomegenousParent) for features
		  # which have subfeatures all of whose types are the same as the
		  # feature itself (the "homogenous parent/child" case).  This usage is
		  # a convention for representing discontiguous features; they may be
		  # created by using the -segment directive without specifying a
		  # distinct -subtype in the call to `new` when creating a
		  # Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature,
		  # Bio::Graphics::Feature).  Such homogenous subfeatures created in
		  # this fashion DO NOT have the parent (GFF column 9) attributes
		  # propagated to them; so, since they are all part of the same
		  # parent, the ONLY difference relevant to GFF production SHOULD be
		  # the $start and $end coordinates for their segment, and ALL THEIR
		  # OTHER ATTRIBUTES should be copied down from the parent (including:
		  # strand, score, Name, ID, Parent, etc).

		  # $self's parent, if it is a homogenous child, otherwise $self.
		  my $hparentORself = $_self_is_hsf ? $_parent : $self;

		  if ($recurse && (my @ssf = $self->sub_SeqFeature)) {
		    # will be TRUE only if all subfeatures are the same type as $self.
		    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
		    my $mygroup =
		      # compute $self's group if it is needed to be passed down to
		      # subfeatures, unless it is already being passed down (in which
		      # case there are (at least) 3 levels of homogenous parent child
		      # (will this ever happen in practice???))
		      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
		    return (join("\n", (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
		                        map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homogenous,$mygroup)} @ssf)));
		  } else {
		    my $name  = $hparentORself->name;
		    my $class = $hparentORself->class;
		    my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
		    my $strand = ('-','.','+')[$self->strand+1];
		    # TODO: understand conditions under which this could be other than
		    # hparentORself->strand.  In particular, why does add_segment flip
		    # the strand when start > stop?  I thought this was not allowed!
		    # Lincoln - any ideas?
		    my $p      = join("\t",
		        $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->method||'.',
		        $self->start||'.',$self->stop||'.',
		        defined($hparentORself->score) ? $hparentORself->score : '.',
		        $strand||'.',
		        defined($hparentORself->phase) ? $hparentORself->phase : '.',
		        $group||'');
		    return $p;  # previously returned implicitly as the last value evaluated
		  }
		}
		 
		
________________________________

			From: Cook, Malcolm 
			Sent: Friday, April 27, 2007 1:45 PM
			To: 'lincoln.stein at gmail.com'
			Cc: 'lstein at cshl.org'; 'bioperl list'
			Subject: RE: Handling discontiguous feature
locations in Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
			
			
			Hi Lincoln,
			 
			Cool.
			 
			The principle of what I figured out still
holds, I think, but the implementation is slightly broken.  Improved patch
forthcoming next week.
			 

			Malcolm Cook
			Database Applications Manager - Bioinformatics
			Stowers Institute for Medical Research - Kansas
City, Missouri
			  

________________________________

				From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
				Sent: Friday, April 27, 2007 12:45 PM
				To: Cook, Malcolm
				Cc: lstein at cshl.org; bioperl list
				Subject: Re: Handling discontiguous
feature locations in Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
				
				
				Hi Malcolm,
				
				This is absolutely ok and you can go
ahead and commit. Thanks for figuring this out!
				
				Lincoln
				
				
				On 4/26/07, Cook, Malcolm <
MEC at stowers-institute.org <mailto:MEC at stowers-institute.org> > wrote: 

				Lincoln, et al,
				
				I find that the gff3_string for Bio::DB::SeqFeature objects
				retrieved from a Bio::DB::SeqFeature::Store that were initially
				created with -segments (i.e. whose location was discontiguous)
				does not display any attributes in column 9 other than "Name".
				
				What do you think of the following patch to
				Bio::Graphics::FeatureBase, whose effect is to "contrive to return
				(duplicated) common group values" (which otherwise get lost when
				"collapsing" "homogenous" parent/child features)?
				
				Another approach would be to copy the attributes from the parent
				to the children when the -segments are first created.
				
				Another approach would be to use Bio::SeqFeature::Generic as the
				db's -seqfeature_class and save with -location being a
				Bio::Location::Split, but this was fraught with other problems.
				
				Any other suggestions?  Do you want me to commit this patch?
				
				Cheers,
				
				Malcolm
				
				Patch follows:
				
				
				Index: FeatureBase.pm
				===================================================================
				RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
				retrieving revision 1.29
				diff -c -r1.29 FeatureBase.pm
				*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
				--- FeatureBase.pm       26 Apr 2007 16:30:23 -0000
				***************
				*** 581,587 ****
				      foreach (@children) {
				        s/Parent=/ID=/g;
				      } # replace Parent tag with ID
				!     return join "\n",@children;
				    }
				
				    return join("\n",$p,@children);
				--- 581,589 ----
				      foreach (@children) {
				        s/Parent=/ID=/g;
				      } # replace Parent tag with ID
				!     #return join "\n",@children;
				!     # Instead of above, additionally, contrive to return (duplicated) common group values
				!     return(join("$group\n",@children) . $group);
				    }
				
				    return join("\n",$p,@children);
				

				-- 
				Lincoln D. Stein
				Cold Spring Harbor Laboratory
				1 Bungtown Road
				Cold Spring Harbor, NY 11724
				(516) 367-8380 (voice)
				(516) 367-8389 (fax)
				FOR URGENT MESSAGES & SCHEDULING, 
				PLEASE CONTACT MY ASSISTANT, 
				SANDRA MICHELSEN, AT michelse at cshl.edu 




From cjfields at uiuc.edu  Thu May  3 16:57:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 3 May 2007 15:57:43 -0500
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
In-Reply-To: <922386.19570.qm@web36808.mail.mud.yahoo.com>
References: <922386.19570.qm@web36808.mail.mud.yahoo.com>
Message-ID: <2930F3F1-2BFB-4320-9A2C-50DFE6F808A1@uiuc.edu>

I would update to BioPerl 1.5.2.  v.1.4 is 3 yrs old and there have  
been tons of changes both for sequence retrieval and parsers.

We can't predict when a new 'stable' release will be available but  
1.5.2 works well for most purposes.

chris

On May 3, 2007, at 3:09 PM, Anand Venkatraman wrote:

> Hi
>
> I am using Bioperl 1.4 and I am trying to obtain protein sequences  
> for specific Uniprot records.
> ...
>  Is there something I need to fix.
>
> Thanks in advance for the help.
>
>  Anand
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thiago.venancio at gmail.com  Thu May  3 17:12:35 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 3 May 2007 18:12:35 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
	<54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
Message-ID: <44255ea80705031412n7abef247je70d2681bb3cc7ed@mail.gmail.com>

Hi all,

Just for the record: I am getting good results extracting CDS from
protein x DNA alignments using the following procedure:

- BLASTX to identify the hits for each DNA sequence (if you want to process
sequences for further multiple sequence alignment, it is important to record
the frames);

- FASTX/Y to refine the alignment between the protein and the DNA. FASTX/Y
is quite good, because it performs well with frame shifts and allows
better identification of premature stop codons. In addition, the alignment
(and the CDS prediction) is better.

This is worth noting because it avoids analysing "phantom" mRNAs, i.e.
sequences that contain internal stops; merely looking at the BLAST output
can sometimes give misleading results.
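
One simple way to flag such "phantom" candidates before refining the
alignment is to scan the reported reading frame for internal stop codons.
The sketch below is a plain-Perl illustration only. It assumes the
standard genetic code and a forward frame of 1, 2 or 3 taken from the
BLASTX hit; the function name internal_stops is invented for this
example, and a reverse-frame hit would first need to be
reverse-complemented.

```perl
use strict;
use warnings;

# Return the 1-based nucleotide positions of stop codons found in the given
# forward frame (1, 2 or 3).  A stop in the last complete codon is treated
# as the expected terminator and is not reported.
sub internal_stops {
    my ($dna, $frame) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);   # standard genetic code
    $dna = uc $dna;

    # Offsets (0-based) of every complete codon in this frame.
    my @codon_starts;
    for (my $i = $frame - 1; $i + 3 <= length($dna); $i += 3) {
        push @codon_starts, $i;
    }

    my @stops;
    for my $idx (0 .. $#codon_starts) {
        my $offset = $codon_starts[$idx];
        next unless $is_stop{ substr($dna, $offset, 3) };
        next if $idx == $#codon_starts;              # terminal stop
        push @stops, $offset + 1;
    }
    return @stops;
}

my $clean   = 'ATGGCTGCATAA';       # ATG GCT GCA TAA
my $phantom = 'ATGTAGGCTGCATAA';    # ATG TAG GCT GCA TAA

my @clean_stops   = internal_stops($clean,   1);
my @phantom_stops = internal_stops($phantom, 1);

print "clean:   ", scalar(@clean_stops), " internal stop(s)\n";
print "phantom: internal stop(s) at position(s) @phantom_stops\n";
```

A check like this does not replace the FASTX/Y refinement; it only helps
decide which sequences deserve a closer look.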

Best.

Thiago


On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Hi -
> There are some tools that do this for you -- I've listed a few from a
> google search or from what I remember reading.  It would be great if you
> (and others!) were willing to contribute a little info on what you
> find works for you to the wiki.  A little
> HOWTO would be cool - here or on openwetware.org.
>
> Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
> EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2
>
> Ewan Birney's estwise as part of wise package also can help if you have a
> likely protein from BLAST you want to align to the est - estwise can handle
> frameshifts, but can be too slow for some people.  Exonerate's protein2dna
> model may also work here, but I haven't tried it.
>
> -jason
> On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:
>
> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed
> for that, like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From lstein at cshl.edu  Thu May  3 17:35:57 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 17:35:57 -0400
Subject: [Bioperl-l] CSHL is hiring
Message-ID: <6dce9a0b0705031435r3bc2d2ddlfca5ac02844b4ef0@mail.gmail.com>

Hi Folks,

Sorry for the spam. My group at CSHL is looking for a scientific programmer
with good software development credentials and some experience in
bioinformatics. Experience in object-oriented Perl programming is a strict
requirement.

This is to work on user interface development for several projects
including:

   - BioMart (data warehouse) project (www.biomart.org)
   - GBrowse genome browser (www.gmod.org/GBrowse)
   - Reactome pathways database (www.reactome.org)

I can offer salaries in the 60-80K range, depending on level of experience.
Please reply to lstein at cshl.edu.

Best,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From MEC at stowers-institute.org  Tue May  8 12:59:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 8 May 2007 11:59:10 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start
	and stop coordinates??
Message-ID: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>

Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
coordinates, 

as in:
  ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
&& $start > $stop;

I thought it is not legal for a feature to be so composed.  

Anyone know?

Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue May  8 13:12:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 12:12:45 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>

I believe all seqfeature location coordinates are designed to have  
start < stop for consistency; in cases where the strand matters (CDS,  
gene, etc.) then the strand is set to 1 or -1.  When start > stop,  
the two are reversed and the strand is flipped; at least that's the  
way locations are set up in BioPerl.
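
The convention described above can be sketched as a small plain-Perl
helper.  This illustrates the idea only and is not the BioPerl
implementation: the function name normalize_location is hypothetical,
and it assumes the strand is given as 1, -1 or 0.

```perl
use strict;
use warnings;

# Ensure start <= stop.  If the pair arrives reversed, swap the coordinates
# and flip the strand, so orientation is carried by the strand alone.
sub normalize_location {
    my ($start, $stop, $strand) = @_;
    if (defined $start && defined $stop && $start > $stop) {
        ($start, $stop) = ($stop, $start);
        $strand = -$strand if $strand;    # 1 <-> -1; 0 (unstranded) is left alone
    }
    return ($start, $stop, $strand);
}

my @reversed = normalize_location(30100, 30001, 1);
my @ordered  = normalize_location(1001, 1100, 1);

print join("\t", @reversed), "\n";
print join("\t", @ordered),  "\n";
```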

chris

On May 8, 2007, at 11:59 AM, Cook, Malcolm wrote:

> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Tue May  8 14:37:20 2007
From: juheymann at yahoo.com (Bohr)
Date: Tue, 8 May 2007 11:37:20 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
Message-ID: <10381379.post@talk.nabble.com>


Hi,

I installed bioperl under OSX Tiger via Fink. I tested the installation
using the test tutorial via: perl -w bptutorial.pl 5

The script failed, indicating that the file to retrieve was missing. To
identify the problem, I used a script that calls 'get_sequence' to
retrieve a file from 'genbank' or 'embl'. Both succeeded. If I replace the
database with 'swiss' or 'swissprot' and use the same ID as
in the tutorial, I reproduce the problem found with bptutorial.pl. Other
IDs do the same.

Any pointers on the origin of this finding would be greatly appreciated.
-- 
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10381379
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Tue May  8 17:53:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 16:53:04 -0500
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <10381379.post@talk.nabble.com>
References: <10381379.post@talk.nabble.com>
Message-ID: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>

The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
etc) accessed via bioperl.

As a note, the bptutorial.pl has been moved to the bioperl wiki:

http://www.bioperl.org/wiki/Bptutorial

chris

On May 8, 2007, at 1:37 PM, Bohr wrote:

>
> Hi,
>
> I installed bioperl under OSX Tiger via Fink. I tested the  
> installation
> using the test tutorial via: perl -w bptutorial.pl 5
>
> The script failed indicating that the file to retrieve was missing. To
> identify the problem, I used a script using 'get_sequence' that will
> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
> replace it
> with 'swiss' or 'swissprot' and substitute the ID with the  
> identical ID as
> in the tutorial, I am recreating the problem found with  
> bptutorial.pl. Other
> ID's do the same.
>
> Any pointers on the origin of this finding would be greatly  
> appreciated.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Wed May  9 18:17:27 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:17:27 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10403903.post@talk.nabble.com>


Thank you for the feedback and the suggestion.

I installed 1.5.2 via Build.PL and the results were the same, i.e. embl and
genbank worked fine but swissprot failed.

Here is the output:

MSG: acc (CALX_YEAST) does not exist
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not provide a valid Bio::PrimarySeqI object
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SeqIO::fasta::write_seq
/sw/lib/perl5/5.8.6/Bio/SeqIO/fasta.pm:181

Before contemplating too much, here is my question: how do I verify the
update to 1.5.2? (I ran ./Build test and it passed.) And what else could
have gone wrong here?

What might be a clever way to troubleshoot this?
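
One quick check is to ask Perl which copy of a module it actually loads,
since an older copy under /sw/lib/perl5 (the path shown in the stack
trace above) could still be found ahead of a newer install.  The
following is a generic sketch: module_location is a made-up helper, and
Bio::Perl is listed only because get_sequence comes from it.  A module
may not report a version at all, so the path is usually the more telling
clue.

```perl
use strict;
use warnings;

# Return (path, version) for a module as seen by this perl, or an empty
# list if the module cannot be loaded.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;     # Bio::Perl -> Bio/Perl.pm
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };    # may be undef
    return ($INC{$file}, $version);
}

for my $module (qw(Bio::Perl Bio::SeqIO::fasta)) {
    my ($path, $version) = module_location($module);
    if (defined $path) {
        printf "%s loaded from %s (version %s)\n",
            $module, $path, defined $version ? $version : 'not reported';
    }
    else {
        print "$module could not be loaded\n";
    }
}

# The order of @INC decides which copy wins when several are installed.
print "Search path:\n", map { "  $_\n" } @INC;
```

If the reported path is not where the new release was installed,
adjusting PERL5LIB or adding a "use lib" line for the new location is a
reasonable next step.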


---------------------------------------------------------------------------

Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris


From ursula_cox at btinternet.com  Wed May  9 18:12:26 2007
From: ursula_cox at btinternet.com (Ursula at BT)
Date: Wed, 9 May 2007 23:12:26 +0100
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
Message-ID: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>

Dear BioPerl List,

 
I'm new to BioPerl (and Perl for that matter). I have an array of enzyme
names, and a larger collection of enzymes (guaranteed to be a superset by
the way it's constructed). I need to make a new collection containing just
the enzymes corresponding to the names I have in the array.

 
I was hoping that something like:

 
my $all_rebase =
Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');

my $all_rebase_collection = $all_rebase->read();

 
my @enzymes =
('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I','
AccB7I','AccI');

 
my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

foreach $enzyme (all_rebase_collection)

            {

            $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes;

            }

 
would work, but I get a syntax error near "$new_collection(".

 
Any clues much appreciated,

 
Ursula Cox


From juheymann at yahoo.com  Wed May  9 18:38:42 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:38:42 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10404211.post@talk.nabble.com>


Thank you for pointing that out! I installed 1.5.2 via Build.PL. The scripts
work as expected now.


Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris


From cjfields at uiuc.edu  Wed May  9 19:37:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 May 2007 18:37:33 -0500
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
In-Reply-To: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
References: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
Message-ID: <E4472E55-AADB-4697-8C4D-2EC231923F0B@uiuc.edu>


On May 9, 2007, at 5:12 PM, Ursula at BT wrote:

> Dear BioPerl List,
>
>
>
> I'm new to BioPerl (and Perl for that matter). I have an array of  
> enzyme
> names, and a larger collection of enzymes (guaranteed to be a  
> superset by
> the way it's constructed). I need to make a new collection  
> containing just
> the enzymes corresponding to the names I have in the array.

First, prior to using BioPerl you should really brush up on perl  
itself (Learning Perl, or James Tisdall's Perl for Bioinformatics  
books, the former preferred).  Though there are several scripts  
available to get you started with Bioperl, much of the code is  
written with the expectation that you can write and debug a basic  
perl script (and there is some expectation that you are somewhat  
familiar with OO Perl).

Saying that, let's see what's wrong...

> I was hoping that something like:
>
>
>
> my $all_rebase =
> Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');
>
> my $all_rebase_collection = $all_rebase->read();

The 'bionet' format is not supported; only 'withrefm', 'itype2',  
'bairoch' are (the latter only experimentally).  See 'perldoc  
Bio::Restriction::IO'.

> my @enzymes =
> ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','Acc 
> B1I','
> AccB7I','AccI');
>
>
>
> my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

Missing a new() constructor here.

> foreach $enzyme (all_rebase_collection)

Not sure what this is.  No '$' sigil for $all_rebase_collection will  
make the compiler look for (and fail to find) the sub  
all_rebase_collection().

>
>             {
>
>             $new_collection($enzyme) if grep $_ eq $enzyme->name,  
> @enzymes;
>
>             }
>
>
>
> would work, but I get a syntax error near "$new_collection(".

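Yep.  $new_collection($enzyme) is not valid Perl; to call a method on  
the object you need the arrow, e.g. $new_collection->enzymes($enzyme).  
The block form of grep, grep { ... } @list, is also easier to read than  
the bare expression form.  See 'perldoc -f grep'.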

> Any clues much appreciated,
>
>
>
> Ursula Cox

No prob, but again you might want to brush up on perl.
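
For the filtering step itself, a corrected loop might look like the
sketch below.  It is plain Perl with a stand-in Local::Enzyme class
(hypothetical, defined here only so the example runs without BioPerl);
the helper subset_by_name is likewise made up.  With a real
Bio::Restriction::EnzymeCollection, the list of objects would come from
its each_enzyme method and the matches would be added to the new
collection with its enzymes method, as described in 'perldoc
Bio::Restriction::EnzymeCollection'; that mapping is an assumption to
verify against the installed version.

```perl
use strict;
use warnings;

# Stand-in for an enzyme object; only the name() accessor matters here.
package Local::Enzyme;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub name { return $_[0]{name} }

package main;

# Keep only the objects whose name() appears in the wanted list.
sub subset_by_name {
    my ($wanted_names, @enzymes) = @_;
    my %wanted = map { $_ => 1 } @$wanted_names;   # one hash lookup per enzyme
    return grep { $wanted{ $_->name } } @enzymes;
}

my @all    = map { Local::Enzyme->new(name => $_) } qw(AasI AatI AatII BamHI EcoRI);
my @wanted = qw(AasI AatII AccI);

my @subset = subset_by_name(\@wanted, @all);
print join(', ', map { $_->name } @subset), "\n";
```

Building the %wanted hash once avoids re-scanning the name list with grep
for every enzyme in the larger collection.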

chris


From darin.london at duke.edu  Thu May 10 12:17:38 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Thu, 10 May 2007 12:17:38 -0400
Subject: [Bioperl-l] BOSC 2007 Second Call For Papers
Message-ID: <200705101617.l4AGHceI002463@tenero.duhs.duke.edu>


The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th and 20th.  The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions.   Thank you, and we hope to see you in Vienna.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From lstein at cshl.edu  Thu May 10 13:13:09 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 10 May 2007 13:12:09 -0401
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0705101013w1923c173l5ec5d9288c67c9a2@mail.gmail.com>

It's a workaround for some broken data sources. It should "never happen."

Lincoln

On 5/8/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Bank.Beszteri at awi.de  Thu May 10 12:13:00 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Thu, 10 May 2007 18:13:00 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
Message-ID: <4643448C.4000807@awi.de>

Dear Bioperl folks,

I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees, 
but in some things it did not behave as I expected it to, so I had to 
look inside a bit.
In particular, I had problems with mixed up bootstrap values after 
re-rooting. After looking into the Bio::Tree::Tree data structures, it 
seems that

a) bootstrap values are stored as attributes of nodes of the tree [to my 
understanding, they should rather be attributes of branches but 
Bio::Tree::Tree apparently tries to simplify away branches]; each node 
stores the bootstrap value belonging to the branch that connects it to 
its ancestor node (I'm reading in trees from Newick strings, and 
bootstrap values arrive in the id fields of internal branches)

b) when re-rooting a tree, bootstrap values stay with the same node 
where they were before. Because the node that used to be the ancestor of 
a particular node in the original tree might have become its descendant 
after re-rooting, the bootstrap values are being mixed up.

Can you confirm my conclusion? Whether yes or no, have you got an easy 
workaround or alternative solution to re-rooting trees (without having 
to touch the reroot method) or any other hints that could be useful for 
me to deal with this issue?

Cheers,

Bank


--
Dr. Bánk Beszteri
Alfred Wegener Institute for Polar and Marine Research


From dmessina at wustl.edu  Thu May 10 16:16:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 10 May 2007 15:16:48 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
Message-ID: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>

Hi everyone,

Shin Leong here at the Wash U GSC has written SearchIO-compliant  
cross_match parsing and result modules. Specifically,  
Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.

To my knowledge this functionality doesn't exist in BioPerl. Any  
comments or objections before I commit these to CVS?

Thanks,
Dave


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From aperezp at uma.es  Thu May 10 13:58:32 2007
From: aperezp at uma.es (=?ISO-8859-1?Q?=22Antonio_J=2E_P=E9rez=22?=)
Date: Thu, 10 May 2007 19:58:32 +0200
Subject: [Bioperl-l] Get Swiss Entry
Message-ID: <46435D48.4020309@uma.es>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/ca4e893e/attachment-0002.html>

From jason at bioperl.org  Thu May 10 16:53:28 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 10 May 2007 13:53:28 -0700
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <FDBE1855-6252-4902-B32B-E984EC6B22E9@bioperl.org>

Awesome!
On May 10, 2007, at 1:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From cjfields at uiuc.edu  Fri May 11 00:55:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 10 May 2007 23:55:05 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>

Sounds good to me!  Any tests to be added?

chris

On May 10, 2007, at 3:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Fri May 11 01:42:53 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 11 May 2007 00:42:53 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
Message-ID: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>

> Sounds good to me!  Any tests to be added?

No tests right now as far as I can tell. I'm swamped personally, but  
perhaps I can persuade Mark Johnson over here to crank out a few.


From cjfields at uiuc.edu  Fri May 11 11:25:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 10:25:34 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
	<57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
Message-ID: <B654B314-FE39-4DB2-9B2F-5C812CF3E257@uiuc.edu>

Thanks Mark!  I don't think you'll need to add a ton of tests; just  
enough to demo anything that you feel is necessary or specific to the  
parser.  These could go into SearchIO.t or their own test suite.

chris

On May 11, 2007, at 10:14 AM, Mark Johnson wrote:

>>> Sounds good to me!  Any tests to be added?
>>
>> No tests right now as far as I can tell. I'm swamped personally, but
>> perhaps I can persuade Mark Johnson over here to crank out a few.
>
> I'll see what I can do.  I just had to open my mouth about getting  
> this
> contributed back after I noticed it, so I suppose this is appropriate
> retribution.  8)
>
>


From mjohnson at watson.wustl.edu  Fri May 11 11:14:56 2007
From: mjohnson at watson.wustl.edu (Mark Johnson)
Date: Fri, 11 May 2007 10:14:56 -0500 (CDT)
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
Message-ID: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>

>> Sounds good to me!  Any tests to be added?
>
> No tests right now as far as I can tell. I'm swamped personally, but
> perhaps I can persuade Mark Johnson over here to crank out a few.

I'll see what I can do.  I just had to open my mouth about getting this
contributed back after I noticed it, so I suppose this is appropriate
retribution.  8)


From golharam at umdnj.edu  Fri May 11 16:20:41 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 16:20:41 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
Message-ID: <000501c79409$d8c03480$f6028a0a@PICO>

I'm running a large series of clustalw alignments.  After a large number of
alignments, my perl script dies, indicating that too many links are open.  I
checked my /tmp directory (while the script is running) and noticed that the
temp directories created for ClustalW are not removed until after the script
exits.
How can I force the cleanup of these directories after I am done with each
alignment?

My code is essentially this:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
$aa_aln = $aln_factory->align(\@aa_seqs);
open(STDOUT, ">&OLDOUT");
$dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);


Ryan


From jason at bioperl.org  Fri May 11 16:53:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 11 May 2007 13:53:19 -0700
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>

Did you try adding this after the calls that get the CDS alignment?

$aln_factory->cleanup();


-jason
On May 11, 2007, at 1:20 PM, Ryan Golhar wrote:

> I'm running a large series of clustalw alignments.  After a large  
> number of
> alignments, my perl script would die indicating too many links were  
> open.  I
> checked my /tmp directory (while the script is running) and noticed  
> that the
> temp directory created for ClustalW are not removed until after the  
> script
> exists.
> How can I force the cleanup of these directories after I am done  
> with the
> alignment?
>
> My code is essentially this;
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> $aa_aln = $aln_factory->align(\@aa_seqs);
> open(STDOUT, ">&OLDOUT");
> $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);
>
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri May 11 16:57:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 15:57:23 -0500
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <41E91E58-48A5-4E29-B6BA-E9417BF17513@uiuc.edu>

cleanup() is supposed to clean up temp directory stuff; it's  
inherited from Bio::Tools::Run::WrapperBase.

chris

On May 11, 2007, at 3:20 PM, Ryan Golhar wrote:

> I'm running a large series of clustalw alignments.  After a large  
> number of
> alignments, my perl script would die indicating too many links were  
> open.  I
> checked my /tmp directory (while the script is running) and noticed  
> that the
> temp directory created for ClustalW are not removed until after the  
> script
> exists.
> How can I force the cleanup of these directories after I am done  
> with the
> alignment?
>
> My code is essentially this;
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> $aa_aln = $aln_factory->align(\@aa_seqs);
> open(STDOUT, ">&OLDOUT");
> $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);
>
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From golharam at umdnj.edu  Fri May 11 18:11:47 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 18:11:47 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
 after itself
In-Reply-To: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>
Message-ID: <001301c79419$5e794e90$f6028a0a@PICO>

No, I didn't, but I will now.  Thanks.  Interestingly enough ClustalW
removes the files from within the temp directory, but not the temp directory
itself.
 
 
-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Friday, May 11, 2007 4:53 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
after itself


Did you try adding this after your calls getting the CDS aln.

$aln_factory->cleanup(); 


-jason

On May 11, 2007, at 1:20 PM, Ryan Golhar wrote:


I'm running a large series of clustalw alignments.  After a large number of
alignments, my perl script would die indicating too many links were open.  I
checked my /tmp directory (while the script is running) and noticed that the
temp directory created for ClustalW are not removed until after the script
exists.
How can I force the cleanup of these directories after I am done with the
alignment?

My code is essentially this;

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
$aa_aln = $aln_factory->align(\@aa_seqs);
open(STDOUT, ">&OLDOUT");
$dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);


Ryan


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From goshng at gmail.com  Sat May 12 11:21:59 2007
From: goshng at gmail.com (Sang Chul Choi)
Date: Sat, 12 May 2007 11:21:59 -0400
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
Message-ID: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>

Hi,

One Bio::Seq's sequence is "ACGT" and I want this object to have
"ACGA" by changing the fourth letter from T to A. I thought I could do
this by reading the sequence string through the seq() method, changing
the string with Perl's general functions, and generating another Bio::Seq
object with the new string. This seems a little silly.

Is there any simple way to do this? Or is there any method of
Bio::Seq to do this: to change one letter at a particular position, or
to change the letters within some range?

Thank you,

Sang Chul


From jason at bioperl.org  Sat May 12 12:50:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 09:50:10 -0700
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
In-Reply-To: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
References: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
Message-ID: <22C99635-C22D-4F51-AADD-5CCF595222DF@bioperl.org>

You can get/set the seq data via the seq() method.

use Bio::Seq;
my $seq = Bio::Seq->new(-seq => 'ACGT');

my $str = $seq->seq;
print $str, "\n";

substr($str,3,1,'A');
$seq->seq($str);
print $seq->seq, "\n";
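
To cover the "range" part of the question as well, here is a minimal sketch in plain Perl built on the same 4-argument substr. The helper name replace_range and its 1-based, inclusive coordinates are assumptions made for illustration; it is not a Bio::Seq method.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the letters from $start to $end
# (1-based, inclusive) with $replacement and return the new string.
sub replace_range {
    my ($str, $start, $end, $replacement) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length $str;
    # 4-argument substr takes a 0-based offset, so subtract 1 from $start.
    substr($str, $start - 1, $end - $start + 1, $replacement);
    return $str;
}

my $point = replace_range('ACGT', 4, 4, 'A');         # single letter
my $block = replace_range('ACGTACGT', 2, 4, 'TTT');   # a range
print "$point\n$block\n";
```

With a sequence object this would be used as $seq->seq(replace_range($seq->seq, 4, 4, 'A')), going through the same seq() getter/setter as above.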

On May 12, 2007, at 8:21 AM, Sang Chul Choi wrote:

> Hi,
>
> One Bio::Seq's sequence is "ACGT" and I want this object to have
> "ACGA" by changing the fouth letter from T to A. I thought I could do
> this by reading sequence string through the method of seq(), changing
> the string by perl's general function, and generating another Bio::Seq
> object with the new string. This seems to be silly, a little bit.
>
> Is there any simple way to do this? Or, is there any method of
> Bio::Seq to do this: to change one letter at a particular position, or
> additionally to change letters with some range?
>
> Thank you,
>
> Sang Chul
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Sat May 12 18:12:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 15:12:56 -0700
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <4643448C.4000807@awi.de>
References: <4643448C.4000807@awi.de>
Message-ID: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>


On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:

> Dear Bioperl folks,
>
> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees,
> but in some things it did not behave as I expected it to, so I had to
> look inside a bit.
> In particular, I had problems with mixed up bootstrap values after
> re-rooting. After looking into the Bio::Tree::Tree data structures, it
> seems that
>
> a) bootstrap values are stored as attributes of nodes of the tree  
> [to my
> understanding, they should rather be attributes of branches but
> Bio::Tree::Tree apparently tries to simplify away branches]; each node
> stores the bootstrap value belonging to the branch that connects it to
> its ancestor node (I'm reading in trees from Newick strings, and
> bootstrap values arrive in the id fields of internal branches)

Please feel free to suggest an alternative implementation if you
don't agree with the object model.  It has worked quite well in our
hands so I'd be all ears for someone wanting to get in and do some
more work on it.

We have answered the question as to why bootstrap values are internal
ids many times on this list and I believe on the wiki -- the parser
can't tell the difference between a node id and a bootstrap value
because nexus uses the same slot for both.  If you know you have
bootstrap values in the internal nodes it is trivial to process your
tree and copy the values over.


for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) {
  $node->bootstrap($node->id);
  $node->id('');
}

I just added this as a method to TreeFunctionsI so that it can be
easily called now to help satisfy everyone who hopes that the toolkit
can guess whether the internal nodes are bootstraps or identifiers.


>
> b) when re-rooting a tree, bootstrap values stay with the same node
> where they were before. Because the node that used to be the  
> ancestor of
> a particular node in the original tree might have become its  
> descendant
> after re-rooting, the bootstrap values are being mixed up.
>
> Can you confirm my conclusion? Whether yes or no, have you got an easy
> workaround or alternative solution to re-rooting trees (without having
> to touch the reroot method) or any other hints that could be useful  
> for
> me to deal with this issue?
>

I think you are right, but I am not clear what the value should be for
the internal node attached to the root now.

Note that it is always helpful to provide example code illustrating your
problem.  Here is an example which I think illustrates your problem.

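use strict;
use warnings;
use Bio::TreeIO;

# Read the Newick tree from the __DATA__ section below.
my $in = Bio::TreeIO->new(-format => 'newick',
			  -fh => \*DATA);
# Writer with no -file/-fh, so the trees are printed to STDOUT.
my $out = Bio::TreeIO->new(-format => 'newick');
while( my $t = $in->next_tree ){
     # $leaf rather than $a, which Perl also uses as a sort variable.
     my ($leaf) = $t->find_node(-id =>"A");
     $out->write_tree($t);   # tree as read
     $t->reroot($leaf);
     $out->write_tree($t);   # same tree after rerooting on A
}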
__DATA__
(((A:5,B:5)90:2,C:4)25:3,D:10);


> Cheers,
>
> Bank
>
>
>
> --
> Dr. Bánk Beszteri
> Alfred Wegener Institute for Polar and Marine Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From darin.london at duke.edu  Mon May 14 10:44:56 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Mon, 14 May 2007 10:44:56 -0400
Subject: [Bioperl-l] BOSC 2007 Abstract Submission Deadline Extended
Message-ID: <200705141444.l4EEium2026969@tenero.duhs.duke.edu>


Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st.  The announcement day will remain the same so that it remains before the Early Discount Date.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From thiago.venancio at gmail.com  Mon May 14 14:54:44 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 14 May 2007 15:54:44 -0300
Subject: [Bioperl-l] get regions
Message-ID: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>

Hi all,

Using Bio::Seq, is there any easy way to get the coordinates where a
regular expression matches or should I build a sliding window?

For example, looking for a given promoter region in a FASTA file. If
the region is found, I would like to recover exactly the coordinates
where it matches.

Thanks in advance.

Thiago
-- 
"Doubt is not a pleasant condition, but certainty is absurd."
            Voltaire

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From jason at bioperl.org  Mon May 14 15:06:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 12:06:11 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>

I assume you are doing the matches on the string with =~ so I don't
think Bio::Seq really helps you here.
See the $` variable in Perl for how to capture the position of where
a regexp matches (it holds the text before the match, so its length is
the 0-based start of the match).

-jason
On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:

> Hi all,
>
> Using Bio::Seq, is there any easy way to get the coordinates where a
> regular expression matches or should I build a sliding window?
>
> For example, looking for a given promoter region in a FASTA file. If
> the region is found, I would like to recover exactly the coordinates
> where it matches.
>
> Thanks in advance.
>
> Thiago
> -- 
> "Doubt is not a pleasant condition, but certainty is absurd."
>             Voltaire
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Mon May 14 15:15:09 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 14 May 2007 12:15:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>

I do this in perl with the pos() function.  This requires the use of the
match operator (m) like

if ($gene =~ m/$pattern/gi)
{
	$start = pos($gene) - length($pattern) + 1;
}

pos() returns the offset where the regex left off after
finding a match.  I subtract the length of my pattern (which is just a
string with a few placeholder (.) wildcards, so I know how long the
match will always be) and add 1 to get the 1-based start.
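
When the pattern can match strings of varying length, subtracting length($pattern) no longer works. A minimal sketch of one alternative, using Perl's built-in @- and @+ arrays to read the offsets of each match directly; the sequence and motif below are invented purely for illustration.

```perl
use strict;
use warnings;

my $gene    = 'GGCTATAAAGGCGTATATAGC';   # made-up example sequence
my $pattern = qr/TATA[AT]A/i;            # made-up example motif

my @hits;
while ($gene =~ /$pattern/g) {
    # $-[0] is the 0-based offset of the match start and $+[0] the offset
    # just past its end; convert to 1-based, inclusive coordinates.
    push @hits, [ $-[0] + 1, $+[0] ];
}
printf "%d..%d\n", @$_ for @hits;
```

The while loop with /g also reports every non-overlapping match, where a single if would stop at the first one.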

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Monday, May 14, 2007 12:06 PM
> To: Thiago Venancio
> Cc: bioperl-l list
> Subject: Re: [Bioperl-l] get regions
> 
> I assume you are doing the matches on the string with =~ so 
> Bio::Seq doesn't really help you here I don't think.
> See the $` variable in Perl for how to capture the position 
> of where a regexp matches.
> 
> -jason
> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> 
> > Hi all,
> >
> > Using Bio::Seq, is there any easy way to get the 
> coordinates where a 
> > regular expression matches or should I build a sliding window?
> >
> > For example, looking for a given promoter region in a FASTA 
> file. If 
> > the region is found, I would like to recover exactly the 
> coordinates 
> > where it matches.
> >
> > Thanks in advance.
> >
> > Thiago
> > --
> > "Doubt is not a pleasant condition, but certainty is absurd."
> >             Voltaire
> >
> > ========================
> > Thiago Motta Venancio, MSc
> > PhD student in Bioinformatics
> > University of Sao Paulo
> > ========================
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From Bank.Beszteri at awi.de  Mon May 14 09:20:07 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Mon, 14 May 2007 15:20:07 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
References: <4643448C.4000807@awi.de>
	<1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
Message-ID: <46486207.60304@awi.de>

Dear Jason,

thanks for your answer! Sorry about having been ambiguous - it is clear 
that bootstrap values are parsed as ids from newick files, I had no 
problem with that, it was only the first step of the explanation of my 
problem, which was the rerooting issue.

Thanks for your example code as well, it is indeed really useful to 
illustrate the problem. I modified the original tree a bit to make my 
point clearer:

In your example, there are two internal node ids in a four-taxon tree. 
This is not a realistic situation for bootstrap values, because 
bootstrap values are attached to bipartitions of terminal nodes, i.e., 
edges / branches of a tree (in what proportion of the bootstrap 
replicates was a particular bipartition recovered - an alternative 
representation of bootstraps, like produced e.g. by PAUP, is indeed a 
"taxon bipartition table"). This means that in a four taxon tree, we can 
have at most one bootstrap value - corresponding to the single 
non-trivial bipartition (all other bipartitions are trivial, i.e., they 
separate a terminal node from the rest).

So here is an example 4-taxon tree with a bootstrap value:

(A:52,(B:46,C:50)68:11,D:70);

After rerooting at node B (using your example code) it looks like

((B:46,C:50,(A:52,D:70):11)68);

Now there are two problems:
    1) this seems to be a small problem with TreeIO rather than with 
rerooting: there is an extra pair of parentheses around the whole tree;

but more importantly: 
    2) the bootstrap value appears at the root node, which is not 
sensible according to the convention that "each node stores the 
bootstrap value belonging to the branch linking it to its ancestor". You 
would like the bootstrap value to appear at the node connecting A & D in 
this situation, which would look like

(B:46,C:50,(A:52,D:70)68:11);

because in  this new situation, this position would correspond to the 
same bipartition as in the original tree [which is (A,D)(B,C)].

In the meantime, I got a mail showing me the solution (thx Daniel!), 
which is in fact pretty simple: all that has to be done is go through 
the nodes on the path from the old to the new root after rerooting, and 
for each node, take the bootstrap value from its ancestor (and remove 
it from the ancestor). This leaves the root node without a bootstrap 
value, which is exactly what you want (because it has no branch 
connecting it to its ancestor, there is no sensible bootstrap value 
attached to a root node).

So this exercise tells me that bootstraps and "real" node ids should be 
handled in different manners when rerooting: real ids should of course 
stay with the nodes, whereas bootstrap values on the path between the 
new and old root should move over to the other end of the corresponding 
branch.
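
The procedure described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the Bio::Tree API: nodes are plain hash entries, the names R (old root) and X (the B/C ancestor) are invented labels, and the reroot and shift_bootstraps subs are hypothetical helpers written only for this illustration.

```perl
use strict;
use warnings;

# Toy tree: each node records its parent and the bootstrap of the branch
# leading to that parent. Models (A,(B,C)68,D); with root R, internal node X.
my %node = (
    R => { parent => undef, bootstrap => undef },
    A => { parent => 'R',   bootstrap => undef },
    D => { parent => 'R',   bootstrap => undef },
    X => { parent => 'R',   bootstrap => 68    },
    B => { parent => 'X',   bootstrap => undef },
    C => { parent => 'X',   bootstrap => undef },
);

# Reroot on $new_root by reversing the parent links along the path to the
# old root. Returns that path, ordered from the old root to the new root.
sub reroot {
    my ($tree, $new_root) = @_;
    my @path = ($new_root);
    while (defined(my $p = $tree->{ $path[-1] }{parent})) {
        push @path, $p;
    }
    # @path runs new root -> old root; flip each parent link along it.
    $tree->{ $path[$_] }{parent} = $path[$_ - 1] for 1 .. $#path;
    $tree->{$new_root}{parent} = undef;
    return reverse @path;
}

# Move each bootstrap to the other end of its branch: every node on the
# path takes the value held by its new ancestor; the new root keeps none.
sub shift_bootstraps {
    my ($tree, @path) = @_;
    for my $i (0 .. $#path - 1) {
        $tree->{ $path[$i] }{bootstrap} = $tree->{ $path[$i + 1] }{bootstrap};
    }
    $tree->{ $path[-1] }{bootstrap} = undef;
}

my @path = reroot(\%node, 'X');
shift_bootstraps(\%node, @path);
print "R: $node{R}{bootstrap}\n";   # the (A,D)|(B,C) branch now hangs off R
```

Walking the path from the old root towards the new one matters: each value is copied to its new owner before that node's own slot is overwritten.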

Best wishes,

Bank

Jason Stajich wrote:
>
> On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:
>
>> Dear Bioperl folks,
>>
>> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees, 
>> but in some things it did not behave as I expected it to, so I had to 
>> look inside a bit.
>> In particular, I had problems with mixed up bootstrap values after 
>> re-rooting. After looking into the Bio::Tree::Tree data structures, it 
>> seems that
>>
>> a) bootstrap values are stored as attributes of nodes of the tree [to my 
>> understanding, they should rather be attributes of branches but 
>> Bio::Tree::Tree apparently tries to simplify away branches]; each node 
>> stores the bootstrap value belonging to the branch that connects it to 
>> its ancestor node (I'm reading in trees from Newick strings, and 
>> bootstrap values arrive in the id fields of internal branches)
>
> Please feel free to suggest an alternative implementation if you don't 
> agree with the object model.    It has worked quite well in our hands 
> so I'd be all ears for someone wanting to get in an do some more work 
> on it.
>
> We have answered the question as to why bootstrap values are internal 
> ids many times on this list and I believe on the wiki -- the parser 
> can't tell the difference between a node id and a bootstrap value 
> because nexus uses the same slot for both.  if you know you have 
> bootstrap values in the internal node it is trivial to process your 
> tree and copy the values over.  
>
>
> for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) {
>  $node->bootstrap($node->id); 
>  $node->id('');
> }
>
> I just added this as a method to TreeFunctionI so that it can be 
> easily called now to help satisfy everyone who hopes that the toolkit 
> can guess whether the internal nodes are bootstraps or identifiers.
>
>
>>
>> b) when re-rooting a tree, bootstrap values stay with the same node 
>> where they were before. Because the node that used to be the ancestor of 
>> a particular node in the original tree might have become its descendant 
>> after re-rooting, the bootstrap values are being mixed up.
>>
>> Can you confirm my conclusion? Whether yes or no, have you got an easy 
>> workaround or alternative solution to re-rooting trees (without having 
>> to touch the reroot method) or any other hints that could be useful for 
>> me to deal with this issue?
>>
>
> I think you are right, but I am not clear what should be value for the 
> internal node attached to the root now.
>
> Note that is always helpful to provide example code illustrating your 
> problem.  Here is an example which I think illustrates your problem.
>
> use Bio::TreeIO;
>
> my $in = Bio::TreeIO->new(-format => 'newick',
>   -fh => \*DATA);
> my $out = Bio::TreeIO->new(-format => 'newick');
> while( my $t = $in->next_tree ){
>     my ($a) = $t->find_node(-id =>"A");
>     $out->write_tree($t);
>     $t->reroot($a);
>     $out->write_tree($t);
> }
> __DATA__
> (((A:5,B:5)90:2,C:4)25:3,D:10);
>
>
>> Cheers,
>>
>> Bank
>>
>>
>>
>> --
>> Dr. Bánk Beszteri
>> Alfred Wegener Institute for Polar and Marine Research
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org <mailto:jason at bioperl.org>
> http://jason.open-bio.org/
>
>


From basu at pharm.sunysb.edu  Mon May 14 15:10:33 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Mon, 14 May 2007 15:10:33 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <4648B429.2030907@pharm.sunysb.edu>

Thiago Venancio wrote:
> Hi all,
> 
> Using Bio::Seq, is there any easy way to get the coordinates where a
> regular expression matches or should I build a sliding window?
The perl core function "pos" should help you in this case. Do a 'perldoc
-f pos' for details.

-sidd


> 
> For example, looking for a given promoter region in a FASTA file. If
> the region is found, I would like to recover exactly the coordinates
> where it matches.
> 
> Thanks in advance.
> 
> Thiago


From cjfields at uiuc.edu  Mon May 14 16:48:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 May 2007 15:48:36 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>

I use pos() with m{}g; the quoted globals tend to slow things down  
for me.

Ah, see Kevin's answered that...

chris

On May 14, 2007, at 2:06 PM, Jason Stajich wrote:

> I assume you are doing the matches on the string with =~ so Bio::Seq
> doesn't really help you here I don't think.
> See the $` variable in Perl for how to capture the position of where
> a regexp matches.
>
> -jason
> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
>
>> Hi all,
>>
>> Using Bio::Seq, is there any easy way to get the coordinates where a
>> regular expression matches or should I build a sliding window?
>>
>> For example, looking for a given promoter region in a FASTA file. If
>> the region is found, I would like to recover exactly the coordinates
>> where it matches.
>>
>> Thanks in advance.
>>
>> Thiago
>> -- 
>> "Doubt is not a pleasant condition, but certainty is absurd."
>>             Voltaire
>>
>> ========================
>> Thiago Motta Venancio, MSc
>> PhD student in Bioinformatics
>> University of Sao Paulo
>> ========================
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon May 14 17:50:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 14:50:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
Message-ID: <A5BECADC-6516-41FF-A5DB-EE865AD63842@bioperl.org>

Yep, you are right: pos() is much better and faster for getting the position.

-j
On May 14, 2007, at 1:48 PM, Chris Fields wrote:

> I use pos() with m{}g; the quoted globals tend to slow things down  
> for me.
>
> Ah, see Kevin's answered that...
>
> chris
>
> On May 14, 2007, at 2:06 PM, Jason Stajich wrote:
>
>> I assume you are doing the matches on the string with =~ so Bio::Seq
>> doesn't really help you here I don't think.
>> See the $` variable in Perl for how to capture the position of where
>> a regexp matches.
>>
>> -jason
>> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
>>
>>> Hi all,
>>>
>>> Using Bio::Seq, is there any easy way to get the coordinates where a
>>> regular expression matches or should I build a sliding window?
>>>
>>> For example, looking for a given promoter region in a FASTA file. If
>>> the region is found, I would like to recover exactly the coordinates
>>> where it matches.
>>>
>>> Thanks in advance.
>>>
>>> Thiago
>>> -- 
>>> "Doubt is not a pleasant condition, but certainty is absurd."
>>>             Voltaire
>>>
>>> ========================
>>> Thiago Motta Venancio, MSc
>>> PhD student in Bioinformatics
>>> University of Sao Paulo
>>> ========================
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>> http://jason.open-bio.org/
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From sac at bioperl.org  Mon May 14 21:46:55 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 14 May 2007 18:46:55 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
Message-ID: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>

On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> I do this in perl with the pos() function.  This requires the use of the
> match operator (m) like
>
> if ($gene =~ m/$pattern/gi)
> {
>         $start = pos($gene) - length($pattern) + 1;
> }
>
> pos() returns the location of the pointer where the regex left off after
> finding a match.

Cool. I hadn't known that was possible.

> I remove the length of my pattern (which is just a
> string with a few placeholder (.) wildcards, so I know how long the
> match will always be).

To generalize your code so that it will work for any pattern, such as
one that can match strings of variable length like "A{5,10}", just
subtract the length of the actual string that was matched:

if ($gene =~ m/$pattern/gi)
{
    $start = pos($gene) - length($&) + 1;
}
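
A variant of the same idea, offered only as a sketch: wrapping the
pattern in capturing parentheses lets length($1) stand in for the
length of the matched string, so the start coordinate can be computed
without the special match variable. The helper name first_match_coords
and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Return the 1-based (start, end) of the first match, or an empty list.
sub first_match_coords {
    my ($seq, $pattern) = @_;
    # The outer parentheses capture the whole match in $1.
    if ($seq =~ m/($pattern)/gi) {
        my $end = pos($seq);                 # offset just past the match
        return ($end - length($1) + 1, $end);
    }
    return;
}

# Made-up sequence and pattern, for illustration only.
my ($start, $end) = first_match_coords('TTTAAAAAAAAGG', 'A{5,10}');
print "match at $start - $end\n";
```

Because the outer group opens first, it is always $1 even when the
pattern has parentheses of its own.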

Steve

> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Jason Stajich
> > Sent: Monday, May 14, 2007 12:06 PM
> > To: Thiago Venancio
> > Cc: bioperl-l list
> > Subject: Re: [Bioperl-l] get regions
> >
> > I assume you are doing the matches on the string with =~ so
> > Bio::Seq doesn't really help you here I don't think.
> > See the $` variable in Perl for how to capture the position
> > of where a regexp matches.
> >
> > -jason
> > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> >
> > > Hi all,
> > >
> > > Using Bio::Seq, is there any easy way to get the
> > coordinates where a
> > > regular expression matches or should I build a sliding window?
> > >
> > > For example, looking for a given promoter region in a FASTA
> > file. If
> > > the region is found, I would like to recover exactly the
> > coordinates
> > > where it matches.
> > >
> > > Thanks in advance.
> > >
> > > Thiago
> > > --
> > > "Doubt is not a pleasant condition, but certainty is absurd."
> > >             Voltaire
> > >
> > > ========================
> > > Thiago Motta Venancio, MSc
> > > PhD student in Bioinformatics
> > > University of Sao Paulo
> > > ========================
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From shameer at ncbs.res.in  Mon May 14 23:03:57 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 15 May 2007 08:33:57 +0530 (IST)
Subject: [Bioperl-l] How to produce Bio::Graphics images using PROSITE
	output ?
In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
Message-ID: <49697.192.168.1.1.1179198237.squirrel@mail.ncbs.res.in>

Dear All,

Thanks a lot for all your input on [Help : Imagemaps using Bio::Graphics].
I am still working on the other part of this project. Now I am sure that
I can implement it using Bio::Graphics. I will come back to imagemaps
within a week or two.

Meanwhile, I need to parse PROSITE output and present it as a
Bio::Graphics image. Has anyone tried using Bio::Graphics to create images
from PROSITE output? I looked in the HOWTO but couldn't find anything
related to PROSITE.

My output looks like this :
    >Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
          75 - 78  NGSM
    >Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
         41 - 43  SpK
    >Sequence : PS00008 MYRISTYL N-myristoylation site.
           6 - 11  GTitNQ
    >Sequence : PS00009 AMIDATION Amidation site.
          78 - 81  mGKR

I need to implement an image like the blast-parser image.
Thanks for any input/pointers.
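
As a starting point, the parsing half of this could be sketched in
plain Perl as below. This is only a sketch that assumes the report
always looks like the sample above (a ">Sequence : ID NAME description"
line followed by "start - end  match" lines); the function name
parse_prosite and the hash keys are invented for illustration.

```perl
use strict;
use warnings;

# Parse PROSITE-style report text into a list of hit records (hash refs).
sub parse_prosite {
    my ($text) = @_;
    my (@hits, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*>\S+\s*:\s*(PS\d+)\s+(\S+)\s+(.*\S)/) {
            # Header line: accession, short name, free-text description.
            $current = { id => $1, name => $2, desc => $3 };
        }
        elsif ($current && $line =~ /^\s*(\d+)\s*-\s*(\d+)\s+(\S+)/) {
            # Hit line: 1-based start and end, then the matched residues.
            push @hits, { %$current, start => $1, end => $2, match => $3 };
        }
    }
    return @hits;
}

my $report = <<'END';
    >Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
          75 - 78  NGSM
    >Sequence : PS00008 MYRISTYL N-myristoylation site.
           6 - 11  GTitNQ
END

for my $hit (parse_prosite($report)) {
    print "$hit->{id} $hit->{name}: $hit->{start}-$hit->{end} $hit->{match}\n";
}
```

Each record could then presumably be turned into a
Bio::SeqFeature::Generic (-start, -end, -display_name) and added to a
Bio::Graphics::Panel track in the same way the HOWTO does for BLAST
hits.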

> The width of the image is determined by the -width attribute and is given
> in
> pixels. You cannot control the height of the image as it is computed
> dynamically based on the number of features and bumping options.
>
> Lincoln
>
> On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>>
>> Dear Scot,
>>
>> > There is a fair amount of documentation in the perldoc for
>> > Bio::Graphics::Panel under the section called 'Creating Imagemaps';
>> have
>> > you read that?
>>
>> I agreed, but I couldnt the exact information I needed :( (may be I
>> missed
>> something important).
>>
>> >  Also, for changing the scale, that should happen
>> > automatically--have you tried yet?
>>
>> I tried by changing the Lincoln's program eg: blast3.pl
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>> to my
>> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>>
>> But it had given me a smaller scale of length upto 300. I was looking
>> for
>> an option where I need same width and height of given image and a
>> dynamic
>> start and end values depending on length of my sequence. Since I couldnt
>> accomplish, I thought of getting some help from you guys. I think I need
>> to play a little bit with the value for reformat the scale to accomodate
>> my hits as well.
>>
>> Thanks a lot for your inputs,
>> --
>> Shameer Khadar


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From bix at sendu.me.uk  Tue May 15 04:23:52 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 15 May 2007 09:23:52 +0100
Subject: [Bioperl-l] New Blast parser
Message-ID: <46496E18.1000809@sendu.me.uk>

Back in August of last year I introduced Bio::PullParserI, a module that 
aids in the creation of fast SearchIO and Search modules. I've finally 
gotten around to implementing a Blast parser using the interface, which 
I've called Bio::SearchIO::blast_pull.

To use it you say:

my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");

or in the near future (when I've committed StandAloneBlast changes):

my $sab = Bio::Tools::Run::StandAloneBlast->new(-_READMETHOD => 
"blast_pull");


Currently the parser is incomplete: I've only tested it with NCBI BLASTN 
and BLASTP. However, results are promising. In one particular real-world 
usage-case involving running and parsing multiple Blast jobs via 
StandAloneBlast (amongst other things), changing only the _READMETHOD 
from 'blast' to 'blast_pull' in the code dropped run time from 20223s to 
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40% 
less).

Please try it out and report any bugs you discover.


Cheers,
Sendu.


From aaron.j.mackey at gsk.com  Tue May 15 10:30:13 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 15 May 2007 10:30:13 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <OFFA3F7652.5382601A-ON852572DC.004F5B2A-852572DC.004FAF72@gsk.com>

Or, use a zero-width, positive look ahead assertion, and don't incur the 
penalty of either $` or $&:

  if ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
  }

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/14/2007 09:46:55 PM:

> On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> > I do this in perl with the pos() function.  This requires the use of the
> > match operator (m) like
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >         $start = pos($gene) - length($pattern) + 1;
> > }
> >
> > pos() returns the location of the pointer where the regex left off after
> > finding a match.
> 
> Cool. I hadn't known that was possible.
> 
> > I remove the length of my pattern (which is just a
> > string with a few placeholder (.) wildcards, so I know how long the
> > match will always be).
> 
> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
> 
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
> 
> Steve
> 
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > > Jason Stajich
> > > Sent: Monday, May 14, 2007 12:06 PM
> > > To: Thiago Venancio
> > > Cc: bioperl-l list
> > > Subject: Re: [Bioperl-l] get regions
> > >
> > > I assume you are doing the matches on the string with =~ so
> > > Bio::Seq doesn't really help you here I don't think.
> > > See the $` variable in Perl for how to capture the position
> > > of where a regexp matches.
> > >
> > > -jason
> > > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> > >
> > > > Hi all,
> > > >
> > > > Using Bio::Seq, is there any easy way to get the
> > > coordinates where a
> > > > regular expression matches or should I build a sliding window?
> > > >
> > > > For example, looking for a given promoter region in a FASTA
> > > file. If
> > > > the region is found, I would like to recover exactly the
> > > coordinates
> > > > where it matches.
> > > >
> > > > Thanks in advance.
> > > >
> > > > Thiago
> > > > --
> > > > "Doubt is not a pleasant condition, but certainty is absurd."
> > > >             Voltaire
> > > >
> > > > ========================
> > > > Thiago Motta Venancio, MSc
> > > > PhD student in Bioinformatics
> > > > University of Sao Paulo
> > > > ========================
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > jason at bioperl.org
> > > http://jason.open-bio.org/
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From diogoat at gmail.com  Tue May 15 18:44:59 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 19:44:59 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
Message-ID: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>

Dear All,

I need to download a lot of Leishmania major sequences in GenBank
format. I can't download them from the NCBI web page, because the files
come out corrupted when I download them with a browser. So I went looking
for a script to download these files and found something like this:


#########################################################
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>teste6.gb');
$out->write_seq($seqio);
#########################################################

And the system returns these errors:
[diogo1 at genome perl]$ perl teste6.pl

-------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.

Any ideas?

Thanks,

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From diogoat at gmail.com  Tue May 15 19:27:05 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 20:27:05 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
Message-ID: <638512560705151627t2e25f17cg7f820f3097a67748@mail.gmail.com>

Thanks for your help, Barry!

It works very well, and I'm using the script as you suggested.
The error was in the printing step, is that right?
I need to use a while loop to print all the sequences.

Thanks a lot,

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

2007/5/15, Barry Moore <barry.moore at genetics.utah.edu>:
>
> Diogo-
>
> write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO
> object.  Try this
>
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                  (-query   =>'Leishmania major [Organism]',
>                                   -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                            -file => '>>teste6.gb');
> while (my $seq = $seqio->next_seq) {
>          $out->write_seq($seq);
> }
>
> Barry
>
> On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:
>
> > Dear All,
> >
> > I need to download a lot of sequence of Leishmania major in genbank
> > format...
> > But i can't download on the page of NCBI, because the downloaded
> > file are
> > corrupted... when i use a browser to download this sequences
> > And them i looking for some script to download that`s file and fink
> > something like that:
> >
> >
> > #########################################################
> > use strict;
> > use warnings;
> >
> > use Bio::Seq;
> > use Bio::SeqIO;
> > use Bio::DB::GenBank;
> >
> > my $query = Bio::DB::Query::GenBank->new
> >                                 (-query   =>'Leishmania major [Organism]',
> >                                 -db      => 'nucleotide');
> > my $gb = new Bio::DB::GenBank;
> > my $seqio = $gb->get_Stream_by_query($query);
> >
> > my $out = Bio::SeqIO->new(-format => 'genbank',
> >                           -file => '>>teste6.gb');
> > $out->write_seq($seqio);
> > #########################################################
> >
> > And the system return me this erros
> > [diogo1 at genome perl]$ perl teste6.pl
> >
> > -------------------- WARNING ---------------------
> > MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant
> > module.
> > Attempting to dump, but may fail!
> > ---------------------------------------------------
> > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
> >
> > Any Ideia?
> >
> > Thank`s
> >
> > Diogo Tschoeke
> > Laboratory of Molecular Biology of Trypanosomatides
> > Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> > http://biowebdb.org
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From barry.moore at genetics.utah.edu  Tue May 15 19:17:39 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 15 May 2007 17:17:39 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>

Diogo-

write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO  
object.  Try this

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                 (-query   =>'Leishmania major [Organism]',
                                  -db      => 'nucleotide');
my $gb = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                           -file => '>>teste6.gb');
# $seqio is a stream: pull each sequence object from it and write that
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}

Barry

On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:

> Dear All,
>
> I need to download a lot of sequence of Leishmania major in genbank
> format...
> But i can't download on the page of NCBI, because the downloaded  
> file are
> corrupted... when i use a browser to download this sequences
> And them i looking for some script to download that`s file and fink
> something like that:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system return me this erros
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant  
> module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any Ideia?
>
> Thank`s
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Tue May 15 22:44:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 May 2007 21:44:43 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>


On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
...

> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
>
> Steve

Right, but $& (as well as $` and $') inflict a significant penalty  
for their use, as Aaron alludes to.  Their use, even indirectly via a  
library module, can cause a significant performance hit.

chris


From sac at bioperl.org  Wed May 16 04:16:38 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 01:16:38 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
Message-ID: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>

On 5/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
> ...
>
> > To generalize your code so that it will work for any pattern, such as
> > one that can match strings of variable length like "A{5,10}", just
> > subtract the length of the actual string that was matched:
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >     $start = pos($gene) - length($&) + 1;
> >  }
> >
> > Steve
>
> Right, but $& (as well as $` and $') inflict a significant penalty
> for their use, as Aaron alludes to.  Their use, even indirectly via a
> library module, can cause a significant performance hit.
>
> chris

Yes. I had forgotten how poisonous $&, $` and $' were to regex
performance. Please forgive me. We might consider regularly auditing
the bioperl module tree for use of these in committed code.

But regarding the use of the look ahead assertion, there's a problem
if you want to find *all* occurrences of the pattern in a target
string and the pattern can have variable length hits: it may report
overlapping hits because it only collects the starting points of the
match, and does not determine how long each match would be. For
example:

$gene = 'TTTAAAAAAAAGG';
$pattern="A{5,10}";
while ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
    print ++$hit, " hit starts at $start\n";
}

Generates:
1 hit starts at 4
2 hit starts at 5
3 hit starts at 6
4 hit starts at 7

You could get around this by imposing a constraint to avoid trivial
overlaps. That is OK if you know the length of the pattern, but not so good
for more complex patterns. If there were a way to get the look ahead to
match the longest string possible for a variable length pattern, then
this approach could work, but I'm not sure if that is possible.
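
One possibility, sketched here without any claim that it is the best
way: put capturing parentheses inside the look ahead, so that $1 holds
what the pattern would have matched at that position, and then move
pos() past it by hand. The helper name lookahead_hits is invented, and
the hit length is whatever the regex engine would normally match there
(for a greedy quantifier like A{5,10} that is the longest run).

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) of non-overlapping hits.
sub lookahead_hits {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ m/(?=($pattern))/gi) {
        my $offset = pos($seq);       # zero-width match: pos() is the hit start
        my $len    = length $1;       # text the pattern matches from here
        next unless $len;             # skip empty matches; Perl moves on itself
        push @hits, [ $offset + 1, $offset + $len ];
        pos($seq) = $offset + $len;   # resume the search after this hit
    }
    return @hits;
}

# Made-up sequence, for illustration only.
for my $hit (lookahead_hits('TTTAAAAAAAAGGGGAAAAAAGGGGG', 'A{5,10}')) {
    print "hit at $hit->[0] - $hit->[1]\n";
}
```

This does use capturing parentheses, so it gives up part of the speed
advantage that motivated the look ahead in the first place.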

Here's a solution I think does the job of reporting the extent of each
match without a performance hit and works for patterns of any
complexity, taking advantage of the special arrays containing hit
indexes, @- and @+:

$gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
while ($gene =~ m/$pattern/gi){
    $hit++;
    printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
}

Generates:
1 hit at:  4 - 11
2 hit at: 16 - 21

You can also use this approach to report the locations of any captured
groups, if the pattern contains any parentheses, via $-[1],
$+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
patterns, but patterns not containing parens won't be penalized.
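
To make the group coordinates concrete, here is a small sketch that
collects the whole-match range and every group range for each hit. The
helper name group_regions, the sequence and the pattern are all made up
for illustration.

```perl
use strict;
use warnings;

# For each hit, return an array ref of [start, end] pairs (1-based):
# element 0 is the whole match, element N is capture group N.
sub group_regions {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ m/$pattern/gi) {
        my @ranges;
        for my $i (0 .. $#+) {
            # A group that did not take part in the match has no offsets.
            push @ranges, defined $-[$i] ? [ $-[$i] + 1, $+[$i] ] : undef;
        }
        push @hits, \@ranges;
    }
    return @hits;
}

# Made-up sequence and pattern, for illustration only.
my $n = 0;
for my $hit (group_regions('GGTATAAAAGGCCTATAAGG', 'TATA(A+)')) {
    $n++;
    printf "%d hit at: %2d - %d, group 1 at: %2d - %d\n",
        $n, @{ $hit->[0] }, @{ $hit->[1] };
}
```

Here $#+ is the index of the last group in the most recent successful
match, so the loop covers every group the pattern defines.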

Steve


From georg.otto at tuebingen.mpg.de  Wed May 16 05:19:06 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 16 May 2007 11:19:06 +0200
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <m17ir9m8hh.fsf@tuebingen.mpg.de>


Dear all,

I have a problem that also has to do with downloading data from GenBank,
so I am putting it in this thread.

I am trying to get all entries for the organism Danio rerio using something
like this:


use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Danio rerio[ORGN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
					       -query => $query);
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);


while (my $seq_obj = $stream_obj->next_seq) {
  my $out = Bio::SeqIO->new(-format => 'fasta',
			    -file => '>>output.fas');
  $out->write_seq($seq_obj);
}


However, the download process aborts after a few thousand entries. I
do not think that this is due to the request itself or problems with
specific entries, since the number of transferred sequences varies
before the stop. It might rather have to do with GenBank terminating
the connection.

Does anybody have a suggestion for a better strategy to achieve what I want
(e.g. a different kind of query, a method to resume the download at
the point where it terminated, etc.)?

Best,

Georg


"Diogo Tschoeke" <diogoat at gmail.com> writes:

> Dear All,
>
> I need to download a lot of sequence of Leishmania major in genbank
> format...
> But i can't download on the page of NCBI, because the downloaded file are
> corrupted... when i use a browser to download this sequences
> And them i looking for some script to download that`s file and fink
> something like that:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system return me this erros
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any Ideia?
>
> Thank`s
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org


From cjfields at uiuc.edu  Wed May 16 09:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 08:05:59 -0500
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <m17ir9m8hh.fsf@tuebingen.mpg.de>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
Message-ID: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>

It's likely due to a timeout on the remote server.  One thing
which will speed things up is to retrieve the remote sequences in
fasta format to begin with (described in the Bio::DB::GenBank POD):

my $gb_obj = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                                   -format        => 'fasta');
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# open the output file once, outside the loop
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file   => '>>output.fas');
while (my $seq_obj = $stream_obj->next_seq) {
   $out->write_seq($seq_obj);
}

I also suggest using the direct ftp downloads if at all possible  
(i.e. you are downloading WGS or contig sequences).  It's much faster.
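
If the stream still dies partway through, one option for picking up
where it stopped is to fetch in batches and retry only the batch that
failed. The following is just a sketch of that bookkeeping in plain
Perl; the helper names batches and fetch_in_batches, the batch size and
the retry count are all made up, and the fetch callback here is a
stand-in for the real download.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size each.
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

# Run $fetch on each batch, retrying a failing batch up to $retries times.
# Returns the index of the first batch that never succeeded, or undef.
sub fetch_in_batches {
    my ($fetch, $retries, @batches) = @_;
    for my $i (0 .. $#batches) {
        my $ok = 0;
        for (1 .. $retries) {
            $ok = eval { $fetch->($batches[$i]); 1 } and last;
        }
        return $i unless $ok;
    }
    return;
}

# Stand-in callback; with BioPerl this is where one would presumably call
# $gb_obj->get_Stream_by_id($batch) and write each sequence to $out,
# with the ID list coming from $query_obj->ids.
my @fetched;
my $failed = fetch_in_batches(sub { push @fetched, @{ $_[0] } },
                              3, batches(2, 101 .. 105));
print defined $failed ? "stopped at batch $failed\n"
                      : "fetched " . scalar(@fetched) . " ids\n";
```

A returned batch index tells you where to restart on the next run
without downloading the earlier batches again.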

chris

On May 16, 2007, at 4:19 AM, Georg Otto wrote:

>
> Dear all,
>
> I have a problem that has to do with downloading data from GenBank as
> well, therefor I put it in this thread.
>
> I try to get all entries from organism Danio rerio using the something
> like this:
>
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query = "Danio rerio[ORGN]";
> my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
> 					       -query => $query);
> my $gb_obj = Bio::DB::GenBank->new;
> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
>
> while (my $seq_obj = $stream_obj->next_seq) {
>   my $out = Bio::SeqIO->new(-format => 'fasta',
> 			    -file => '>>output.fas');
>   $out->write_seq($seq_obj);
> }
>
>
> However, the download process aborts after a few thousand entries. I
> do not think that this is due to the request itself or problems with
> specific entries, since the number of transferred sequences varies
> before the stop. It might rather have to do with GenBank terminating
> the connection.
>
> Does anybody have a suggestion for a better strategy to achieve what I want
> (e.g. a different kind of query, a method to resume the download at
> the point where it terminated, etc.)?
>
> Best,
>
> Georg
>
>
> "Diogo Tschoeke" <diogoat at gmail.com> writes:
>
>> Dear All,
>>
>> I need to download a lot of Leishmania major sequences in GenBank
>> format.
>> But I can't download them from the NCBI web page, because the downloaded
>> files are corrupted when I use a browser to download these sequences.
>> So I looked for a script to download these files and found
>> something like this:
>>
>>
>> #########################################################
>> use strict;
>> use warnings;
>>
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> my $query = Bio::DB::Query::GenBank->new
>>                                 (-query   =>'Leishmania major [Organism]',
>>                                 -db      => 'nucleotide');
>> my $gb = new Bio::DB::GenBank;
>> my $seqio = $gb->get_Stream_by_query($query);
>>
>> my $out = Bio::SeqIO->new(-format => 'genbank',
>>                           -file => '>>teste6.gb');
>> $out->write_seq($seqio);
>> #########################################################
>>
>> And the system returns these errors:
>> [diogo1 at genome perl]$ perl teste6.pl
>>
>> -------------------- WARNING ---------------------
>> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
>> Attempting to dump, but may fail!
>> ---------------------------------------------------
>> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
>> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>>
>> Any idea?
>>
>> Thanks
>>
>> Diogo Tschoeke
>> Laboratory of Molecular Biology of Trypanosomatides
>> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
>> http://biowebdb.org
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ferraria at gmail.com  Wed May 16 10:38:47 2007
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 16 May 2007 16:38:47 +0200
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
Message-ID: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>

Hi all,

I want to do something relatively simple and I want to know how far Bioperl
tools could help me, because I'm having trouble getting there.
Here is the pipeline :

"EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
"GeneStructure"

(*) :
From the EntrezGene ID, I want to retrieve the structure of the gene, which
means having the whole genomic sequence and the start and end
positions of each exon, intron, UTR, etc.

I thought of 2 ways to accomplish that :

  -  use 'efetch', get raw XML or ASN.1 and then parse it to obtain the
desired positions.
     This method should work but would take some time to get right.

  -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I
obtain a Bio::Seq object but I am not able to find any features stored in
it. So it doesn't seem that the get_Seq_by_id function gets all the information
contained in an EntrezGene entry (?) .

Can somebody help me to make the right choice or show me the right way?

I also saw that some packages intended to deal with gene structure exist,
but I can't work out how to use them properly, or even how to create one
of those objects!
Are those packages currently usable ?


Thanks in advance.
Best regards,
tony


From cjfields at uiuc.edu  Wed May 16 12:02:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 11:02:28 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
	<8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
Message-ID: <9C6F4829-4E06-4751-8B10-B2726B5288B9@uiuc.edu>


On May 16, 2007, at 3:16 AM, Steve Chervitz wrote:
...

>>
>> Right, but $& (as well as $` and $') inflict a significant penalty
>> for their use, as Aaron alludes to.  Their use, even indirectly via a
>> library module, can cause a significant performance hit.
>>
>> chris
>
> Yes. I had forgotten how poisonous $&, $` and $' were to regex
> performance. Please forgive me. We might consider regularly auditing
> the bioperl module tree for use of these in committed code.

Already done!  We have run a few audits for gotchas like that:

http://www.bioperl.org/wiki/Auditing

http://www.bioperl.org/wiki/Bioperl_Best_Practices

If there is anything we should be looking for please feel free to add  
as needed.  There shouldn't be any use of the 'naughty' variables in  
CVS, but it might be worth a second look...
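
For anyone who wants to take that second look locally, something along these lines would do as a first pass. It is only a rough sketch, not the script used for the audits on the wiki: the sub name (find_naughty), the script name in the comment and the file extensions are my own choices, and the check is deliberately naive.

```perl
use strict;
use warnings;
use File::Find;

# Return [line_number, line] pairs that mention $&, $` or $'.
# Naive on purpose: whole-line comments are skipped, but a quoted
# string that merely ends in "$'" is still reported.
sub find_naughty {
    my ($file) = @_;
    my @hits;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/;    # whole-line comment
        push @hits, [$., $line] if $line =~ /\$[&`']/;
    }
    close $fh;
    return @hits;
}

# Scan every directory named on the command line, e.g.
#   perl audit_naughty.pl bioperl-live/Bio
if (@ARGV) {
    find(sub {
        return unless -f $_ && /\.(?:pm|pl|t|PLS)\z/;
        my $path = $File::Find::name;
        print "$path:$_->[0]: $_->[1]" for find_naughty($_);
    }, @ARGV);
}
```

Expect false positives (POD, quoted strings), so each reported line still needs a human eye.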

> But regarding the use of the look ahead assertion, there's a problem
> if you want to find *all* occurrences of the pattern in a target
> string and the pattern can have variable length hits: it may report
> overlapping hits because it only collects the starting points of the
> match, and does not determine how long each match would be. For
> example:
>
> $gene = 'TTTAAAAAAAAGG';
> $pattern="A{5,10}";
> while ($gene =~ m/(?=$pattern)/gi) {
>     $start = pos($gene) + 1;
>     print ++$hit, " hit starts at $start\n";
> }
>
> Generates:
> 1 hit starts at 4
> 2 hit starts at 5
> 3 hit starts at 6
> 4 hit starts at 7
>
> You could get around this by imposing a constraint to avoid trivial
> overlaps. OK if you know the length of the pattern, but not so good
> for more complex patterns. If there was a way to get the look ahead to
> match the longest string possible for a variable length pattern, then
> this approach could work, but I'm not sure if that is possible.
>
> Here's a solution I think does the job of reporting the extent of each
> match without a performance hit and works for patterns of any
> complexity, taking advantage of the special arrays containing hit
> indexes, @- and @+:
>
> $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
> while ($gene =~ m/$pattern/gi){
>     $hit++;
>     printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
> }
>
> Generates:
> 1 hit at:  4 - 11
> 2 hit at: 16 - 21
>
> You can also use this approach to report the locations of any internal
> back references, if the pattern contains any parentheses, via $-[1],
> $+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
> patterns, but patterns not containing parens won't be penalized.
>
> Steve

Friedl's Regex book has outlined a few ways to get around the  
'naughty' variables $`, $&, and $' using substr() and $-[0], $+[0],  
or both, which makes sense since @+ and @- are arrays of positions  
instead of actual text.

$`  substr(target, 0, $-[0])
$&  substr(target, $-[0], $+[0] - $-[0])
$'  substr(target, $+[0])
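
To make the table concrete, here is a small sketch that wraps the three substr() calls in a helper. The sub name (match_parts) and the test string are mine, not from the book:

```perl
use strict;
use warnings;

# Return (prematch, match, postmatch) for the first match of $re in
# $target, built from @- and @+ rather than from $`, $& and $'.
sub match_parts {
    my ($target, $re) = @_;
    return unless $target =~ $re;
    return (
        substr($target, 0, $-[0]),                 # stands in for $`
        substr($target, $-[0], $+[0] - $-[0]),     # stands in for $&
        substr($target, $+[0]),                    # stands in for $'
    );
}

my ($pre, $match, $post) = match_parts('TTTAAAAAAAAGGGG', qr/A{5,10}/);
print "pre=$pre match=$match post=$post\n";
# prints: pre=TTT match=AAAAAAAA post=GGGG
```

Because only offsets are read, nothing in this code mentions the 'naughty' variables, so it does not trigger the penalty discussed above.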

Wonderful book!

chris


From benoit at ebi.ac.uk  Wed May 16 12:35:39 2007
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 16 May 2007 17:35:39 +0100
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
Message-ID: <464B32DB.6080607@ebi.ac.uk>

Hi Tony,

I don't know how simple it is in bioperl, but it is quite simple using 
the ensembl perl API.

Have a look here :

API installation:
http://www.ensembl.org/info/software/api_installation.html
API tutorial :
http://www.ensembl.org/info/software/core/core_tutorial.html
API Perl module Documentation :
http://www.ensembl.org/info/software/Pdoc/ensembl/index.html

so you can do something similar to the example below :

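use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server (as in the core tutorial)
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);
my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');

# Get the 'COG6' gene from human
my $gene = $gene_adaptor->fetch_by_display_label('COG6');

print "GENE ", $gene->stable_id(), "\n";
# gene coordinates: $gene->start(), $gene->end(), $gene->strand()

foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
    print "TRANSCRIPT ", $transcript->stable_id(), "\n";
    # transcript coordinates: $transcript->start(), $transcript->end()

    foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
        # print the exon coordinates
        print "EXON ", $exon->stable_id(), " ",
              $exon->start(), "-", $exon->end(), "\n";
    }
}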

Hope this helps

Benoit


Anthony Ferrari wrote:
 > Hi all,
 >
 > I want to do something relatively simple and I want to know how far Bioperl
 > tools could help me, because I'm having trouble getting there.
 > Here is the pipeline :
 >
 > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
 > "GeneStructure"
 >
 > (*) :
 > From the EntrezGene ID, I want to retrieve the structure of the gene, which
 > means having the whole genomic sequence and the start and end
 > positions of each exon, intron, UTR, etc.
 >
 > I thought of 2 ways to accomplish that :
 >
 >   -  use 'efetch', get raw XML or ASN.1 and then parse it to obtain the
 > desired positions.
 >      This method should work but would take some time to get right.
 >
 >   -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I
 > obtain a Bio::Seq object but I am not able to find any features stored in
 > it. So it doesn't seem that the get_Seq_by_id function gets all the information
 > contained in an EntrezGene entry (?) .
 >
 > Can somebody help me to make the right choice or show me the right way?
 >
 > I also saw that some packages intended to deal with gene structure exist,
 > but I can't work out how to use them properly, or even how to create one
 > of those objects!
 > Are those packages currently usable ?
 >
 >
 > Thanks in advance.
 > Best regards,
 > tony


From johnsonm at gmail.com  Wed May 16 15:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 16 May 2007 14:11:18 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
Message-ID: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>

On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I believe all seqfeature location coordinates are designed to have
> start < stop for consistency; in cases where the strand matters (CDS,
> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> the two are reversed and the strand is flipped; at least that's the
> way locations are set up in BioPerl.
>
> chris

    Oh yeah?  I always tend to ensure that (start < stop), regardless
of strand, when working with sequence features...the other day, I
caught Glimmer2 emitting a prediction on the plus strand with start >
stop.  I was going to work up a patch for the parser, but I wonder,
should I just force everything to start < stop?  Or only predictions
on the plus strand?  Should all the parsers for all the ab initio
predictors ensure they emit features with coordinates like this?
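
For the sake of discussion, the convention Chris describes (swap the coordinates and flip the strand whenever start > stop) could be sketched like this. The sub name (normalize_coords) is hypothetical, and whether flipping the strand is actually right for a plus-strand Glimmer2 prediction is exactly the open question:

```perl
use strict;
use warnings;

# Return (start, end, strand) with start <= end. When the coordinates
# arrive reversed they are swapped and the strand sign is flipped.
sub normalize_coords {
    my ($start, $end, $strand) = @_;
    $strand = 1 unless defined $strand;    # assume plus strand if unset
    if ($start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -$strand;
    }
    return ($start, $end, $strand);
}

my @feature = normalize_coords(500, 100, 1);
print "@feature\n";
# prints: 100 500 -1
```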


From diogoat at gmail.com  Wed May 16 16:02:44 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Wed, 16 May 2007 17:02:44 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
Message-ID: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>

Dear all,

The script which I wrote with your help is working very well (I pasted the
script at the end of this e-mail).
But I have another problem now: every time I run the script, the
output file has a different size.
Any idea what the problem is? My connection? A problem at NCBI? The script,
maybe?

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

#############################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Trypanosoma cruzi [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);
my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>Trypanosoma_cruzi1.gb');
while (my $seq = $seqio->next_seq){
         $out->write_seq($seq);
                        }
#########################################################


From barry.moore at genetics.utah.edu  Wed May 16 17:13:27 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 16 May 2007 15:13:27 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
Message-ID: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>

Diogo,

I'd guess that this is a result of NCBI terminating the connection, as
Chris suggested previously.  There are a number of approaches you
could use:

  -  Download only fasta, if that's all you need.
  -  Download only the IDs, and then use SeqHound, Batch Entrez or BioPerl
     to download those sequences.
  -  Download the GenBank files from the FTP site, as Chris also
     suggested, and then run a BioPerl script on each of those files.

I can see that you are looking at Trypanosomes, so
doing this (on Linux or Mac OS X):

wget ftp://ftp.ncbi.nih.gov/genbank/gbinv*.seq.gz

will get you the 10 files in the invertebrate division from GenBank,  
and you could run a bioperl script  on those 10 files.

Barry

On May 16, 2007, at 2:02 PM, Diogo Tschoeke wrote:

> Dear all,
>
> The script which I wrote with your help is working very well (I pasted the
> script at the end of this e-mail).
> But I have another problem now: every time I run the script, the
> output file has a different size.
> Any idea what the problem is? My connection? A problem at NCBI?
> The script, maybe?
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org
>
> #############################################################
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Trypanosoma cruzi [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>Trypanosoma_cruzi1.gb');
> while (my $seq = $seqio->next_seq){
>          $out->write_seq($seq);
>                         }
> #########################################################
>


From sac at bioperl.org  Wed May 16 18:29:16 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 15:29:16 -0700
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <464B32DB.6080607@ebi.ac.uk>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
	<464B32DB.6080607@ebi.ac.uk>
Message-ID: <8f200b4c0705161529h26e7c44fk54082a1156201861@mail.gmail.com>

Another option is to use DAS ( http://biodas.org ), which was designed
precisely to solve this sort of problem.

A DAS genome query is a URL that specifies the genome assembly version
on which the returned coordinates should be based. For example, get
all features and their coordinates associated with the human actin
gene on hg17:

http://das.biopackages.net/das/genome/human/17/feature?name=ACTA1

Ensembl, UCSC, and  other sites also provide DAS servers for genomic
features, but these serve up a different XML response format (DAS/1.x)
from what biopackages.net is serving (DAS/2). Here are some links to
these servers, both DAS/1 and DAS/2:

http://www.biodas.org/wiki/DAS/1#Servers
http://www.biodas.org/wiki/DAS/2#Servers

By default, a DAS/2 server will return data in DAS2XML format, but you
can specify alternative formats if a server supports them. This is one
advantage of the DAS/2 retrieval spec, which is stable and is
described here:

http://biodas.org/documents/das2/das2_get.html

You may not be able to use an Entrez gene ID directly in the query.
It depends on whether these IDs are available on the given server.
Accessions and gene names should be OK. You can always map your Entrez
ids to accessions or gene names using this file
ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz .
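
As a rough sketch of that mapping step: the code below assumes the uncompressed file is tab-delimited, with the Gene ID in the second column, the RNA accession.version in the fourth, '-' for missing values and comment lines starting with '#'. Please check those assumptions against the header of the file you download. The sub name (accessions_for_genes) and the script name in the comment are made up.

```perl
use strict;
use warnings;

# Collect RNA accessions for the requested Entrez Gene IDs.
# $fh is any filehandle on uncompressed gene2refseq data.
sub accessions_for_genes {
    my ($fh, @gene_ids) = @_;
    my %wanted = map { $_ => 1 } @gene_ids;
    my %acc_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                     # header / comments
        chomp $line;
        # Assumed layout: tax_id, GeneID, status, RNA accession, ...
        my (undef, $gene_id, undef, $rna_acc) = split /\t/, $line;
        next unless defined $gene_id && defined $rna_acc;
        next unless $wanted{$gene_id};
        next if $rna_acc eq '-';                   # no RNA accession
        $acc_for{$gene_id}{$rna_acc} = 1;
    }
    return map { $_ => [ sort keys %{ $acc_for{$_} } ] } keys %acc_for;
}

# Usage: gunzip -c gene2refseq.gz | perl map_gene_ids.pl 1234 5678
if (@ARGV) {
    my %acc = accessions_for_genes(\*STDIN, @ARGV);
    print "$_\t@{ $acc{$_} }\n" for sort { $a <=> $b } keys %acc;
}
```

The accessions that come out can then be tried as the name in a DAS feature query, provided the server knows them.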

Steve

On 5/16/07, Benoit Ballester <benoit at ebi.ac.uk> wrote:
> Hi Tony,
>
> I don't know how simple it is in bioperl, but it is quite simple using
> the ensembl perl API.
>
> Have a look here :
>
> API installation:
> http://www.ensembl.org/info/software/api_installation.html
> API tutorial :
> http://www.ensembl.org/info/software/core/core_tutorial.html
> API Perl module Documentation :
> http://www.ensembl.org/info/software/Pdoc/ensembl/index.html
>
> so you can do something similar to the example below :
>
> # Get the 'COG6' gene from human
>
> my $gene = $gene_adaptor->fetch_by_display_label('COG6');
>
> print "GENE ", $gene->stable_id(), "\n";
> # here you get gene coordinate
>
> foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
>      print "TRANSCRIPT ", $transcript->stable_id(), "\n";
>      #print transcript coordinates
>
>         foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>         #print the exon coordinates
>
>         }
> }
>
> Hope this helps
>
> Benoit
>
>
> Anthony Ferrari wrote:
>  > Hi all,
>  >
>  > I want to do something relatively simple and I want to know how far Bioperl
>  > tools could help me, because I'm having trouble getting there.
>  > Here is the pipeline :
>  >
>  > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
>  > "GeneStructure"
>  >
>  > (*) :
>  > From the EntrezGene ID, I want to retrieve the structure of the gene, which
>  > means having the whole genomic sequence and the start and end
>  > positions of each exon, intron, UTR, etc.
>  >
>  > I thought of 2 ways to accomplish that :
>  >
>  >   -  use 'efetch', get raw XML or ASN.1 and then parse it to obtain the
>  > desired positions.
>  >      This method should work but would take some time to get right.
>  >
>  >   -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I
>  > obtain a Bio::Seq object but I am not able to find any features stored in
>  > it. So it doesn't seem that the get_Seq_by_id function gets all the information
>  > contained in an EntrezGene entry (?) .
>  >
>  > Can somebody help me to make the right choice or show me the right way?
>  >
>  > I also saw that some packages intended to deal with gene structure exist,
>  > but I can't work out how to use them properly, or even how to create one
>  > of those objects!
>  > Are those packages currently usable ?
>  >
>  >
>  > Thanks in advance.
>  > Best regards,
>  > tony


From heikki at sanbi.ac.za  Thu May 17 02:46:44 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 17 May 2007 08:46:44 +0200
Subject: [Bioperl-l] Writing OBO fiies
Message-ID: <200705170846.44641.heikki@sanbi.ac.za>


I've started putting together Bio::OntologyIO::obo::write_ontology().
The current parser ignores a number of fields in common obo files.
If anyone knows any issues regarding adding more information into obo ontology 
object, shout now.

I need to start parsing at least "xref_analog" and "subset" to get a 
reasonable roundtrip of obo files representing cell ontology and sequence 
ontology.

I am not aiming at extending the existing ontology interfaces but simply 
patching obo parsing, but I am open to suggestions.

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bernd.web at gmail.com  Thu May 17 06:48:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 17 May 2007 12:48:07 +0200
Subject: [Bioperl-l] (Simple)Align
Message-ID: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>

Hi,

I am playing with alignment and would like to insert strings at
certain columns (so in all sequences in the alignment). I know about
the slice and remove_columns.
Is there already an insert_columns type of functionality?
Otherwise I'll just iterate over the sequences similar to
remove_columns (and give it a try to implement add_columns like
remove_columns).


Regards
Bernd


From Kevin.M.Brown at asu.edu  Thu May 17 11:17:04 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 17 May 2007 08:17:04 -0700
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B403284273@EX02.asurite.ad.asu.edu>

> I am playing with alignment and would like to insert strings 
> at certain columns (so in all sequences in the alignment). I 
> know about the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to 
> remove_columns (and give it a try to implement add_columns 
> like remove_columns).

Try the Deobfuscator to see all the methods available to a
Bio::SimpleAlign object.
http://bioperl.org/cgi-bin/deob_interface.cgi
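
If nothing suitable turns up there, the iterate-over-the-sequences approach mentioned in the question comes down to one substr() call per row. Here is a minimal sketch on plain aligned strings; the helper name (insert_columns) is hypothetical and is not a Bio::SimpleAlign method:

```perl
use strict;
use warnings;

# Insert $text before 1-based alignment column $col in every aligned
# string, and return the modified copies.
sub insert_columns {
    my ($col, $text, @aligned) = @_;
    return map {
        my $row = $_;
        die "column $col out of range\n"
            if $col < 1 || $col > length($row) + 1;
        substr($row, $col - 1, 0, $text);    # zero-length replace = insert
        $row;
    } @aligned;
}

my @rows = insert_columns(3, '--', 'ACGT', 'A-GT');
print "$_\n" for @rows;
# prints:
# AC--GT
# A---GT
```

With a real alignment the same call would be applied to the string from each sequence returned by each_seq, and the end coordinates would need adjusting if the inserted text contains residues rather than gaps; that part is not shown here.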


From diogoat at gmail.com  Thu May 17 14:14:14 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Thu, 17 May 2007 15:14:14 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
Message-ID: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>

Hi Barry, thanks for all your help.

I chose to download the Invertebrates division of NCBI to my machine,
but I don't have a script to get the sequences from the local file, and
I don't know how to write one.
I tried changing, in the script,
-db => 'nucleotide' to -db => 'local-gbdi.gb',
as I wrote below:

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major',
                                -db     => '>local-gbdi.gb');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

But it didn't work, because Bio::DB::Query::GenBank is a Perl module which
connects to NCBI to run my query, and my database is now local.

I need the genomes of Trypanosoma cruzi, Trypanosoma brucei, Leishmania
major, Entamoeba and Plasmodium falciparum in GenBank format.
Any suggestions? Does somebody have such a script?
Help!
And thanks for the help!

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From barry.moore at genetics.utah.edu  Thu May 17 14:19:46 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 17 May 2007 12:19:46 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
	<638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
Message-ID: <F5104D8D-030D-4F01-884C-623B5F2D63CC@genetics.utah.edu>

Diogo-

Look at the bioperl documentation - there you will find a HowTo on  
SeqIO.  This will help you learn how to write scripts to load genbank  
flat files and you can then iterate over those files and check the  
organism to see if it's one that you want.  You should be able to  
find everything that you need in the documentation.
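
To give you a starting point, the loop the HowTo leads to looks roughly like this. Treat it as a sketch only: the file names are placeholders, the input must be an uncompressed GenBank flat file, and the organism names are the ones from your list, matched against the start of the binomial name.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Placeholder file names; point -file at one of the files you downloaded.
my $in  = Bio::SeqIO->new(-format => 'genbank', -file => 'gbinv1.seq');
my $out = Bio::SeqIO->new(-format => 'genbank', -file => '>selected.gb');

# Organisms of interest. Entamoeba is a genus, so names are compared
# by prefix rather than by exact match.
my @wanted = ('Trypanosoma cruzi', 'Trypanosoma brucei', 'Leishmania major',
              'Entamoeba', 'Plasmodium falciparum');

while (my $seq = $in->next_seq) {
    my $species = $seq->species or next;    # skip records without organism
    my $name    = $species->binomial;       # "Genus species"
    next unless grep { index($name, $_) == 0 } @wanted;
    $out->write_seq($seq);
}
```

Run it once per downloaded file (or loop over the file names) and append to the same output if you want everything in one place.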

B

On May 17, 2007, at 12:14 PM, Diogo Tschoeke wrote:

> Hi Barry, thanks for all your help.
>
> I chose to download the Invertebrates division of NCBI to my machine,
> but I don't have a script to get the sequences from the local file, and
> I don't know how to write one.
> I tried changing, in the script,
> -db => 'nucleotide' to -db => 'local-gbdi.gb',
> as I wrote below:
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major',
>                                 -db     => '>local-gbdi.gb');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> But it didn't work, because Bio::DB::Query::GenBank is a Perl module which
> connects to NCBI to run my query, and my database is now local.
>
> I need the genomes of Trypanosoma cruzi, Trypanosoma brucei,
> Leishmania major, Entamoeba and Plasmodium falciparum in
> GenBank format.
> Any suggestions? Does somebody have such a script?
> Help!
> And thanks for the help!
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org


From torsten.seemann at infotech.monash.edu.au  Fri May 18 04:13:38 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 18 May 2007 18:13:38 +1000
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <46496E18.1000809@sendu.me.uk>
References: <46496E18.1000809@sendu.me.uk>
Message-ID: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>

Sendu,

> Back in August of last year I introduced Bio::PullParserI, a module that
> aids in the creation of fast SearchIO and Search modules. I've finally
> gotten around to implementing a Blast parser using the interface, which
> I've called Bio::SearchIO::blast_pull.
> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");
> Please try it out and feed-back any bugs you discover.

This is very cool!
Here's hoping NCBI don't change the default output format too much.

You should be able to add "rpsblast -p T" support as this is identical
to "blastall -p blastp" except for first line:
BLASTP 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]

The only problem is the (rarely used) "rpsblast -p F" mode which
looks/behaves like a "blastall -p tblastn", ie. has hit summaries with
"Frame"

 Score = 29.6 bits (65), Expect = 0.26
 Identities = 10/26 (38%), Positives = 12/26 (46%)
 Frame = -1

BUT has the same header line, so you can't know -p F was used until
you see a "Frame = ??" in a hit (what were they thinking???).

TBLASTN 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]    # should be RPS-TBLASTN perhaps...
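
To illustrate what a parser would have to do here, a sketch (not anything taken from blast_pull; the sub name and the labels it returns are made up):

```perl
use strict;
use warnings;

# Guess which program wrote a report: read the first line, and for
# RPS-BLAST keep reading until a "Frame =" line shows up (or EOF).
sub guess_program {
    my ($fh) = @_;
    my $first = <$fh>;
    return unless defined $first;
    my ($program) = $first =~ /^(\S*BLAST\S*)\s+\d/ or return;
    return $program unless $program eq 'RPS-BLAST';
    while (my $line = <$fh>) {
        return 'RPS-BLAST (translated, -p F)' if $line =~ /^\s*Frame\s*=/;
    }
    return 'RPS-BLAST (protein, -p T)';
}

# Usage: perl guess_program.pl report.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print guess_program($fh), "\n";
}
```

The ugly part is visible in the loop: for a "-p T" report the whole file has to be read before the answer is known.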

Thanks for the good work. Shame I converted most of our systems to blastxml :-(

--Torsten


From cjfields at uiuc.edu  Fri May 18 09:39:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 08:39:05 -0500
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
References: <46496E18.1000809@sendu.me.uk>
	<a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
Message-ID: <2219EED8-F721-4586-B029-EF6CD9C32246@uiuc.edu>

I'll be looking at cleaning up SearchIO::blastxml soon myself.  It
needs to be more memory-friendly with large XML files, and PSI-BLAST
iterations need to be addressed (nope, I haven't forgotten about that!).

There is a XML::LibXML pull parser interface (XML::LibXML::Reader) we  
could look into...
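
Just to show the flavour of that interface, here is a rough sketch of walking the hits in a BLAST XML file one at a time. It is not a proposal for how SearchIO::blastxml should be written, it ignores PSI-BLAST iterations entirely, and turning off load_ext_dtd is my assumption for avoiding a fetch of the NCBI DTD:

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my $file = shift @ARGV or die "Usage: $0 blast_output.xml\n";

# load_ext_dtd => 0: do not try to fetch the DTD named in the DOCTYPE.
my $reader = XML::LibXML::Reader->new(location     => $file,
                                      load_ext_dtd => 0)
    or die "Cannot read $file\n";

# Jump from one <Hit> element to the next. Only the current hit is
# copied into a DOM subtree, so memory use stays roughly per-hit.
while ($reader->nextElement('Hit') == 1) {
    my $hit  = $reader->copyCurrentNode(1);    # deep copy of this <Hit>
    my $id   = $hit->findvalue('Hit_id');
    my $len  = $hit->findvalue('Hit_len');
    my @hsps = $hit->findnodes('Hit_hsps/Hsp');
    print join("\t", $id, $len, scalar @hsps), "\n";
}
```

Each line of output is the hit ID, the hit length and the number of HSPs for that hit.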

chris

On May 18, 2007, at 3:13 AM, Torsten Seemann wrote:

> Sendu,
>
>> Back in August of last year I introduced Bio::PullParserI, a  
>> module that
>> aids in the creation of fast SearchIO and Search modules. I've  
>> finally
>> gotten around to implementing a Blast parser using the interface,  
>> which
>> I've called Bio::SearchIO::blast_pull.
>> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file =>  
>> "file");
>> Please try it out and feed-back any bugs you discover.
>
> This is very cool!
> Here's hoping NCBI don't change the default output format too much.
>
> You should be able to add "rpsblast -p T" support as this is identical
> to "blastall -p blastp" except for first line:
> BLASTP 2.2.16 [Mar-25-2007]
> RPS-BLAST 2.2.16 [Mar-25-2007]
>
> The only problem is the (rarely used) "rpsblast -p F" mode which
> looks/behaves like a "blastall -p tblastn", ie. has hit summaries with
> "Frame"
>
>  Score = 29.6 bits (65), Expect = 0.26
>  Identities = 10/26 (38%), Positives = 12/26 (46%)
>  Frame = -1
>
> BUT has the same header line, so you can't know -p F was used until
> you see a "Frame = ??" in a hit (what were they thinking???).
>
> TBLASTN 2.2.16 [Mar-25-2007]
> RPS-BLAST 2.2.16 [Mar-25-2007]    # should be RPS-TBLASTN perhaps...
>
> Thanks for the good work. Shame I converted most of our systems to  
> blastxml :-(
>
> --Torsten

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri May 18 10:00:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 09:00:38 -0500
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <239FDEF1-38D4-47B8-AC71-514B61BDF9E0@uiuc.edu>

Sounds great to me!  Sohel Merchant might have some ideas...

chris

On May 17, 2007, at 1:46 AM, Heikki Lehvaslaiho wrote:

>
> I've started putting together Bio::OntologyIO::obo::write_ontology().
> The current parser ignores a number of fields in common obo files.
> If anyone knows any issues regarding adding more information into  
> obo ontology
> object, shout now.
>
> I need to start parsing at least "xref_analog" and "subset" to get a
> reasonable roundtrip of obo files representing cell ontology and  
> sequence
> ontology.
>
> I am not aiming at extending the existing ontology interfaces but  
> simply
> patching obo parsing, but I am open to suggestions.
>
> 	-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat May 19 20:54:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 20:54:11 -0400
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <221DB1CF-2F4E-47D4-80A8-D8D8BD777423@gmx.net>

Sounds great to me! -hilmar

On May 17, 2007, at 2:46 AM, Heikki Lehvaslaiho wrote:

>
> I've started putting together Bio::OntologyIO::obo::write_ontology().
> The current parser ignores a number of fields in common obo files.
> If anyone knows any issues regarding adding more information into  
> obo ontology
> object, shout now.
>
> I need to start parsing at least "xref_analog" and "subset" to get a
> reasonable roundtrip of obo files representing cell ontology and  
> sequence
> ontology.
>
> I am not aiming at extending the existing ontology interfaces but  
> simply
> patching obo parsing, but I am open to suggestions.
>
> 	-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat May 19 21:36:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 21:36:49 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
Message-ID: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>

FYI. Is it worth thinking about implementing a remote access  
interface to the CIPRES tree inference tools, similar to what we have  
for RemoteBlast?

	-hilmar

Begin forwarded message:

From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
Date: May 16, 2007 6:48:49 AM EDT
Subject: FW: release of cipres portal for tree inference

The CIPRES Central Resource team is pleased to announce the first public
release of the CIPRES portal for Tree Inference.

The portal is based on capabilities exposed by the Cipres software
libraries, which were constructed as a Joint Effort between Mark Holder
at Florida State University and the SDSC SW engineering team led by
Terri Liebowitz.

It currently presents Parsimony (PAUP) and Likelihood (GARLI and RAxML)
tools with or without boosting from RecIDCM3 created by Usman Roshan and
co-workers. Nexus and Phylip files are currently supported.

The site is available to all, and is underwritten by the CIPRES cluster
at SDSC.

The portal is fully supported by the SDSC team, with contributions and
new features introduced by the team in collaboration with Mark Holder
and Rutger Vos. At present weekly releases are made with improvements
and new features.

You can visit the portal at the Cipres Web Site.

http://www.phylo.org/sub_sections/portal.htm

Please forward this information to anyone you feel may find the
portal useful.

On behalf of the whole CIPRES team,

Mark

Mark A. Miller, PhD
Principal Investigator, Biology
San Diego Supercomputer Center
University of California, San Diego
La Jolla, CA, 92093-0505
Tel: 858-822-0866
Fax: 858-822-3610

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat May 19 22:10:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 19 May 2007 21:10:53 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
Message-ID: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>

I think it would be worthwhile.  Would we place it in bioperl-run?

chris

On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:

> FYI. Is it worth thinking about implementing a remote access
> interface to the CIPRES tree inference tools, similar to what we have
> for RemoteBlast?
>
> 	-hilmar
>
> Begin forwarded message:
>
> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
> Date: May 16, 2007 6:48:49 AM EDT
> Subject: FW: release of cipres portal for tree inference
>
> The CIPRES Central Resource team is pleased to announce the first  
> public
> release of the CIPRES portal for Tree Inference.
>
> The portal is based on capabilities exposed by the Cipres software
> libraries, which were constructed as a Joint Effort between Mark  
> Holder
> at Florida State University and the SDSC SW engineering team led by
> Terri Liebowitz.
>
> It currently presents Parsimony (PAUP) and Likelihood (GARLI and  
> RAxML)
> tools with or without boosting from RecIDCM3 created by Usman  
> Roshan and
> co-workers. Nexus and Phylip files are currently supported.
>
> The site is available to all, and is underwritten by the CIPRES  
> cluster
> at SDSC.
>
> The portal is fully supported by the SDSC team, with contributions and
> new features introduced by the team in collaboration with Mark Holder
> and Rutger Vos. At present weekly releases are made with improvements
> and new features.
>
> You can visit the portal at the Cipres Web Site.
>
> http://www.phylo.org/sub_sections/portal.htm
>
> Please forward this information to anyone you feel may find the
> portal useful.
>
> On behalf of the whole CIPRES team,
>
> Mark
>
> Mark A. Miller, PhD
> Principal Investigator, Biology
> San Diego Supercomputer Center
> University of California, San Diego
> La Jolla, CA, 92093-0505
> Tel: 858-822-0866
> Fax: 858-822-3610
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat May 19 22:19:47 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 22:19:47 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
Message-ID: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>

I guess so. That's where RemoteBlast is too, if I'm not mistaken?

What sucks about the UI from a programming perspective is that it  
goes through multiple screens. There may be a lot of screen-scraping.

	-hilmar

On May 19, 2007, at 10:10 PM, Chris Fields wrote:

> I think it would be worthwhile.  Would we place it in bioperl-run?
>
> chris
>
> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>
>> FYI. Is it worth thinking about implementing a remote access
>> interface to the CIPRES tree inference tools, similar to what we have
>> for RemoteBlast?
>>
>> 	-hilmar
>>
>> Begin forwarded message:
>>
>> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
>> Date: May 16, 2007 6:48:49 AM EDT
>> Subject: FW: release of cipres portal for tree inference
>>
>> The CIPRES Central Resource team is pleased to announce the first  
>> public
>> release of the CIPRES portal for Tree Inference.
>>
>> The portal is based on capabilities exposed by the Cipres software
>> libraries, which were constructed as a Joint Effort between Mark  
>> Holder
>> at Florida State University and the SDSC SW engineering team led by
>> Terri Liebowitz.
>>
>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and  
>> RAxML)
>> tools with or without boosting from RecIDCM3 created by Usman  
>> Roshan and
>> co-workers. Nexus and Phylip files are currently supported.
>>
>> The site is available to all, and is underwritten by the CIPRES  
>> cluster
>> at SDSC.
>>
>> The portal is fully supported by the SDSC team, with contributions  
>> and
>> new features introduced by the team in collaboration with Mark Holder
>> and Rutger Vos. At present weekly releases are made with improvements
>> and new features.
>>
>> You can visit the portal at the Cipres Web Site.
>>
>> http://www.phylo.org/sub_sections/portal.htm
>>
>> Please forward this information to anyone you feel may find the
>> portal useful.
>>
>> On behalf of the whole CIPRES team,
>>
>> Mark
>>
>> Mark A. Miller, PhD
>> Principal Investigator, Biology
>> San Diego Supercomputer Center
>> University of California, San Diego
>> La Jolla, CA, 92093-0505
>> Tel: 858-822-0866
>> Fax: 858-822-3610
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sun May 20 01:06:53 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 19 May 2007 22:06:53 -0700
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
Message-ID: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>

Technically RemoteBlast is in bioperl-live, for historical and
ease-of-install reasons: so many people want to use BLAST out of the
box that we kept it in bioperl-live rather than force them to install
bioperl-run.

I think it would be great to have the interface - can we do it all
via HTTP, or will it require installing client software and/or
CORBA?

-jason
On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:

> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>
> What sucks about the UI from a programming perspective is that it
> goes through multiple screens. There may be a lot of screen-scraping.
>
> 	-hilmar
>
> On May 19, 2007, at 10:10 PM, Chris Fields wrote:
>
>> I think it would be worthwhile.  Would we place it in bioperl-run?
>>
>> chris
>>
>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>>
>>> FYI. Is it worth thinking about implementing a remote access
>>> interface to the CIPRES tree inference tools, similar to what we  
>>> have
>>> for RemoteBlast?
>>>
>>> 	-hilmar
>>>
>>> Begin forwarded message:
>>>
>>> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
>>> Date: May 16, 2007 6:48:49 AM EDT
>>> Subject: FW: release of cipres portal for tree inference
>>>
>>> The CIPRES Central Resource team is pleased to announce the first
>>> public
>>> release of the CIPRES portal for Tree Inference.
>>>
>>> The portal is based on capabilities exposed by the Cipres software
>>> libraries, which were constructed as a Joint Effort between Mark
>>> Holder
>>> at Florida State University and the SDSC SW engineering team led by
>>> Terri Liebowitz.
>>>
>>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and
>>> RAxML)
>>> tools with or without boosting from RecIDCM3 created by Usman
>>> Roshan and
>>> co-workers. Nexus and Phylip files are currently supported.
>>>
>>> The site is available to all, and is underwritten by the CIPRES
>>> cluster
>>> at SDSC.
>>>
>>> The portal is fully supported by the SDSC team, with contributions
>>> and
>>> new features introduced by the team in collaboration with Mark  
>>> Holder
>>> and Rutger Vos. At present weekly releases are made with  
>>> improvements
>>> and new features.
>>>
>>> You can visit the portal at the Cipres Web Site.
>>>
>>> http://www.phylo.org/sub_sections/portal.htm
>>>
>>> Please forward this information to anyone you feel may find the
>>> portal useful.
>>>
>>> On behalf of the whole CIPRES team,
>>>
>>> Mark
>>>
>>> Mark A. Miller, PhD
>>> Principal Investigator, Biology
>>> San Diego Supercomputer Center
>>> University of California, San Diego
>>> La Jolla, CA, 92093-0505
>>> Tel: 858-822-0866
>>> Fax: 858-822-3610
>>>
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From bernd.web at gmail.com  Sun May 20 10:56:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 20 May 2007 16:56:07 +0200
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
	<C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
Message-ID: <716af09c0705200756h46bf2134x3d6841d2a98744c0@mail.gmail.com>

Hi

I have made a simple add_columns function in SimpleAlign along the
lines of remove_columns. I only need to insert characters that are the
same for all sequences:

=head2 add_columns

 Title     : add_columns
 Usage     : $aln2 = $aln->add_columns([0, 10, '.'], [12, 15])
 Function  : Creates an alignment with columns added by specifying
             the columns by number and supplying the character
             (optional) to insert in all sequences. Default
             character is gap_char.
 Returns   : Bio::SimpleAlign object
 Args      : Array refs, each containing a pair of integers that
             specify a column range and, optionally, the character
             to insert. The first column is 0.

=cut

The functionality could be extended:
- possibility to supply a string to insert (for all sequences)
- possibility to define the string to insert on a per-sequence basis
(although this may be more transparent to do outside SimpleAlign).

After some final checks I could supply it (e.g. via bugzilla).
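
To illustrate the basic idea, here is a rough sketch on plain strings
rather than on a Bio::SimpleAlign object. It is not the patch itself:
the helper name insert_columns, its (position, count, character)
arguments and the '-' default are assumptions made only for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $count copies of $char before the 0-based
# column $pos in every aligned string. Returns a reference to new
# strings; the input strings are left untouched.
sub insert_columns {
    my ($seqs, $pos, $count, $char) = @_;
    $char = '-' unless defined $char;    # assumed default gap character
    die "insert character must be a single character\n"
        unless length($char) == 1;

    my @out;
    for my $seq (@$seqs) {
        die "column $pos is outside the alignment\n"
            if $pos < 0 || $pos > length($seq);
        my $copy = $seq;
        # 4-argument substr with length 0 inserts without replacing
        substr($copy, $pos, 0, $char x $count);
        push @out, $copy;
    }
    return \@out;
}

# Made-up two-sequence alignment: add three '.' columns before column 2
my $aln = insert_columns([ 'ACGT', 'A-GT' ], 2, 3, '.');
print "$_\n" for @$aln;
```

Because the same string is inserted at the same position in every
sequence, the rows stay the same length, which is what keeps the
result a valid alignment.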


Regards,
Bernd


On 5/17/07, Jason Stajich <jason at bioperl.org> wrote:
> not yet - when I did this to insert intron positions I just manipulated the
> sequence strings outside of SimpleAlign, but I think it would be nice to
> have an insert function.
>
> -jason
>
> On May 17, 2007, at 3:48 AM, Bernd Web wrote:
>
> Hi,
>
> I am playing with alignment and would like to insert strings at
> certain columns (so in all sequences in the alignment). I know about
> the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to
> remove_columns (and give it a try to implement add_columns like
> remove_columns).
>
>
> Regards
> Bernd
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


From hlapp at gmx.net  Sun May 20 11:59:03 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 20 May 2007 11:59:03 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
Message-ID: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>

Just HTTP, no CORBA or other stuff needed client-side.

Ultimately it would of course be nice if they offered a more SOA  
compliant interface too, to obviate the screen-scraping need.  
However, if I understand the UI correctly the screen scraping is - if  
at all - only needed for walking through the steps, and for  
extracting the location of the result. The result itself is in NEXUS  
format, as a separate file.

	-hilmar

On May 20, 2007, at 1:06 AM, Jason Stajich wrote:

> technically remoteblast is in bioperl-live, but for historical/ease  
> of user-install purposes (i.e. so many people want to use blast out  
> of the box, we kept it in bioperl-live to not force them to install  
> bioperl-run).
>
> I think it would be great to have the interface - can we do it all  
> via HTTP or will it require some installation of client software  
> and/or CORBA?
>
> -jason
> On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:
>
>> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>>
>> What sucks about the UI from a programming perspective is that it
>> goes through multiple screens. There may be a lot of screen-scraping.
>>
>> 	-hilmar
>>
>> On May 19, 2007, at 10:10 PM, Chris Fields wrote:
>>
>>> I think it would be worthwhile.  Would we place it in bioperl-run?
>>>
>>> chris
>>>
>>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>>>
>>>> FYI. Is it worth thinking about implementing a remote access
>>>> interface to the CIPRES tree inference tools, similar to what we  
>>>> have
>>>> for RemoteBlast?
>>>>
>>>> 	-hilmar
>>>>
>>>> Begin forwarded message:
>>>>
>>>> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
>>>> Date: May 16, 2007 6:48:49 AM EDT
>>>> Subject: FW: release of cipres portal for tree inference
>>>>
>>>> The CIPRES Central Resource team is pleased to announce the first
>>>> public
>>>> release of the CIPRES portal for Tree Inference.
>>>>
>>>> The portal is based on capabilities exposed by the Cipres software
>>>> libraries, which were constructed as a Joint Effort between Mark
>>>> Holder
>>>> at Florida State University and the SDSC SW engineering team led by
>>>> Terri Liebowitz.
>>>>
>>>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and
>>>> RAxML)
>>>> tools with or without boosting from RecIDCM3 created by Usman
>>>> Roshan and
>>>> co-workers. Nexus and Phylip files are currently supported.
>>>>
>>>> The site is available to all, and is underwritten by the CIPRES
>>>> cluster
>>>> at SDSC.
>>>>
>>>> The portal is fully supported by the SDSC team, with contributions
>>>> and
>>>> new features introduced by the team in collaboration with Mark  
>>>> Holder
>>>> and Rutger Vos. At present weekly releases are made with  
>>>> improvements
>>>> and new features.
>>>>
>>>> You can visit the portal at the Cipres Web Site.
>>>>
>>>> http://www.phylo.org/sub_sections/portal.htm
>>>>
>>>> Please forward this information to anyone you feel may find the
>>>> portal useful.
>>>>
>>>> On behalf of the whole CIPRES team,
>>>>
>>>> Mark
>>>>
>>>> Mark A. Miller, PhD
>>>> Principal Investigator, Biology
>>>> San Diego Supercomputer Center
>>>> University of California, San Diego
>>>> La Jolla, CA, 92093-0505
>>>> Tel: 858-822-0866
>>>> Fax: 858-822-3610
>>>>
>>>> -- 
>>>> ===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Mon May 21 11:19:56 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 10:19:56 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
Message-ID: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>

Sounds like time to bust out WWW::Mechanize.  I didn't step through
the whole process, but the first screen/step looks okay.  Plain HTML
form with plain buttons.  Looks like the Javascript is only getting
involved for client-side sanity checking.  Should be easy to automate
(Don't look at me, I've bitten off a bit too much as it is).

On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> Just HTTP, no CORBA or other stuff needed client-side.
>
> Ultimately it would of course be nice if they offered a more SOA
> compliant interface too, to obviate the screen-scraping need.
> However, if I understand the UI correctly the screen scraping is - if
> at all - only needed for walking through the steps, and for
> extracting the location of the result. The result itself is in NEXUS
> format, as a separate file.
>
>         -hilmar


From cjfields at uiuc.edu  Mon May 21 16:11:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:11:36 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
	<ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
Message-ID: <61E0D74B-77F7-499B-A0B7-B1E5106964E6@uiuc.edu>

It would be nice to have a generalized interface (SOAP, CGI,
anything), as Hilmar states.  I agree WWW::Mechanize is probably the
way to go for now.  Don't know who wants to take it up...

chris

On May 21, 2007, at 10:19 AM, Mark Johnson wrote:

> Sounds like time to bust out WWW::Mechanize.  I didn't step through
> the whole process, but the first screen/step looks okay.  Plain HTML
> form with plain buttons.  Looks like the Javascript is only getting
> involved for client-side sanity checking.  Should be easy to automate
> (Don't look at me, I've bitten off a bit too much as it is).
>
> On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> Just HTTP, no CORBA or other stuff needed client-side.
>>
>> Ultimately it would of course be nice if they offered a more SOA
>> compliant interface too, to obviate the screen-scraping need.
>> However, if I understand the UI correctly the screen scraping is - if
>> at all - only needed for walking through the steps, and for
>> extracting the location of the result. The result itself is in NEXUS
>> format, as a separate file.
>>
>>         -hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 16:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:35:41 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
Message-ID: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>

On May 16, 2007, at 2:11 PM, Mark Johnson wrote:

> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> I believe all seqfeature location coordinates are designed to have
>> start < stop for consistency; in cases where the strand matters (CDS,
>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>> the two are reversed and the strand is flipped; at least that's the
>> way locations are set up in BioPerl.
>>
>> chris
>
>     Oh yeah?  I always tend to ensure that (start < stop), regardless
> of strand, when working with sequence features...the other day, I
> caught Glimmer2 emitting a prediction on the plus strand with start >
> stop.  I was going to work up a patch for the parser, but I wonder,
> should I just force everything to start < stop?  Or only predictions
> on the plus strand?  Should all the parsers for all the ab initio
> predictors ensure they emit features with coordinates like this?

Odd that it would predict a start > stop on the plus strand, though  
it may be corrected in Glimmer3.  Does the same prediction show up in  
Glimmer3?

chris


From johnsonm at gmail.com  Mon May 21 16:48:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 15:48:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>

Check the test data for Glimmer2 and Glimmer3.  They both predict one
large gene, I'd guess covering most of the sequence, in frame +1.
That's probably a bogus prediction, but that's not up to the parser to
decide.  I hadn't noticed it until recently.

I sent a patch via bugzilla to swap the coordinates if start > end and
strand > 0.
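
The rule is roughly the following. This is a minimal sketch, not the
patch attached to the bug report; the helper name normalize_coords and
the example coordinates are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return (start, end, strand) with the coordinates
# swapped only when start > end on the plus strand. The strand is left
# as reported, since judging whether the prediction is bogus is not the
# parser's job.
sub normalize_coords {
    my ($start, $end, $strand) = @_;
    if ($start > $end && $strand > 0) {
        ($start, $end) = ($end, $start);
    }
    return ($start, $end, $strand);
}

# Made-up plus-strand prediction reported with start > end
my @fixed = normalize_coords(900, 100, 1);
print join("\t", @fixed), "\n";
```

Minus-strand features and features that already have start < end pass
through unchanged.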

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
> > On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> I believe all seqfeature location coordinates are designed to have
> >> start < stop for consistency; in cases where the strand matters (CDS,
> >> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> >> the two are reversed and the strand is flipped; at least that's the
> >> way locations are set up in BioPerl.
> >>
> >> chris
> >
> >     Oh yeah?  I always tend to ensure that (start < stop), regardless
> > of strand, when working with sequence features...the other day, I
> > caught Glimmer2 emitting a prediction on the plus strand with start >
> > stop.  I was going to work up a patch for the parser, but I wonder,
> > should I just force everything to start < stop?  Or only predictions
> > on the plus strand?  Should all the parsers for all the ab initio
> > predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris
>


From cjfields at uiuc.edu  Mon May 21 16:56:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:56:50 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <6186D928-A47E-4EED-B06A-50E25A4893CC@uiuc.edu>

On May 21, 2007, at 3:35 PM, Chris Fields wrote:

> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
>> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> I believe all seqfeature location coordinates are designed to have
>>> start < stop for consistency; in cases where the strand matters  
>>> (CDS,
>>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>>> the two are reversed and the strand is flipped; at least that's the
>>> way locations are set up in BioPerl.
>>>
>>> chris
>>
>>     Oh yeah?  I always tend to ensure that (start < stop), regardless
>> of strand, when working with sequence features...the other day, I
>> caught Glimmer2 emitting a prediction on the plus strand with start >
>> stop.  I was going to work up a patch for the parser, but I wonder,
>> should I just force everything to start < stop?  Or only predictions
>> on the plus strand?  Should all the parsers for all the ab initio
>> predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris

... and I see that it does (per your bug report).  The next thing to  
ask is how often these odd Glimmer hits occur and whether others have  
seen the same thing.  Maybe there's an explanation (bug, etc) but I  
can't immediately think of anything that makes sense unless it's  
running the reverse of the + strand as a control for some reason.

chris


From cjfields at uiuc.edu  Mon May 21 17:17:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 16:17:37 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
Message-ID: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>


On May 21, 2007, at 3:48 PM, Mark Johnson wrote:

> Check the test data for Glimmer2 and Glimmer3.  They both predict one
> large gene, I'd guess covering most of the sequence, in frame +1.
> That's probably a bogus prediction, but that's not up to the parser to
> decide.  I hadn't noticed it until recently.
>
> I sent a patch via bugzilla to swap the coordinates if start > end and
> strand > 0.

I think I know what it is.  If you mean these predictions:

Glimmer2:

    27    29263        6  [+1 L= 684 r=-1.187]

Glimmer3:

orf00001    29263        9  +1     9.60

Glimmer2/3 are predicting a gene on a circular chromosome that
starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
the stop codon).  Note that in the Glimmer2 detailed output the end is
29946 and the length of the sequence is 29940, so Glimmer2 artificially
extends the end of the sequence with part of the start.

This is handled as a split location in bioperl and in most GenBank
files; the above would be a location string like
'join(29263..29940,1..9)'.  If you switched the start and stop, the
location would be '9..29263', which wouldn't be correct (and would be
a huge gene).
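
A minimal sketch of how such a location string could be built for a
plus-strand hit once the sequence length is known.  The helper name
plus_strand_location is made up for illustration and is not part of
BioPerl; it only formats a string and does not create location objects.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: build a GenBank-style location
# string for a plus-strand prediction on a circular sequence.
sub plus_strand_location {
    my ($start, $end, $seq_length) = @_;

    # Ordinary feature: nothing to split.
    return "$start..$end" if $start <= $end;

    # Wrap-around feature: runs from $start to the last base, then
    # continues from base 1 to $end.
    return "join($start..$seq_length,1..$end)";
}

print plus_strand_location(29263, 9, 29940), "\n";   # join(29263..29940,1..9)
print plus_strand_location(100, 400, 29940), "\n";   # 100..400
```

The two pieces cover 678 + 9 = 687 bases, which agrees with the
Glimmer2 length of 684 plus the 3-base stop codon it leaves off.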

chris


From johnsonm at gmail.com  Mon May 21 17:21:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 16:21:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
Message-ID: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>

    That makes sense.  Is that behavior documented anywhere?  I'll
feel like less of an idiot if it's not.  8)  Either way, if you're
sure that's what's going on, I'll fix up the parser to handle that as a
split location.

> I think I know what it is.  If you mean these predictions:
>
> Glimmer2:
>
>     27    29263        6  [+1 L= 684 r=-1.187]
>
> Glimmer3:
>
> orf00001    29263        9  +1     9.60
>
> Glimmer2/3 are predicting a gene for a circular chromosome that
> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> and the length of the sequence is 29940, so Glimmer2 artificially
> extends the end of the sequence with part of the start.
>
> This is handled as a split location in bioperl and in most GenBank
> files; the above would be a location string like 'join
> (29263..29940,1..9)'.  If you switched the start and stop the
> location would be '9..29263' which wouldn't be correct (and would be
> a huge gene).
>
> chris
>


From cjfields at uiuc.edu  Mon May 21 19:13:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 18:13:24 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
Message-ID: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>

Glimmer2/3 both assume the genome is circular by default (presumably
because they are used for bacterial genomes).  According to the
Glimmer3 release notes, the detail file has this information in its
header; from the Glimmer3 data used for tests:

Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA  
Glimmer3.icm Glimmer3

Sequence file = ../BCTDNA
ICM model file = Glimmer3.icm
Excluded regions file = none
List of orfs file = none
Truncated orfs = false
Circular genome = true
...

There are options available for glimmer3 (-L, -X) that specify a  
linear sequence or allow ORFs to extend past the end of the sequence  
analyzed (the latter assumes a linear sequence).

chris

On May 21, 2007, at 4:21 PM, Mark Johnson wrote:

>     That makes sense.  Is that behavior documented anywhere?  I'll
> feel like less of an idiot if it's not.  8)  Either way, if you're
> sure that's whats going on, I'll fix up the parser to handle that as a
> split location.
>
>> I think I know what it is.  If you mean these predictions:
>>
>> Glimmer2:
>>
>>     27    29263        6  [+1 L= 684 r=-1.187]
>>
>> Glimmer3:
>>
>> orf00001    29263        9  +1     9.60
>>
>> Glimmer2/3 are predicting a gene for a circular chromosome that
>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>> and the length of the sequence is 29940, so Glimmer2 artificially
>> extends the end of the sequence with part of the start.
>>
>> This is handled as a split location in bioperl and in most GenBank
>> files; the above would be a location string like 'join
>> (29263..29940,1..9)'.  If you switched the start and stop the
>> location would be '9..29263' which wouldn't be correct (and would be
>> a huge gene).
>>
>> chris
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnsonm at gmail.com  Mon May 21 19:57:03 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 18:57:03 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>

    Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
this for a fix?  For plus strand predictions with start > end, use a
split location.  For minus strand predictions with start < end, use a
split location.  Without knowing the length of the sequence, that's
the best that can be done, I think.
    Unless there are objections, I'll go code that up.  Close that bug
out as 'requester is an idiot'.  8)
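
A rough sketch of the test described above, using only start, end and
strand.  It assumes (as the proposal implies) that Glimmer normally
lists minus-strand hits with start > end; the function name
is_wraparound is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical check: does a prediction look like it wraps around the
# origin of a circular sequence?  Needs no sequence length.
sub is_wraparound {
    my ($start, $end, $strand) = @_;

    # Plus strand: coordinates normally increase, so start > end is suspect.
    # Minus strand: coordinates normally decrease, so start < end is suspect.
    return $strand > 0 ? $start > $end : $start < $end;
}

print is_wraparound(29263, 9, 1) ? "split location\n" : "simple location\n";
print is_wraparound(400, 100, -1) ? "split location\n" : "simple location\n";
```

Without the sequence length this can only flag the feature; the
coordinates of the two sub-locations still need the length of the
input sequence.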

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:
>
> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3
>
> Sequence file = ../BCTDNA
> ICM model file = Glimmer3.icm
> Excluded regions file = none
> List of orfs file = none
> Truncated orfs = false
> Circular genome = true
> ...
>
> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).
>
> chris
>
> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>
> >     That makes sense.  Is that behavior documented anywhere?  I'll
> > feel like less of an idiot if it's not.  8)  Either way, if you're
> > sure that's whats going on, I'll fix up the parser to handle that as a
> > split location.
> >
> >> I think I know what it is.  If you mean these predictions:
> >>
> >> Glimmer2:
> >>
> >>     27    29263        6  [+1 L= 684 r=-1.187]
> >>
> >> Glimmer3:
> >>
> >> orf00001    29263        9  +1     9.60
> >>
> >> Glimmer2/3 are predicting a gene for a circular chromosome that
> >> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> >> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> >> and the length of the sequence is 29940, so Glimmer2 artificially
> >> extends the end of the sequence with part of the start.
> >>
> >> This is handled as a split location in bioperl and in most GenBank
> >> files; the above would be a location string like 'join
> >> (29263..29940,1..9)'.  If you switched the start and stop the
> >> location would be '9..29263' which wouldn't be correct (and would be
> >> a huge gene).
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From torsten.seemann at infotech.monash.edu.au  Mon May 21 20:29:47 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 22 May 2007 10:29:47 +1000
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>

> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:

You beat me to the reply Chris - yes, Glimmer2/3 assume circular
chromosome by default. I had forgotten about this in earlier
discussions of the new Glimmer parsers as I normally run it in
--linear / -L mode (even if I know it is circular) because it is
easier to handle, and our sequencer/assembler team usually gets the
origin of replication right.

> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3

I did a double-take here - that's the path to my Glimmer3
installation! It took me a couple of minutes to realise that you got
it from the bioperl test data I created. D'oh! :-)

> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).

If the -L mode should produce Bio::Location::Split objects, I guess if
-X is used it should produce Bio::Location::Fuzzy objects too...

--Torsten


From cjfields at uiuc.edu  Mon May 21 20:59:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 19:59:20 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
Message-ID: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>

You can add the necessary patch to the bug report when it's ready; no  
need to close it out.

The most complete file format to parse seems to be the details file;  
it contains the sequence length:

 >BCTDNA
Sequence length = 29940

which can be used for the split location.  As Torsten points out, use  
of -X could also potentially produce fuzzy locations.

Since the parser currently only parses predict files, you could  
optionally supply the parser with the seq length and emit a warning  
if seqfeatures requiring it are produced, such as the sporadic ones  
which wrap around.  If one were using the bioperl-run module this  
could be automated a bit by passing the seq length in to the parser  
object by adding the seq length to the constructor argument list.
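
As a sketch only, pulling the length out of the detail file header
could look like the following.  The helper name detail_seq_length is
invented here, and the header text is just the two lines quoted above
read from an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the header of a Glimmer3 detail file for the
# "Sequence length = N" line and return N, or nothing if it is not found.
sub detail_seq_length {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^Sequence length\s*=\s*(\d+)/;
    }
    return;
}

# Header lines as quoted in this thread, read from an in-memory filehandle.
my $header = ">BCTDNA\nSequence length = 29940\n";
open my $fh, '<', \$header or die "cannot open in-memory file: $!";
print detail_seq_length($fh), "\n";   # 29940
close $fh;
```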

chris

On May 21, 2007, at 6:57 PM, Mark Johnson wrote:

>     Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
> this for a fix?  For plus strand predictions with start > end, use a
> split location.  For minus strand predictions with start < end, use a
> split location.  Without knowing the length of the sequence, that's
> the best that can be done, I think.
>     Unless there are objections, I'll go code that up.  Close that bug
> out as 'requester is an idiot'.  8)
>
> On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>>
>> Sequence file = ../BCTDNA
>> ICM model file = Glimmer3.icm
>> Excluded regions file = none
>> List of orfs file = none
>> Truncated orfs = false
>> Circular genome = true
>> ...
>>
>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>>
>> chris
>>
>> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>>
>>>     That makes sense.  Is that behavior documented anywhere?  I'll
>>> feel like less of an idiot if it's not.  8)  Either way, if you're
>>> sure that's whats going on, I'll fix up the parser to handle that  
>>> as a
>>> split location.
>>>
>>>> I think I know what it is.  If you mean these predictions:
>>>>
>>>> Glimmer2:
>>>>
>>>>     27    29263        6  [+1 L= 684 r=-1.187]
>>>>
>>>> Glimmer3:
>>>>
>>>> orf00001    29263        9  +1     9.60
>>>>
>>>> Glimmer2/3 are predicting a gene for a circular chromosome that
>>>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>>>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>>>> and the length of the sequence is 29940, so Glimmer2 artificially
>>>> extends the end of the sequence with part of the start.
>>>>
>>>> This is handled as a split location in bioperl and in most GenBank
>>>> files; the above would be a location string like 'join
>>>> (29263..29940,1..9)'.  If you switched the start and stop the
>>>> location would be '9..29263' which wouldn't be correct (and  
>>>> would be
>>>> a huge gene).
>>>>
>>>> chris
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 21:00:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 20:00:58 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
Message-ID: <E22A8442-E00D-4732-9D80-EE61C75732B7@uiuc.edu>


On May 21, 2007, at 7:29 PM, Torsten Seemann wrote:

>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>
> You beat me to the reply Chris - yes, Glimmer2/3 assume circular
> chromosome by default. I had forgotten about this in earlier
> discussions of the new Glimmer parsers as I normally run it in
> --linear / -L mode (even if I know it is circular) because it is
> easier to handle, and our sequencer/assembler team usually gets the
> origin of replication right.
>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>
> I did a double-take here - that's the path to my Glimmer3
> installation! It took me a couple of minutes to realise that you got
> it from the bioperl test data I created. D'oh! :-)

Yep, I forgot about that!

>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>
> If the -L mode should produce Bio::Location::Split objects, I guess if
> -X is used
> it should produce Bio::Location::Fuzzy objects too...
>
> --Torsten

True, didn't think about that one.  Def. something to consider adding  
in.

chris


From johnsonm at gmail.com  Tue May 22 14:04:31 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 22 May 2007 13:04:31 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
	<A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
Message-ID: <ebf5eb170705221104s486ff488u1d8c0b87dd193861@mail.gmail.com>

Yes, Glimmer3 outputs the length of the input sequence.  I don't
believe Glimmer2 does.

> The most complete file format to parse seems to be the details file;
> it contains the sequence length:
>
>  >BCTDNA
> Sequence length = 29940

> Since the parser currently only parses predict files, you could
> optionally supply the parser with the seq length and emit a warning
> if seqfeatures requiring it are produced, such as the sporadic ones
> which wrap around.  If one were using the bioperl-run module this
> could be automated a bit by passing the seq length in to the parser
> object by adding the seq length to the constructor argument list.

I think we can spot wrap-around genes easily enough without knowing
the length of the input sequence.  Having it just means we can perform
a sanity check or two, such as making sure 'wraparound' genes are
within N bases of the end of the input sequence.  Any suggestions on a
good default value for N?

Parsing both output files for glimmer3 will be a little tricky.  The
constructor for Bio::Tools::Glimmer calls $class->SUPER::new(@args);
which hits the constructor for Bio::Tools::AnalysisResult, which does
the same thing.  It all ends up in Bio::Root::IO::_initialize_io,
which grabs the -file arg and opens it.  So, either let Bio::Root::IO
handle -file and have Bio::Tools::Glimmer handle, say, -detail file, or
have Bio::Tools::Glimmer just implement its own _initialize_io() and
hopefully that will fly.


From ClarkeW at AGR.GC.CA  Tue May 22 17:10:08 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Tue, 22 May 2007 15:10:08 -0600
Subject: [Bioperl-l] TextResultWriter
Message-ID: <C278B850.1002%ClarkeW@AGR.GC.CA>

Hi, 

I am interested in becoming a bioperl developer, as I have recently found a
bug in TextResultWriter. I know that I should submit bug fixes using the
protocol outlined in the How To, but I haven't been able to log in to CVS
anonymously to check out the code. However, I have confirmed that the bug
still exists in the most recent version of the code using the web interface
to the CVS repositories. The bug is between lines 433 and 443, and concerns
the reporting of the number of letters and the number of entries in the
database: the two values are passed in the wrong order. My fix would be to
change the existing code block:

from:

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_entries()),
        &_numwithcommas($result->database_letters()),
        $result->get_parameter('matrix') || '');

to: 

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_letters()),
        &_numwithcommas($result->database_entries()),
        $result->get_parameter('matrix') || '');

I believe that this is a simple enough modification that it does not require
any new test cases.
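
For anyone skimming, a small standalone illustration of why the order
matters: sprintf fills the %s placeholders strictly in argument order.
The commify sub is only a stand-in for the writer's private
_numwithcommas helper (its behaviour is assumed), and the numbers are
made up.

```perl
use strict;
use warnings;

# Stand-in for the writer's private _numwithcommas helper (assumed
# behaviour: insert thousands separators into an integer).
sub commify {
    my ($n) = @_;
    1 while $n =~ s/^(\d+)(\d{3})/$1,$2/;
    return $n;
}

# Made-up database statistics, for illustration only.
my ($letters, $entries) = (1_234_567, 4_321);

my $format = "Number of letters in database: %s\n"
           . "Number of sequences in database: %s\n";

# Arguments must follow the order of the %s placeholders:
# letters first, then entries.
my $report = sprintf $format, commify($letters), commify($entries);
print $report;
```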

Cheers, Wayne


From dmessina at wustl.edu  Wed May 23 02:06:52 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 23 May 2007 01:06:52 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C278B850.1002%ClarkeW@AGR.GC.CA>
References: <C278B850.1002%ClarkeW@AGR.GC.CA>
Message-ID: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>

Hi Wayne,

I submitted the bug report on your behalf

	http://bugzilla.open-bio.org/show_bug.cgi?id=2300

and committed your patch. Thanks for reporting this, and thanks even  
more for including a patch!

Regarding your trouble checking out the repository via anonymous CVS,  
could you post the transcript of your attempt so we can get a better  
look at what's going wrong?

Dave


From ClarkeW at AGR.GC.CA  Wed May 23 10:39:17 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Wed, 23 May 2007 08:39:17 -0600
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>
Message-ID: <C279AE35.1008%ClarkeW@AGR.GC.CA>

With regards to not being able to connect, I have discovered that the reason
I cannot connect is that our firewall is blocking my access. It appears that
I am not the first person to have this problem but that the people in charge
are firm in their position to block the anonymous access port. However, if I
obtain a developer account I will be able to access the CVS.

Cheers, Wayne


On 5/23/07 12:06 AM, "David Messina" <dmessina at wustl.edu> wrote:

> Hi Wayne,
> 
> I submitted the bug report on your behalf
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2300
> 
> and committed your patch. Thanks for reporting this, and thanks even
> more for including a patch!
> 
> Regarding your trouble checking out the repository via anonymous CVS,
> could you post the transcript of your attempt so we can get a better
> look at what's going wrong?
> 
> Dave
> 
> 


From cjfields at uiuc.edu  Wed May 23 12:16:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 23 May 2007 11:16:32 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C279AE35.1008%ClarkeW@AGR.GC.CA>
References: <C279AE35.1008%ClarkeW@AGR.GC.CA>
Message-ID: <7077B4AB-A3B5-4EAE-9994-0EF629D2DE2B@uiuc.edu>

You can always use the browsable CVS link to download a tarball if  
that works for you.

http://www.bioperl.org/wiki/Using_CVS
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/?cvsroot=bioperl

The link to download is at the bottom of the page.

chris

On May 23, 2007, at 9:39 AM, ClarkeW wrote:

> With regards to not being able to connect, I have discovered that  
> the reason
> I cannot connect is that our firewall is blocking my access. It  
> appears that
> I am not the first person to have this problem but that the people  
> in charge
> are firm in their position to block the anonymous access port.  
> However, if I
> obtain a developer account I will be able to access the CVS.
>
> Cheers, Wayne
>
>
> On 5/23/07 12:06 AM, "David Messina" <dmessina at wustl.edu> wrote:
>
>> Hi Wayne,
>>
>> I submitted the bug report on your behalf
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2300
>>
>> and committed your patch. Thanks for reporting this, and thanks even
>> more for including a patch!
>>
>> Regarding your trouble checking out the repository via anonymous CVS,
>> could you post the transcript of your attempt so we can get a better
>> look at what's going wrong?
>>
>> Dave
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Xianjun.Dong at bccs.uib.no  Tue May 29 07:57:39 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 13:57:39 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
Message-ID: <465C1533.6070900@ii.uib.no>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment-0002.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kaks_methods.pl
Type: application/x-perl
Size: 2732 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment-0002.bin>

From avilella at gmail.com  Tue May 29 09:02:44 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:02:44 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>

codeml in PAML can give different results in cases where the optimization
reaches different local maxima depending on the different starting points of
each run (seed values). So, at least for some methods and options, this
instability is inherent to the underlying algorithm.

What's more, for some methods and options the PAML documentation even
recommends running the same data more than once to see whether the results
agree, which would be a good indication that the model is robust given
the data.
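
As a rough way to compare repeated runs, here is a small sketch that
reads the four codeml result lines from the report quoted below and
prints the range of each estimate.  The values_for helper is invented
for this example and is not part of the BioPerl PAML modules.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# The four codeml result lines quoted in the original report.
my @runs = (
    't=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339',
    't= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349',
    't=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961',
    't= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852',
);

# Hypothetical helper: pull one named value (e.g. "dS" or "dN/dS") out of
# every line.  The key must start the line or follow whitespace, so "dS"
# does not match inside "dN/dS".
sub values_for {
    my ($key, @lines) = @_;
    my @found;
    for my $line (@lines) {
        push @found, $1 if $line =~ /(?:^|\s)\Q$key\E=\s*([\d.]+)/;
    }
    return @found;
}

for my $key ('dN/dS', 'dN', 'dS') {
    my @v = values_for($key, @runs);
    printf "%-6s min=%.4f max=%.4f\n", $key, min(@v), max(@v);
}
```

On these four runs dN stays between 0.0505 and 0.0522, while dS ranges
from 10.1852 to 14.9961, so it is the dS estimate that moves.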

Maybe PAML's author can give a more specific answer for your data at:

http://www.rannala.org/gsf/viewforum.php?f=1

Cheers,

    Albert.

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
> Hi, dear all (sorry for the duplicate message, *Jason* and *Neil*),
>
> I'm troubled by two problems when I use the PAML module to calculate Ka/Ks
> for my sequences. Could you help me?
>
> 1.  Codeml can produce different Ka/Ks values if I run it twice. I checked
> this both on the command line and with the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml.
>
> The input sequences are:
> >seq1
> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
> >seq2
>
> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
>
> For the command-line program, I used Codeml in PAML 3.14, with the settings
> in codeml.ctl (runmode = -2, seqtype = 1). I ran the program four
> times.  The results are shown below (from the output file). They
> differ from each other, although I would expect them to be identical or
> only slightly different. Weird!
>
> ----------------------------------------------------------------------------------------------------------------------------------
> t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
> t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
> t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
> t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>
> ----------------------------------------------------------------------------------------------------------------------------------
> I found the same problem when I used the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml (I attached my Perl script here,
> similar to the one in the BioPerl HOWTO).
>
> 2. Another strange thing is that if I switch to the YN00 program in the
> PAML package, the output is stable. However, it is very different from
> Codeml's (see below).
>
> ----------------------------------------------------------------------------------------------------------------------------------
> seq. seq.     S       N        t   kappa    omega      dN +- SE
> dS +- SE
>    2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265
> 2.1300 +- 1.2272
>
> ----------------------------------------------------------------------------------------------------------------------------------
> Why is this? Which one should I believe?
>
>
> Would anyone kindly help me by running the Perl script (twice, to
> check whether the results differ), or by running codeml on the command
> line?
> I don't know whether anyone has noticed this before, or whether it is
> caused by the wrong version of PAML.
>
> Regards,
>
> Xianjun
>
>
>
> Himanshu Ardawatia wrote:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
>
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::Tools::Run::Alignment::Clustalw;
>
> # for projecting alignments from protein to R/DNA space
> use Bio::Align::Utilities qw(aa_to_dna_aln);
>
> # for input of the sequence data
> use Bio::SeqIO;
> use Bio::AlignIO;
>
> my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>
> #my $seqdata = 'chuck.fa';
> my $seqdata = 'xianjun.fa';
>
> my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>                            -format => 'fasta');
> my %seqs;
> my @prots;
>
> my $output;
> # process each sequence
> while( my $seq = $seqIO->next_seq ) {
>     $seqs{$seq->display_id} = $seq;
>     # translate them into protein
>     my $protein = $seq->translate();
>     my $pseq = $protein->seq();
>     if( $pseq =~ /\*/ &&
>     $pseq !~ /\*$/ ) {
>     warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>     exit(0);
>     }
>     # Tcoffee can't handle '*' even if it is trailing
>     $pseq =~ s/\*//g;
>     $protein->seq($pseq);
>     push @prots, $protein;
> }
>
> if( @prots < 2 ) {
>     warn("Need at least 2 cDNA sequences to proceed");
>     exit(0);
> }
>
> open(OUT, ">align_output.txt") ||
>       die("cannot open align_output.txt for writing: $!");
> # Align the sequences with clustalw
>
> my $aa_aln = $aln_factory->align(\@prots);
>
> # project the protein alignment back to cDNA coordinates
> my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>
> my @each = $dna_aln->each_seq();
>
> my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>                   ( -params => { 'runmode' => -2,
>                          'seqtype' => 1,
>                  'model' => 1,
>                 }
>               );
>
> # set the alignment object
> $kaks_factory->alignment($dna_aln);
>
> # run the KaKs analysis
> my ($rc,$parser) = $kaks_factory->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
>
> my @otus = $result->get_seqs();
> # this gives us a mapping from the PAML order of sequences back to
> # the input order (since names get truncated)
> my @pos = map {
>     my $c= 1;
>     foreach my $s ( @each ) {
>     last if( $s->display_id eq $_->display_id );
>     $c++;
>     }
>     $c;
> } @otus;
>
> print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
> CDNA_PERCENTID)), "\n";
> for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>     for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>     print OUT join("\t",
>                $otus[$i]->display_id,
>                $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>                $MLmatrix->[$i]->[$j]->{'dS'},
>                $MLmatrix->[$i]->[$j]->{'omega'},
>                sprintf("%.2f",$sub_aa_aln->percentage_identity),
>                sprintf("%.2f",$sub_dna_aln->percentage_identity),
>                ), "\n";
>     }
> }
>
>
> On 5/29/07, Himanshu Ardawatia <himanshu.ardawatia at bccs.uib.no > wrote:
> >
> > Hi Xianjun,
> >
> > I recognize this script. But it was a bit cumbersome to use, as many
> > things are done in the script (multiple alignment, aa-to-dna alignment
> > and Ka/Ks calculation), so one does not have real control over these
> > different steps.
> > I do not remember getting different Ka/Ks values in different runs though.
> > But I remember that I once ran the script with different versions of
> > clustalw and it REALLY gave different results !! So please make sure the
> > clustalw versions are the same in all your runs. Best is to use the latest
> > version.
> >
> > Finally I wrote my own simple script, which generates a codeml.ctl file
> > for each set of sequences, runs codeml based on that, and then moves on.
> > A disadvantage of this is that some files keep getting overwritten (like
> > the ones that have their names hard-coded in the codeml program), so if one
> > needs those files as well then one needs to run the codeml cycle for each
> > set of sequences in a different directory.
> >
> > One advantage of this kind of script is that you can use whichever
> > alignment program you want, and so on... But then there are also the extra
> > steps of doing the multiple alignment and the aa-to-dna alignment yourself.
> >
> > Does it make sense? If you still get different outputs with the same version
> > of clustalw then I can sit with you and look at things together. Or else try
> > the script method which I mentioned.
> >
> > Cheers  and Fu
> > Himanshu
> > \\
> > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote:
> > >
> > > HI, Himanshu
> > >
> > > I am sure you did some work on Ka/Ks calculation. I have a question
> > > that is bothering me: the output of Bio::Tools::Run::Phylo::PAML::Codeml
> > > is not stable (different on each run), and also differs from the output
> > > of the module Bio::Tools::Run::Phylo::PAML::Yn00.
> > >
> > > Here I attached the script. Could you have a look and try to run
> > > the script? How do you calculate the Ka/Ks ratio?
> > >
> > > Thanks
> > >
> > > --
> > > ---------------------------
> > > Sterding (Xianjun) Dong
> > > PhD student, Boris Lenhard's group
> > > Bergen Center of Computational Science
> > > Bergen University, Norway
> > > Mobile: 0047-47361688
> > > Telephone: 0047-55276381
> > > Skype: xianjun.dong
> > >
> > >
> > >
> >
>
> --
> ---------------------------
> Sterding (Xianjun) Dong
> PhD student, Boris Lenhard's group
> Bergen Center of Computational Science
> Bergen University, Norway
> Mobile: 0047-47361688
> Telephone: 0047-55276381
> Skype: xianjun.dong
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From Xianjun.Dong at bccs.uib.no  Tue May 29 09:30:09 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 15:30:09 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>	
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
Message-ID: <465C2AE1.30101@ii.uib.no>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/532a333d/attachment-0002.html>

From avilella at gmail.com  Tue May 29 09:45:28 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:45:28 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2AE1.30101@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no>
Message-ID: <358f4d650705290645s65f596cbp37715f12064a5ced@mail.gmail.com>

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  Thanks for information, Albert.
>
> But I still have two questions:
> Albert Vilella wrote:
>
> codeml in PAML can give different results in cases where the optimization
> reaches different local maxima depending on the different starting points of
> each run (seed values). So, at least for some methods and options, this
> instability is inherent to the underlying algorithm.
>
> 1. How should I set the initial value in order to get a reasonable estimate?
> Do you have any experience with that?
>

People usually change the initial omega in the control file (codeml.ctl).
For example, 3 runs starting from 0.001, 1 and 5.
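
As a rough sketch of that (not tested against codeml itself): the snippet
below only generates the text of three control files, one per starting
omega. The file names pair.phy and run_omega_*.out are placeholders I made
up, and only a few control-file options are shown; everything else is
assumed to stay as in your existing codeml.ctl.

```perl
use strict;
use warnings;

# Build the text of a codeml control file for a pairwise run
# (runmode = -2, seqtype = 1, as in the original question).
# The seqfile/outfile names are placeholders, not files from this thread.
sub ctl_text {
    my ($omega) = @_;
    my @settings = (
        [ seqfile   => 'pair.phy' ],
        [ outfile   => "run_omega_$omega.out" ],
        [ runmode   => -2 ],
        [ seqtype   => 1 ],
        [ fix_omega => 0 ],        # estimate omega ...
        [ omega     => $omega ],   # ... starting from this initial value
    );
    return join '', map { sprintf "%10s = %s\n", @$_ } @settings;
}

# One control file per starting value suggested above.
for my $omega (0.001, 1, 5) {
    print "---- starting omega $omega ----\n";
    print ctl_text($omega);
}
```

Each text would go into its own directory as codeml.ctl, so that the runs
do not overwrite each other's output files, and the estimates from the
three runs can then be compared.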

> Even more, for some methods and options, it is even recommended in PAML
> documentation to run the same data more than once, to see if the results are
> the same, which would be a good indication that the model is robust given
> the data.
>
> 2. Is there a recommended way to test the significance if the results are
> different? For example, in my case, dS could range from 10.1852 to 14.9961
> for the four runs. If that means the model is not robust (how to check
> this?), should I change to use another model?
>

I would prefer PAML's author to answer this question :)
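
In the meantime, a small sketch for at least quantifying the spread. It is
not a significance test; it only parses the four pairwise lines you posted
and prints the range of each quantity. The helper name parse_pairwise_line
is made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# The four codeml result lines quoted in this thread.
my @runs = (
    't=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339',
    't= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349',
    't=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961',
    't= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852',
);

# Turn one "key= value  key= value" line into a hash reference
# (hypothetical helper, not part of BioPerl).
sub parse_pairwise_line {
    my ($line) = @_;
    my %value;
    while ($line =~ m{([\w/]+)=\s*([\d.]+)}g) {
        $value{$1} = $2;
    }
    return \%value;
}

my @parsed = map { parse_pairwise_line($_) } @runs;

# Collect the minimum and maximum of each quantity across the runs.
my %range;
for my $key ('dN', 'dS', 'dN/dS') {
    my @values = map { $_->{$key} } @parsed;
    $range{$key} = [ min(@values), max(@values) ];
    printf "%-6s min=%.4f  max=%.4f\n", $key, @{ $range{$key} };
}
```

With these numbers dN stays within 0.0505 to 0.0522 while dS moves between
10.1852 and 14.9961, so the instability is almost entirely in dS.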

> How can YN00 reach a stable result? (Is it because YN00 does not require
> an initial value for optimization?) Why does YN00 produce such a different
> result from Codeml? (for YN00, dS=2.1300 with SE=1.2272; for Codeml,
> dS=10.1852-14.9961)
>

I think yn00 is less prone to ending up in different local maxima than some
codeml models, but then, codeml is better at giving true positives in cases
where yn00 will give false negatives...

> Maybe PAML's author can give a more specific answer for your data at:
> http://www.rannala.org/gsf/viewforum.php?f=1
>
>
> Actually I already posted my question in the author's forum. Let's wait and
> see.
>

Yes, I would wait for his answers, which should be way more reliable than
mine :)

> Cheers,
>
>     Albert.
>


From roy at colibase.bham.ac.uk  Tue May 29 10:05:12 2007
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Tue, 29 May 2007 15:05:12 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <465C3318.5080201@colibase.bham.ac.uk>

Hi Xianjun,

I'm not sure if it is the cause of your problem, but your sequences seem
to be quite short. This paper:
http://mbe.oxfordjournals.org/cgi/content/full/21/12/2290

suggests that the codeml method of calculating Ka and Ks may be
unreliable for sequences shorter than 300 codons.
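
A quick way to check this for your own data is to count the complete
codons in each sequence. A minimal sketch (the function name codon_count
and the toy sequence are mine, not from the paper):

```perl
use strict;
use warnings;

# Count complete codons in a coding sequence, ignoring whitespace and gaps.
sub codon_count {
    my ($seq) = @_;
    (my $clean = $seq) =~ s/[\s\-]//g;
    warn "length is not a multiple of 3\n" if length($clean) % 3;
    return int( length($clean) / 3 );
}

my $example = "ATGGCC AAA\nTAG";   # toy sequence, not from this thread
printf "%d codons\n", codon_count($example);
```

The S and N columns in the codeml output you posted add up to 165 sites,
which would be 55 codons, well below 300.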

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.


From gbr0wn at comcast.net  Wed May 30 11:44:13 2007
From: gbr0wn at comcast.net (gbr0wn at comcast.net)
Date: Wed, 30 May 2007 15:44:13 +0000
Subject: [Bioperl-l] getting started in windows
Message-ID: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070530/2f640e16/attachment-0001.pl>

From golharam at umdnj.edu  Wed May 30 11:40:28 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 30 May 2007 11:40:28 -0400
Subject: [Bioperl-l] ClustalW Score?
Message-ID: <00c201c7a2d0$d971f550$2d01a8c0@PICO>

How do I get the clustalw score from a clustalw alignment?  I'm using the
following code to align my sequences:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();

$seq[0] = ...
$seq[1] = ...
$seq[2] = ...
$seq[3] = ...

$aln = $aln_factory->align(\@seq);

I can get the percentage identity from the Bio::SimpleAlign object, but
there is no score.  I looked into it further and it doesn't look like the
score is being captured anywhere.  So, how does one get the score from
ClustalW using this method?

Ryan


From barry.moore at genetics.utah.edu  Wed May 30 12:21:16 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 30 May 2007 10:21:16 -0600
Subject: [Bioperl-l] getting started in windows
In-Reply-To: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
References: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
Message-ID: <CA090066-0624-4C52-8306-E783278484B0@genetics.utah.edu>

Try opening up a terminal window (Command Prompt; I think you'll find that
under Accessories).  Change to the directory where your code is and run it
off the command line. The window stays open, so you can read the output
and any error messages.

B

On May 30, 2007, at 9:44 AM, gbr0wn at comcast.net wrote:

> I am a perl novice trying to run perl 5.8.8 on windows xp system.   
> I have used 'wordpad' to paste tutorial code into an executable  
> file and when I double click the icon for the file a window opens  
> up briefly with output and/or error message but closes too fast for  
> me to read.  Any idea why this might be happening?
> Thanks, Greg Brown - gbr0wn at comcast.net
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Wed May 30 13:16:49 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 10:16:49 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
References: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
Message-ID: <1A4207F8295607498283FE9E93B775B403349DAB@EX02.asurite.ad.asu.edu>

> How do I get the clustalw score from a clustalw alignment?  
> I'm using the following code to align my sequences:
> 
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> 
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
> 
> $aln = $aln_factory->align(\@seq);
> 
> I can get the percentage identity from the Bio::SimpleAlign 
> object, but there is no score.  I looked into it further and 
> it doesn't look like the score is being captured anywhere.  
> So, how does one get the score from ClustalW using this method?


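        # Save the real STDOUT, then send clustalw's output to a log file
        open(OUTCOPY, ">&STDOUT")  or die "Couldn't dup STDOUT: $!";
        open(STDOUT,  ">log.test") or die "Couldn't open log.test: $!";
        my $aln = $factory->align(\@seq);
        close STDOUT;
        open(STDOUT, ">&OUTCOPY")  or die "Couldn't restore STDOUT: $!";
        close OUTCOPY;
        # Pull the score out of the captured output
        open(TEMP,   "<log.test")  or die "Couldn't read log.test: $!";
        while (<TEMP>)
        {

                if ($_ =~ /Score:(\d+)/)
                {
                        $aln->score($1);
                        print "Found score of $1\n";
                }
        }
        close TEMP;
        unlink("log.test");
        push @aln, $aln;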
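

The same redirect-and-parse idea as a self-contained sketch that runs
without clustalw or BioPerl. A child perl process stands in for the
aligner, and the "Group 1: ..." line it prints is an invented sample, so
treat the exact output format as an assumption. The helper name
capture_stdout is also made up.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Run $code with STDOUT pointed at a temporary file, restore STDOUT,
# and return whatever was written (including output of child processes).
sub capture_stdout {
    my ($code) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    close($tmp_fh);

    STDOUT->flush;
    open(my $saved, '>&', \*STDOUT) or die "Couldn't dup STDOUT: $!";
    open(STDOUT, '>', $tmp_name)    or die "Couldn't redirect STDOUT: $!";
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    close(STDOUT);
    open(STDOUT, '>&', $saved)      or die "Couldn't restore STDOUT: $!";
    close($saved);
    die $err unless $ok;

    open(my $in, '<', $tmp_name) or die "Couldn't read $tmp_name: $!";
    my $text = do { local $/; <$in> };
    close($in);
    return defined $text ? $text : '';
}

# Stand-in for the aligner: a child perl that prints one invented line.
my $log = capture_stdout(sub {
    system($^X, '-e', 'print "Group 1: Sequences:   2      Score:1532\n"') == 0
        or die "child process failed: $?";
});

my ($score) = $log =~ /Score:(\d+)/;
print "Found score of $score\n";
```

In a real script the code reference would wrap the call to
$factory->align(\@seq) instead of the stand-in system() call.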


From jason at bioperl.org  Wed May 30 14:54:20 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 May 2007 11:54:20 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
Message-ID: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>

You can do it without redirecting STDOUT or creating a new file: just
change the system call in the module.

Here is the code that runs clustalw in _run in the module:

    my $commandstring = $self->executable."$instring"."$param_string";
    $self->debug( "clustal command = $commandstring");
    my $status = system($commandstring);
    unless( $status == 0 ) {
        $self->warn( "Clustalw call ($commandstring) crashed: $? \n");
        return undef;
    }

Do something like:

my $fh;
unless( open($fh, "$commandstring |") ) {
    $self->warn( "Clustalw call ($commandstring) failed: $! \n");
    return undef;
}
my $score;
while(<$fh>) {
   $score = $1 if ($_ =~ /Score:(\d+)/);
}
close($fh);

... then at the bottom after the alignment is created do:

$aln->score($score);
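
To try the pipe idea outside the module first, here is a minimal
stand-alone sketch. It does not touch the BioPerl code: a child perl
process plays the role of clustalw, the line it prints is an invented
sample, and score_from_command is a hypothetical helper name.

```perl
use strict;
use warnings;

# Read a command's STDOUT through a pipe and return the last score seen.
sub score_from_command {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
    my $score;
    while (my $line = <$fh>) {
        $score = $1 if $line =~ /Score:(\d+)/;
    }
    close($fh) or die "command failed: @cmd (status $?)";
    return $score;
}

# Stand-in for the clustalw command line: prints one invented line.
my @stand_in = ($^X, '-e', 'print "Group 1: Sequences:   2      Score:1532\n"');
my $score = score_from_command(@stand_in);
print "Found score of $score\n";
```

The list form of open avoids the shell, so it would not work unchanged
with a command string that contains redirections.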


There may be some more debugging b/c if you invoke the quiet => 1  
parameter there may be an automatic ">& /dev/null" appended to the  
end of the parameter string that you'll need to figure out how to  
override.

Sorry I don't have more time to help; I hope this gets you started.

-jason
On May 30, 2007, at 10:18 AM, Ryan Golhar wrote:

> Did you see Kevin's response?  That's one possible solution that  
> could be
> implemented...
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Wednesday, May 30, 2007 12:05 PM
> To: golharam at umdnj.edu
> Subject: Re: [Bioperl-l] ClustalW Score?
>
>
> Nope it isn't parsed since it is part of the STDOUT from the  
> program not the
> alignment.  If you want to add parsing of the STDOUT from Clustalw  
> someone
> will need to refactor how the program is run and capture and parse the
> STDOUT. The score can be added to the score field of the  
> SimpleAlign object,
> but again since there is no where for it to be stored in a clustalw
> alignment file it won't be round tripped anywhere. I think  
> stockholm will
> manage it for you though.
>
> Do you know what the score represents - can it be computed from the
> alignment itsself?
>
> -jason
>
> On May 30, 2007, at 8:40 AM, Ryan Golhar wrote:
>
>
> How do I get the clustalw score from a clustalw alignment?  I'm  
> using the
> following code to align my sequences:
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
>
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
>
> $aln = $aln_factory->align(\@seq);
>
> I can get the percentage identity from the Bio::SimpleAlign object,  
> but
> there is no score.  I looked into it further and it doesn't look  
> like the
> score is being captured anywhere.  So, how does one get the score from
> ClustalW using this method?
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Wed May 30 15:52:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 12:52:01 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403349E4D@EX02.asurite.ad.asu.edu>

> You can do it without redirecting STDOUT or creating a new 
> file, just change the system call to:
> 
> Here is the code for running in _run in the module:
>     my $commandstring = $self->executable."$instring"."$param_string";
>      $self->debug( "clustal command = $commandstring");
>      my $status = system($commandstring);
>       unless( $status == 0 ) {
>            $self->warn( "Clustalw call ($commandstring) crashed: $?  
> \n");
>            return undef;
>       }
> 
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/); } close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet 
> => 1 parameter there may be an automatic ">& /dev/null" 
> appended to the end of the parameter string that you'll need 
> to figure out how to override.
> 
> Sorry I don't have more time to help; I hope this gets you started.

I did it my way as I was doing it without modifying the Bioperl code (in
case I later updated to a new version and forgot about the changes I had
put into it).  So that code just sits in my perl script where it calls
the Bioperl module to create the Clustal alignment object.


From Xianjun.Dong at bccs.uib.no  Tue May 29 11:02:21 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 17:02:21 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2F8E.2070309@ed.ac.uk>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>		<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>		<465C1533.6070900@ii.uib.no>	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no> <465C2F8E.2070309@ed.ac.uk>
Message-ID: <465C407D.608@ii.uib.no>

Hi, Darren

The sequences are from human and zebrafish. I currently use two
sequences. I just want to see what the substitution pattern is.
But your comment makes me wonder whether I should include other
species, such as mouse and chicken.

BTW, what do you mean by 'per codon, not per site'? Do you mean that the
dS (Ks) from Codeml is per codon, and the one from Yn00 is per site?
I think there should be a reasonable way to calculate the
synonymous substitutions, even if the divergence is large. If
Codeml is not a good solution for that case, do you have a better suggestion?

Thanks

Xianjun

Darren Obbard wrote:
> Out of interest, what are the species, and how much sequence are you 
> using?
>
> - Estimating Ds when it is >>1 is very hard anyway, since the 
> substitutions are saturated. i.e. Regardless of the method, there will 
> be some level of divergence for which Ds can no longer be estimated. A 
> Ds of ~14 (for PAML I think this is per codon, not per site) sounds 
> very high to me - higher than I would want to try to estimate Ds.
>
> Dong Xianjun wrote:
>> Thanks for information, Albert.
>>
>> But still in two questions:
>> Albert Vilella wrote:
>>> codeml in PAML can give different results in cases where the 
>>> optimization reaches different local maxima depending on the 
>>> different starting points of each run (seed values). So, at least 
>>> for some methods and options, this instability is inherent to the 
>>> underlying algorithm.
>> 1. How to set the initial value in order to get a reasonable 
>> estimation? Do you have some experience for that?
>>> Even more, for some methods and options, it is even recommended in 
>>> PAML documentation to run the same data more than once, to see if 
>>> the results are the same, which would be a good indication that the 
>>> model is robust given the data.
>> 2. Is there a recommended way to test significance if the results
>> are different? For example, in my case, dS ranged from 10.1852
>> to 14.9961 across the four runs. If that means the model is not
>> robust (how can I check this?), should I switch to another model?
>>
>> How does YN00 reach a stable result? (Is it because YN00 does not
>> require an initial value for optimization?) Why does YN00 produce such
>> a different result from Codeml? (for YN00, dS=2.1300 with SE=1.2272;
>> for Codeml, dS=10.1852-14.9961)
>>> Maybe PAML's author can give a more specific answer for your data at:
>>> http://www.rannala.org/gsf/viewforum.php?f=1
>>
>> Actually I already posted my question in the author's forum. Let's wait
>> and see.
>>>
>>> Cheers,
>>>
>>>     Albert.
>>>
>>> On 5/29/07, *Dong Xianjun* <Xianjun.Dong at bccs.uib.no 
>>> <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>
>>>     Hi, dear all, //sorry for the duplicated msg for /Jason/ and /Neil/
>>>
>>>     I'm bothered by two problems when I use the PAML module to calculate
>>>     Ka/Ks for my sequences. Could you help me?
>>>
>>>     1.  Codeml can produce a different Ka/Ks value each time I run it.
>>>     I checked this both on the command line and with the Perl wrapper
>>>     Bio::Tools::Run::Phylo::PAML::Codeml;
>>>
>>>     The input sequences are:
>>>     >seq1
>>>     TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
>>>
>>>     >seq2
>>>     TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
>>>
>>>
>>>     For the command-line program, I used Codeml in PAML3.14, with
>>>     specifications in codeml.ctl (runmode = -2, seqtype = 1). I ran
>>>     the program four times.  The output is shown below (from
>>>     the output file). We can see that the runs differ from each
>>>     other. They should be the same or only slightly different, right?
>>>     But they are NOT.  Weird!
>>>     ------------------------------------------------------------------------
>>>
>>>     t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
>>>     t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
>>>     t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
>>>     t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>>>
>>>     ------------------------------------------------------------------------
>>>
>>>     I found the same problem when I used the Perl wrapper
>>>     Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script
>>>     here, similar to the one in the BioPerl HOWTO).
>>>
>>>     2. Another strange thing is that if I switch to the program YN00 in
>>>     the PAML package, the output is stable. However, it is very
>>>     different from Codeml's. (see below)
>>>     ------------------------------------------------------------------------
>>>
>>>     seq. seq.     S       N        t   kappa    omega      dN +- SE         dS +- SE
>>>        2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265  2.1300 +- 1.2272
>>>
>>>     ------------------------------------------------------------------------
>>>
>>>     Why is this? Which one should I believe?
>>>
>>>
>>>     Would anyone kindly help me by running the Perl script
>>>     (twice, to check whether the results differ)? Or by running
>>>     codeml on the command line?
>>>     I don't know whether anyone has noticed this before, or whether
>>>     it is caused by the wrong version of PAML.
>>>
>>>     Regards,
>>>
>>>     Xianjun
>>>
>>>
>>>
>>>     Himanshu Ardawatia wrote:
>>>>     #!/usr/bin/perl
>>>>
>>>>     use strict;
>>>>     use warnings;
>>>>
>>>>
>>>>     use Bio::Tools::Run::Phylo::PAML::Codeml;
>>>>     use Bio::Tools::Run::Alignment::Clustalw;
>>>>
>>>>     # for projecting alignments from protein to R/DNA space
>>>>     use Bio::Align::Utilities qw(aa_to_dna_aln);
>>>>
>>>>     # for input of the sequence data
>>>>     use Bio::SeqIO;
>>>>     use Bio::AlignIO;
>>>>
>>>>     my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>>>>
>>>>     #my $seqdata = 'chuck.fa';
>>>>     my $seqdata = 'xianjun.fa';
>>>>
>>>>     my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>>>>                                -format => 'fasta');
>>>>     my %seqs;
>>>>     my @prots;
>>>>
>>>>     my $output;
>>>>     # process each sequence
>>>>     while( my $seq = $seqIO->next_seq ) {
>>>>         $seqs{$seq->display_id} = $seq;
>>>>         # translate them into protein
>>>>         my $protein = $seq->translate();
>>>>         my $pseq = $protein->seq();
>>>>         if( $pseq =~ /\*/ &&
>>>>             $pseq !~ /\*$/ ) {
>>>>             warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>>>>             exit(0);
>>>>         }
>>>>         # Tcoffee can't handle '*' even if it is trailing
>>>>         $pseq =~ s/\*//g;
>>>>         $protein->seq($pseq);
>>>>         push @prots, $protein;
>>>>     }
>>>>
>>>>     if( @prots < 2 ) {
>>>>         warn("Need at least 2 cDNA sequences to proceed");
>>>>         exit(0);
>>>>     }
>>>>
>>>>     open(OUT, ">align_output.txt") ||
>>>>           die("cannot open output align_output.txt for writing: $!");
>>>>     # Align the sequences with clustalw
>>>>
>>>>     my $aa_aln = $aln_factory->align(\@prots);
>>>>
>>>>     # project the protein alignment back to cDNA coordinates
>>>>     my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>>>>
>>>>     my @each = $dna_aln->each_seq();
>>>>
>>>>     my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>>>>                       ( -params => { 'runmode' => -2,
>>>>                              'seqtype' => 1,
>>>>                      'model' => 1,
>>>>                     }
>>>>                   );
>>>>
>>>>     # set the alignment object
>>>>     $kaks_factory->alignment($dna_aln);
>>>>
>>>>     # run the KaKs analysis
>>>>     my ($rc,$parser) = $kaks_factory->run();
>>>>     my $result = $parser->next_result;
>>>>     my $MLmatrix = $result->get_MLmatrix();
>>>>
>>>>     my @otus = $result->get_seqs();
>>>>     # this gives us a mapping from the PAML order of sequences back to
>>>>     # the input order (since names get truncated)
>>>>     my @pos = map {
>>>>         my $c= 1;
>>>>         foreach my $s ( @each ) {
>>>>         last if( $s->display_id eq $_->display_id );
>>>>         $c++;
>>>>         }
>>>>         $c;
>>>>     } @otus;
>>>>
>>>>     print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID CDNA_PERCENTID)), "\n";
>>>>     for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>>>>         for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>>>>         my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         print OUT join("\t",
>>>>                    $otus[$i]->display_id,
>>>>                    $otus[$j]->display_id,
>>>>                    $MLmatrix->[$i]->[$j]->{'dN'},
>>>>                    $MLmatrix->[$i]->[$j]->{'dS'},
>>>>                    $MLmatrix->[$i]->[$j]->{'omega'},
>>>>                    sprintf("%.2f",$sub_aa_aln->percentage_identity),
>>>>                    sprintf("%.2f",$sub_dna_aln->percentage_identity),
>>>>                    ), "\n";
>>>>         }
>>>>     }
>>>>
>>>>
>>>>     On 5/29/07, *Himanshu Ardawatia* <himanshu.ardawatia at bccs.uib.no
>>>>     <mailto:himanshu.ardawatia at bccs.uib.no>> wrote:
>>>>
>>>>         Hi Xianjun,
>>>>
>>>>         I recognize this script. But it was a bit cumbersome to use,
>>>>         as many things are done in the script (like multiple
>>>>         alignment, aa to dna alignment and ka/ks calculation), so one
>>>>         does not have real control over these different aspects.
>>>>         I do not remember getting different Ka/Ks in different runs
>>>>         though. But I remember that once I ran the script with
>>>>         different versions of clustalw and it REALLY gave different
>>>>         results !! So please make sure that the clustalw version is
>>>>         the same in all your runs. Best is to use the latest version.
>>>>
>>>>         Finally I wrote my own simple script, which generates a
>>>>         codeml.ctl file for each set of sequences, runs codeml
>>>>         based on that, and then moves on. A disadvantage of this is
>>>>         that some files keep getting overwritten (like the ones
>>>>         which have their names hard-coded in the codeml program), and if
>>>>         one needs those files as well then one needs to run the
>>>>         codeml cycles for each set of sequences in different
>>>>         directories.
>>>>
>>>>         One advantage of this kind of script is that you can use
>>>>         whichever alignment program you want, and so on. But
>>>>         then there are also the extra steps of doing the multiple
>>>>         alignment and the aa to dna alignment yourself.
>>>>
>>>>         Does it make sense? If you still get different outputs with
>>>>         same version of clustalw then I can sit with you and look at
>>>>         things together. Or else try the script method which I
>>>>         mentioned.
>>>>
>>>>         Cheers  and Fu
>>>>         Himanshu
>>>>         \\
>>>>
>>>>         On 5/28/07, *Dong Xianjun* < Xianjun.Dong at bccs.uib.no
>>>>         <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>>
>>>>             HI, Himanshu
>>>>
>>>>             I am sure you did some work on Ka/Ks calculation. I
>>>>             have a question that is bothering me: the output of
>>>>             Bio::Tools::Run::Phylo::PAML::Codeml is not
>>>>             stable (different for each run), and is also different
>>>>             from the output of the module
>>>>             Bio::Tools::Run::Phylo::PAML::Yn00.
>>>>
>>>>             Here I attached the script. Could you have a
>>>>             look and try to run the script? How do you
>>>>             calculate the Ka/Ks ratio?
>>>>
>>>>             Thanks
>>>>
>>>>             --
>>>>             ---------------------------
>>>>             Sterding (Xianjun) Dong
>>>>             PhD student, Boris Lenhard's group
>>>>             Bergen Center of Computational Science
>>>>             Bergen University, Norway
>>>>             Mobile: 0047-47361688
>>>>             Telephone: 0047-55276381
>>>>             Skype: xianjun.dong
>>>>
>>>>
>>>>
>>>>
>>>
>>>     --     ---------------------------
>>>     Sterding (Xianjun) Dong
>>>     PhD student, Boris Lenhard's group
>>>     Bergen Center of Computational Science
>>>     Bergen University, Norway
>>>     Mobile: 0047-47361688
>>>     Telephone: 0047-55276381
>>>
>>>     Skype: xianjun.dong
>>>        
>>>
>>>     _______________________________________________
>>>     Bioperl-l mailing list
>>>     Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>

-- 
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong


From bix at sendu.me.uk  Thu May 31 04:34:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 09:34:38 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E889E.3090304@sendu.me.uk>

Jason Stajich wrote:
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet => 1  
> parameter there may be an automatic ">& /dev/null" appended to the  
> end of the parameter string that you'll need to figure out how to  
> override.

Is there any particular reason for not having something along these 
lines committed to the module? Shall I go ahead and implement?


From bix at sendu.me.uk  Thu May 31 05:54:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 10:54:32 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E9B58.1020403@sendu.me.uk>

Jason Stajich wrote:
>    $score = $1 if ($_ =~ /Score:(\d+)/);

I see that there are lots of lines in the output that match the above 
regex, but there is also a single /Alignment Score (\d+)/ line printed 
at the end. Isn't that the score that should get stored in $aln->score()?
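
The distinction between the per-pair scores and the final alignment score
can be sketched in plain Perl as follows. This is only an illustration:
the sample text is made up to resemble ClustalW output rather than
captured from a real run, the exact line formats are an assumption, and
parse_scores is a hypothetical helper, not part of the BioPerl wrapper.

```perl
use strict;
use warnings;

# Illustrative ClustalW-style output (assumed format, invented numbers).
my $clustalw_output = <<'END';
Sequences (1:2) Aligned. Score:  88
Sequences (1:3) Aligned. Score:  75
Sequences (2:3) Aligned. Score:  91
Guide tree file created:   [example.dnd]
Group 1: Sequences:   2      Score:1520
Group 2: Sequences:   3      Score:1432
Alignment Score 2954
END

# Collect the pairwise scores and the single final alignment score
# separately, so that the last "Score:" line seen is not mistaken
# for the overall score.
sub parse_scores {
    my ($text) = @_;
    my (%pairwise, $alignment_score);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)/) {
            $pairwise{"$1:$2"} = $3;
        }
        elsif ($line =~ /^Alignment Score\s+(-?\d+)/) {
            $alignment_score = $1;
        }
    }
    return (\%pairwise, $alignment_score);
}

my ($pairwise, $alignment_score) = parse_scores($clustalw_output);
print "pair $_ scored $pairwise->{$_}\n" for sort keys %$pairwise;
print "alignment score: $alignment_score\n" if defined $alignment_score;
```

In the wrapper, the value that would then be passed to $aln->score() is
the alignment score; for a two-sequence run the single pairwise score may
be all that is needed.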


From jason at bioperl.org  Thu May 31 14:08:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 31 May 2007 11:08:19 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <465E9B58.1020403@sendu.me.uk>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
	<465E9B58.1020403@sendu.me.uk>
Message-ID: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>

You're right. It is not really my code; I was just elaborating on
Kevin's example. It would probably need to be more specific, or
perhaps the last Score seen is sufficient for what one is trying to
capture?

-j
On May 31, 2007, at 2:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>    $score = $1 if ($_ =~ /Score:(\d+)/);
>
> I see that there are lots of lines in the output that match the  
> above regex, but there is also a single /Alignment Score (\d+)/  
> line printed at the end. Isn't that the score that should get  
> stored in $aln->score()?
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Thu May 31 14:15:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 31 May 2007 11:15:38 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO><DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org><465E9B58.1020403@sendu.me.uk>
	<49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu>

> you're right --- it is not really my code, I was just 
> elaborating Kevin's example --- it would probably need to be 
> more specific or perhaps the last Score seen is sufficient 
> for what one is trying to capture?

I took that code from a pairwise clustal alignment script that I wrote
to deal with aligning a bunch of short sequences against a long one to
see where they line up.  When all of them were fed to Clustal together,
the short sequences all ended up aligned to each other and not well
aligned to the longer sequence.  I only saw one score in the output from
the pairwise run, so that is what I used to find a reasonable value.


From shameer at ncbs.res.in  Tue May  1 12:04:11 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 21:34:11 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
Message-ID: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>

Dear Scott,

> There is a fair amount of documentation in the perldoc for
> Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> you read that?

I agree, but I couldn't find the exact information I needed :( (maybe I
missed something important).

>  Also, for changing the scale, that should happen
> automatically--have you tried yet?

I tried changing Lincoln's program (blast3.pl) from
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
to
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);

But that gave me a smaller scale, of length up to 300. I was looking for
an option that keeps the same width and height as the given image, with
dynamic start and end values depending on the length of my sequence. Since
I couldn't accomplish this, I thought of getting some help from you guys.
I think I also need to play a little with the values to reformat the scale
so that it accommodates my hits as well.
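
The behaviour asked for here, a fixed image width with a scale that
follows the sequence length, comes down to a pixels-per-base factor. The
sketch below is plain Perl and only illustrates that arithmetic; it is
not Bio::Graphics code, and the names make_scaler, $width and $pad are
hypothetical.

```perl
use strict;
use warnings;

# Return a closure that maps a 1-based sequence position to an x pixel,
# for an image of fixed width and a given sequence length.
sub make_scaler {
    my ($seq_length, $width, $pad) = @_;
    die "sequence length must be at least 2\n" if $seq_length < 2;
    die "padding leaves no drawable width\n"   if $width <= 2 * $pad;
    my $pixels_per_base = ($width - 2 * $pad) / ($seq_length - 1);
    return sub {
        my ($pos) = @_;
        return int($pad + ($pos - 1) * $pixels_per_base + 0.5);
    };
}

# The same 800-pixel image is used for every sequence length; only the
# pixels-per-base factor changes.
for my $len (70, 200, 1000) {
    my $to_pixel = make_scaler($len, 800, 10);
    printf "length %4d: position 1 -> %d px, position %d -> %d px\n",
        $len, $to_pixel->(1), $len, $to_pixel->($len);
}
```

In Bio::Graphics::Panel the corresponding constructor options are, as far
as the perldoc describes them, -width for the image width in pixels and
-length for the sequence length, so keeping -width constant and setting
-length from the sequence should give this behaviour.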

Thanks a lot for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From cain at cshl.edu  Tue May  1 10:04:09 2007
From: cain at cshl.edu (Scott Cain)
Date: Tue, 01 May 2007 10:04:09 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
Message-ID: <1178028249.2644.13.camel@localhost.localdomain>

Hi Shameer,

There is a fair amount of documentation in the perldoc for
Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
you read that?  Also, for changing the scale, that should happen
automatically--have you tried yet?
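
For the clickable part, a client-side image map only needs one rectangle
and one URL per hit. The following is a minimal sketch in plain Perl
under stated assumptions: the rectangles are taken to be pixel
coordinates such as those the panel reports for each drawn feature, the
hit IDs and coordinates below are made-up placeholders, the URL template
is only an example, and escape_attr and build_image_map are hypothetical
helpers rather than Bio::Graphics methods.

```perl
use strict;
use warnings;

# Example link template (an assumption); substitute the NCBI or PDB URL
# pattern that applies to the IDs in use.
my $URL_TEMPLATE = 'https://www.ncbi.nlm.nih.gov/protein/%s';

# Escape the characters that are unsafe inside an HTML attribute value.
sub escape_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build a <map> element from rows of [id, x1, y1, x2, y2].
sub build_image_map {
    my ($map_name, @boxes) = @_;
    my @areas;
    for my $box (@boxes) {
        my ($id, $x1, $y1, $x2, $y2) = @$box;
        my $href  = escape_attr(sprintf($URL_TEMPLATE, $id));
        my $title = escape_attr($id);
        push @areas, sprintf(
            '<area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s" alt="%s" />',
            $x1, $y1, $x2, $y2, $href, $title, $title);
    }
    my $name = escape_attr($map_name);
    return join("\n", qq{<map name="$name">}, @areas, '</map>') . "\n";
}

# Placeholder hit IDs and pixel rectangles, invented for illustration.
my @boxes = (
    [ 'HIT_A',  10, 40, 400, 50 ],
    [ 'HIT_B', 120, 60, 790, 70 ],
);
print build_image_map('hits', @boxes);
print qq{<img src="hits.png" usemap="#hits" alt="BLAST hits" />\n};
```

The 'Creating Imagemaps' section of the Bio::Graphics::Panel perldoc
describes the library's own way of producing such a map; the sketch above
only shows what the resulting HTML has to contain.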

Scott


On Tue, 2007-05-01 at 17:06 +0530, Shameer Khadar wrote:
> Dear All,
> 
> I am trying to impliment a bioperl based program to generate a dynamic,
> clickable image. I have used Dr. Lincoln Steins's code provided in
> example3 at this URL :
> http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
> be perfect for my purpose.
> 
> I need to add few modifications to the image. I reffered the Bio::Graphics
> HOWTO,  Creating_Imagemaps documents and other old bio-perl list mails
> (may be am missing something imp.. ? )  but I couldnt get a quick
> solution, Thought I will ask about it to the experts for some tips and
> tricks.
> 
> This is what I am looking for :
> 
> 1. I need image of exactly same size and the scale (0.1k .. 0.9k) to be
> changed according to length of the sequence. My sequence length is usually
> in a range of 70 - 200.
> 
> 2. I also need to make the image interactive / clickable on the various
> blue bar as different hyperlink to NCBI / PDB using ID (This ids will be
> used instead of name of the blast hits)
> 
> 
> Many thanks in advance for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Tue May  1 13:10:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 12:10:10 -0500
Subject: [Bioperl-l] Pb makefile
In-Reply-To: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
References: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
Message-ID: <D975B11D-1303-4CF4-AE3B-878881964DB9@uiuc.edu>

Is there any reason you want to install bioperl 1.4 (which is over 3  
yrs old)?  The latest is v.1.5.2 (Dec. 2006); man page generation has  
been fixed for that version, which uses Module::Build.

The man page generation was turned off prior to 1.4, though I may be
wrong.  Based on the ExtUtils::MakeMaker FAQ you should be able to
prevent man page generation this way:

perl Makefile.PL INSTALLMAN1DIR=none INSTALLMAN3DIR=none

chris

On Apr 30, 2007, at 5:35 AM, Francoise.LECOMTE at biogemma.com wrote:

> Hi
> I am trying to install bioperl 1.4 on a Tru64 platform, and I get the
> message "make:line too long" when I run the command make install.
> How can I solve it? How do I disable man page installation in
> Makefile.PL, if that can solve this problem?
>
> Best regards
>
> Françoise Lecomte


From cain.cshl at gmail.com  Tue May  1 15:50:42 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 May 2007 15:50:42 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <1178049042.2644.36.camel@localhost.localdomain>

Perhaps if you provided some code and sample data we might be able to
help you better.

Scott


On Tue, 2007-05-01 at 21:34 +0530, Shameer Khadar wrote:
> Dear Scot,
> 
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
> 
> I agreed, but I couldnt the exact information I needed :( (may be I missed
> something important).
> 
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
> 
> I tried by changing the Lincoln's program eg: blast3.pl
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>  to my
> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
> 
> But it had given me a smaller scale of length upto 300. I was looking for
> an option where I need same width and height of given image and a dynamic
> start and end values depending on length of my sequence. Since I couldnt
> accomplish, I thought of getting some help from you guys. I think I need
> to play a little bit with the value for reformat the scale to accomodate
> my hits as well.
> 
> Thanks a lot for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From agathman at semo.edu  Tue May  1 19:10:20 2007
From: agathman at semo.edu (Gathman, Allen)
Date: Tue, 1 May 2007 18:10:20 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
Message-ID: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>

Hi, all --
 
I've been using BioPerl 1.4 for a while; recently I installed 1.5.2, and
found that scripts that had been using spliced_seq are now broken.  Any
thoughts on what might be going on? 
 
Here's a sample script:
 
*********************************************
 
#!/usr/bin/perl -w
 
use strict;
use Bio::DB::GFF;

my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
                               -dsn        =>
'dbi:mysql:database=cc;host=localhost',
                               -fasta      => '/gbrowse/databases/cc'
                               );
$db->add_aggregator('transcript{CDS/mRNA}');
my $seg=$db->segment('ccin_Contig120');
my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
 
for my $gene (@genes) {
    my $gid = $gene->display_id;
 
    print STDERR "Gene is $gid\n";
    my $splgene = $gene->spliced_seq();
}

********************************************
The line with "spliced_seq" in it crashes the program.  Here's the
STDERR output:
 
Gene is Jan06m400_GLEAN_11487

-------------------- WARNING ---------------------

MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
absolute set to 1 -- be warned you may not be getting things on the
correct strand

---------------------------------------------------

-------------------- WARNING ---------------------

MSG: seq doesn't validate, mismatch is
::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,(0,881935
,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: Attempting to set the sequence to
[Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)Bio::Prim
arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)Bio::PrimarySeq=HAS
H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH(0x881f4a
4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] which
does not look healthy

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359

STACK: Bio::PrimarySeq::seq
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258

STACK: Bio::PrimarySeq::new
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210

STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484

STACK: Bio::SeqFeatureI::spliced_seq
/usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498

STACK: /transfer/testsplice.pl:20

-----------------------------------------------------------

Allen Gathman

http://cstl-csm.semo.edu/gathman

 
From cjfields at uiuc.edu  Tue May  1 20:27:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 19:27:46 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
In-Reply-To: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
References: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
Message-ID: <9F00B020-AFF0-40DB-9694-6061B5A11A73@uiuc.edu>

Can you file a bug on this?  Attach the script and maybe detail what  
data is loaded into your local MySQL database (if possible).
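
For the bug report it may help to note what the exception text itself
shows: a string of the form Bio::PrimarySeq=HASH(0x...) repeated back to
back is what Perl produces when object references are concatenated as
strings. The sketch below reproduces that effect in plain Perl with a
stand-in class; MockSeq is hypothetical and is not BioPerl, and whether
this is the actual cause inside spliced_seq is an unconfirmed assumption.

```perl
use strict;
use warnings;

# MockSeq is a hypothetical stand-in for a sequence object.
package MockSeq;
sub new { my ($class, $residues) = @_; return bless { seq => $residues }, $class; }
sub seq { return $_[0]{seq}; }

package main;

my @exons = map { MockSeq->new($_) } qw(ATG CCC TAA);

# Joining the objects themselves stringifies each reference, giving
# text of the form "MockSeq=HASH(0x...)MockSeq=HASH(0x...)...".
my $wrong = join '', @exons;

# Calling the accessor first joins the residues instead.
my $right = join '', map { $_->seq } @exons;

print "objects joined: $wrong\n";
print "strings joined: $right\n";
```

If something similar is happening, the sub-feature sequences are being
returned as objects where strings were expected, which would be worth
mentioning in the bug report.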

chris

On May 1, 2007, at 6:10 PM, Gathman, Allen wrote:

> Hi, all --
>
> I've been using BioPerl 1.4 for a while; recently I installed  
> 1.5.2, and
> found that scripts that had been using spliced_seq are now broken.   
> Any
> thoughts on what might be going on?
>
> Here's a sample script:
>
> *********************************************
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::GFF;
>
> my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
>                                -dsn        =>
> 'dbi:mysql:database=cc;host=localhost',
>                                -fasta      => '/gbrowse/databases/cc'
>                                );
> $db->add_aggregator('transcript{CDS/mRNA}');
> my $seg=$db->segment('ccin_Contig120');
> my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
>
> for my $gene (@genes) {
>     my $gid = $gene->display_id;
>
>     print STDERR "Gene is $gid\n";
>     my $splgene = $gene->spliced_seq();
> }
>
> ********************************************
> The line with "spliced_seq" in it crashes the program.  Here's the
> STDERR output:
>
> Gene is Jan06m400_GLEAN_11487
>
> -------------------- WARNING ---------------------
>
> MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
> absolute set to 1 -- be warned you may not be getting things on the
> correct strand
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
>
> MSG: seq doesn't validate, mismatch is
> ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::, 
> (0,881935
> ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)
>
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Attempting to set the sequence to
> [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170) 
> Bio::Prim
> arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308) 
> Bio::PrimarySeq=HAS
> H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH 
> (0x881f4a
> 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)]  
> which
> does not look healthy
>
> STACK: Error::throw
>
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359
>
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258
>
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210
>
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484
>
> STACK: Bio::SeqFeatureI::spliced_seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498
>
> STACK: /transfer/testsplice.pl:20
>
> -----------------------------------------------------------
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From shameer at ncbs.res.in  Tue May  1 23:46:59 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 2 May 2007 09:16:59 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178049042.2644.36.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<1178049042.2644.36.camel@localhost.localdomain>
Message-ID: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>

Dear Scott,

Once again, thanks a lot for your input.

I am following the same data format as in
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
The only difference is that instead of hit names I will be using PFAM/PDB
IDs. All the blue boxes (features) should be clickable, like hot-spots in
an image map. The purpose is to display these results in a web page.

I am using the program from Stein's Bio::Graphics example:
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

I need exactly the same image as in
http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
The only difference is that I need the scale (0.1k - 0.9k) to run simply
from 1 to XXX, where XXX depends on the length of the input sequence.

Many thanks for your help,


> Perhaps if you provided some code and sample data we might be able to
> help you better.
>
> Scott
>

-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed May  2 06:02:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 May 2007 06:02:48 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<1178049042.2644.36.camel@localhost.localdomain>
	<59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
Message-ID: <200705020602.48404.sdavis2@mail.nih.gov>

On Tuesday 01 May 2007 23:46, Shameer Khadar wrote:
> Dear Scott,
>
> Once again, thanks a lot for your input.
>
> I am following the same data format as in
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
> The only difference is that instead of hit names I will be using PFAM/PDB
> IDs. All the blue boxes (features) should be clickable, like hot-spots in
> an image map. The purpose is to display these results in a web page.

Do you have your data loaded into bioperl objects?  What code did you use for 
that (post that code)?

> I am using the program in Stein's Bio::Graphics example
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

Does this example run on your computer?  Have you been able to use the bioperl 
objects you created in the first step in the creation of a graphic?  If not, 
what have you tried (post the code) and any error messages.

> I need exactly the same image as in
> http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
> The only difference is that I need the scale (0.1k - 0.9k) to run simply
> from 1 to XXX, where XXX depends on the length of the input sequence.

Again, what have you tried?  Posting code is helpful here, also.  

I'm not an expert in bioperl graphics, but it does really help those that know 
to see the code that you have written to know how best to help.  
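
For the link part, something along these lines might be a starting point.
It is only a rough sketch in plain Perl: the helper name link_for_id, the
"four characters starting with a digit means PDB" rule, and both URL
templates are my assumptions and should be checked before use. The
'Creating Imagemaps' section of the Bio::Graphics::Panel perldoc describes
how a link is attached to each feature; wiring this helper into that is
deliberately left out here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed URL templates; verify them against the current NCBI/PDB sites.
my %url_template = (
    pdb  => 'https://www.rcsb.org/structure/%s',
    ncbi => 'https://www.ncbi.nlm.nih.gov/protein/%s',
);

# Hypothetical helper: choose a target site from the shape of the ID.
# Assumption: a PDB ID is four characters long and starts with a digit;
# anything else is sent to NCBI.
sub link_for_id {
    my ($id) = @_;
    my $db = ($id =~ /^\d[A-Za-z0-9]{3}$/) ? 'pdb' : 'ncbi';
    return sprintf($url_template{$db}, $id);
}

for my $id (qw(1TIM P09651)) {
    print "$id -> ", link_for_id($id), "\n";
}
```

Keeping the ID-to-URL rule in one small function means it can be tested
on its own before any image is drawn.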

Sean


From lzlgboy at gmail.com  Wed May  2 09:58:14 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 2 May 2007 21:58:14 +0800
Subject: [Bioperl-l] Extract CDS from CDNA given Protein SEQs
Message-ID: <d78b3d40705020658w1bee4c68s3058a63ef23c62a1@mail.gmail.com>

Hi everyone,

   I have been given the task of extracting CDS sequences from cDNAs, and I
have the protein sequence for each cDNA. What should I do?
   Should I try a 3-frame translation? If so, how?
   Thanks.
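
To make the 3-frame idea concrete, here is a minimal sketch of what I
mean, assuming the standard genetic code, forward strand only, and an
exact match between the protein and the translation. The sub names
translate_frame and extract_cds and the toy sequence are made up for
illustration, and the codon table is written out by hand so the sketch
runs without BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code, built from the conventional TCAG ordering.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Translate $dna from 0-based offset $frame; unknown codons become 'X'.
sub translate_frame {
    my ($dna, $frame) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;
    my $protein = '';
    for (my $pos = $frame; $pos + 3 <= length($dna); $pos += 3) {
        my $aa = $codon{ substr($dna, $pos, 3) };
        $protein .= defined $aa ? $aa : 'X';
    }
    return $protein;
}

# Return the stretch of $cdna (stop codon excluded) whose translation
# equals $protein, or undef if none of the three forward frames has it.
sub extract_cds {
    my ($cdna, $protein) = @_;
    $protein = uc $protein;
    $protein =~ s/\*$//;    # ignore a trailing stop symbol, if present
    for my $frame (0 .. 2) {
        my $hit = index(translate_frame($cdna, $frame), $protein);
        next if $hit < 0;
        return substr($cdna, $frame + 3 * $hit, 3 * length($protein));
    }
    return;
}

my $cds = extract_cds('ggcaccATGGCTAAGTGGTAAttt', 'MAKW');
print defined $cds ? "$cds\n" : "no match\n";
```

An exact index() match will fail if the protein and cDNA differ by even
one residue, in which case an alignment-based approach would be needed
instead.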

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From MEC at stowers-institute.org  Wed May  2 18:38:31 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 2 May 2007 17:38:31 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>

Lincoln, 
 
Here for your comment and review is a very reworked version of
Bio::Graphics::FeatureBase->gff3_string.
 
The main difference is that homogeneous children get ALL their
attributes except for start/stop from the parent, including the group.
I also provide an option, called $preserveHomegenousParent, controlling
whether or not to "remove an extraneous level of parentage".
 
There is an in-line comment and question for you in the code body.
 
It works well in my hands for my use cases, but I'm not positive it is
in the spirit of your intentions.
 
Cheers,
 
Malcolm
 
 
sub gff3_string {
  my ($self, $recurse, $preserveHomegenousParent,

      # Note: the following parameters, whose names begin with '$_',
      # are intended for recursive calls only.

      $_parent,
      $_self_is_hsf,     # is $self the child in a homogeneous parent/child relationship?
      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent
     ) = @_;

  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogeneous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segments directive without specifying a
  # distinct -subtype to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e.  Bio::DB::SeqFeature,
  # Bio::Graphics::Feature).  Homogeneous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propagated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THEIR
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).

  # $self's parent, if it is a homogeneous child, otherwise $self.
  my $hparentORself = $_self_is_hsf ? $_parent : $self;

  if ($recurse && (my @ssf = $self->sub_SeqFeature)) {
    # TRUE only if all subfeatures are the same type as $self.
    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogeneous parent/child
      # (will this ever happen in practice???))
      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
    return join("\n",
                ($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
                map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homogenous,$mygroup)} @ssf);
  } else {
    my $name  = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1];
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand.  In particular, why does add_segment flip
    # the strand when start > stop?  I thought this was not allowed!
    # Lincoln - any ideas?
    my $p      = join("\t",
        $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->method||'.',
        $self->start||'.',$self->stop||'.',
        defined($hparentORself->score) ? $hparentORself->score : '.',
        $strand||'.',
        defined($hparentORself->phase) ? $hparentORself->phase : '.',
        $group||'');
    return $p;
  }
}
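
To show the leaf case in isolation, here is a stand-alone sketch in which
plain hashes stand in for feature objects. The names strand_symbol and
segment_line, the hash keys, and the sample values are made up for
illustration and are not part of Bio::Graphics::FeatureBase. One
simplification to note: the strand is taken from the parent here, whereas
the sub above reads it from $self.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a numeric strand (-1, 0, +1) onto the GFF strand column.
sub strand_symbol {
    my ($strand) = @_;
    return ('-', '.', '+')[ ($strand || 0) + 1 ];
}

# Build one GFF3 line for a segment of a discontiguous feature:
# start/end come from the segment, every other column from the parent.
sub segment_line {
    my ($parent, $segment) = @_;
    return join("\t",
        $parent->{ref}    || '.',
        $parent->{source} || '.',
        $parent->{method} || '.',
        $segment->{start},
        $segment->{end},
        (defined $parent->{score} ? $parent->{score} : '.'),
        strand_symbol($parent->{strand}),
        (defined $parent->{phase} ? $parent->{phase} : '.'),
        $parent->{group}  || '',
    );
}

my %parent = (
    ref   => 'Contig1', source => 'est', method => 'similarity',
    score => 96,        strand => 1,     group  => 'ID=match1;Name=EST:CEESC13F',
);
my @segments = ( { start => 1001, end => 1100 }, { start => 1201, end => 1300 } );

print segment_line(\%parent, $_), "\n" for @segments;
```

Both output lines carry the same column 9, which is the effect the
reworked sub aims for with homogeneous children.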
 

________________________________

	From: Cook, Malcolm 
	Sent: Friday, April 27, 2007 1:45 PM
	To: 'lincoln.stein at gmail.com'
	Cc: 'lstein at cshl.org'; 'bioperl list'
	Subject: RE: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Lincoln,
	 
	Cool.
	 
	The principle of what I figured out still holds, I think, but the
	implementation is slightly broken.  Improved patch forthcoming next week.
	 

	Malcolm Cook
	Database Applications Manager - Bioinformatics
	Stowers Institute for Medical Research - Kansas City, Missouri
	  

________________________________

		From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
		Sent: Friday, April 27, 2007 12:45 PM
		To: Cook, Malcolm
		Cc: lstein at cshl.org; bioperl list
		Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
		
		
		Hi Malcolm,
		
		This is absolutely OK and you can go ahead and commit.
		Thanks for figuring this out!
		
		Lincoln
		
		
		On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

			Lincoln, et al,
			
			I find that the gff3_string for Bio::DB::SeqFeature objects
			retrieved from a Bio::DB::SeqFeature::Store that were initially
			created with -segments (i.e. whose location was discontiguous)
			does not display any attributes in column 9 other than "Name".
			
			What do you think of the following patch to
			Bio::Graphics::FeatureBase, whose effect is to "contrive to
			return (duplicated) common group values" (which otherwise get
			lost when "collapsing" "homogenous" parent/child features)?
			
			Another approach would be to copy the attributes from the
			parent to the children when the -segments are first created.
			
			Another approach would be to use Bio::SeqFeature::Generic as
			the db's -seqfeature_class and save with -location being a
			Bio::Location::Split, but this was fraught with other problems.
			
			Any other suggestions?  Do you want me to commit this patch?
			
			Cheers,
			
			Malcolm
			
			Patch follows:
			
			
			Index: FeatureBase.pm
			===================================================================
			RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
			retrieving revision 1.29
			diff -c -r1.29 FeatureBase.pm
			*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
			--- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
			***************
			*** 581,587 ****
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     return join "\n",@children;
			    }
			
			    return join("\n",$p,@children);
			--- 581,589 ----
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     #return join "\n",@children;
			!     # Instead of above, additionally, contrive to return (duplicated) common group values
			!     return(join("$group\n",@children) . $group);
			    }
			
			    return join("\n",$p,@children);
			

		-- 
		Lincoln D. Stein
		Cold Spring Harbor Laboratory
		1 Bungtown Road
		Cold Spring Harbor, NY 11724
		(516) 367-8380 (voice)
		(516) 367-8389 (fax)
		FOR URGENT MESSAGES & SCHEDULING, 
		PLEASE CONTACT MY ASSISTANT, 
		SANDRA MICHELSEN, AT michelse at cshl.edu 


From lstein at cshl.edu  Thu May  3 12:01:38 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 12:01:38 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>

The width of the image is determined by the -width attribute and is given in
pixels. You cannot control the height of the image as it is computed
dynamically based on the number of features and bumping options.
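
As a rough sketch of what that means for sequences of 70-200 residues:
keep -width fixed and pass the sequence length as -length, so that only
the scale changes. The helper name panel_args and the numbers are
illustrative assumptions; the Panel and add_track options mirror the ones
used in blast3.pl. The rendering step is skipped when Bio::Graphics and
GD are not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fixed pixel width, scale driven by sequence length.
sub panel_args {
    my ($seq_length, $width_px) = @_;
    return (
        -length    => $seq_length,   # coordinate range covered by the panel
        -width     => $width_px,     # image width in pixels
        -pad_left  => 10,
        -pad_right => 10,
    );
}

for my $seq_length (70, 200) {
    my %args = panel_args($seq_length, 800);
    printf "length %3d at width %d: about %.1f pixels per residue\n",
        $seq_length, $args{'-width'}, $args{'-width'} / $seq_length;

    # Rendering needs Bio::Graphics (and GD); skip it when they are absent.
    next unless eval { require Bio::Graphics; require Bio::SeqFeature::Generic; 1 };

    my $png = eval {
        my $panel       = Bio::Graphics::Panel->new(%args);
        my $full_length = Bio::SeqFeature::Generic->new(-start => 1, -end => $seq_length);
        $panel->add_track($full_length,
                          -glyph   => 'arrow', -tick   => 2,
                          -fgcolor => 'black', -double => 1);
        $panel->png;
    };
    printf "  rendered %d bytes of PNG\n", length($png) if defined $png;
}
```

The image stays the same width for every sequence while the tick labels
follow -length.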

Lincoln

On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>
> Dear Scott,
>
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
>
> I agree, but I couldn't find the exact information I needed :( (maybe I
> missed something important).
>
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
>
> I tried changing Lincoln's program (blast3.pl) from
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>
> But that gave me a smaller scale, of length up to 300. I was looking for
> an option that keeps the same width and height as the given image, with
> start and end values that depend on the length of my sequence. Since I
> couldn't accomplish this, I thought of getting some help from you guys. I
> think I need to play a little with the value to reformat the scale to
> accommodate my hits as well.
>
> Thanks a lot for your inputs,
> --
> Shameer Khadar
> Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bioperlanand at yahoo.com  Thu May  3 16:09:18 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 3 May 2007 13:09:18 -0700 (PDT)
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
Message-ID: <922386.19570.qm@web36808.mail.mud.yahoo.com>

Hi

I am using Bioperl 1.4 and I am trying to obtain protein sequences for specific Uniprot records.

For some records (ROA1_HUMAN), it prints the correct sequence, but  it first prints the warning "Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <STREAM> line 43." 

For other records (BOLA_HAEIN), it prints the correct sequence (without any warnings).

Here is the code:
-------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::DB::SwissProt;

my $sp = new Bio::DB::SwissProt;

#my $seq_object  = $sp->get_Seq_by_id('ROA1_HUMAN');
my $seq_object  = $sp->get_Seq_by_id('BOLA_HAEIN');

my $sequence_as_a_string = $seq_object->seq();
print "$sequence_as_a_string\n";
-------------------------------------------------------------------------------------------

 Is there something I need to fix?

Thanks in advance for the help.
 
 Anand

       


From MEC at stowers-institute.org  Thu May  3 16:19:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 3 May 2007 15:19:00 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
	<CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>
	<6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E230A6@exchkc02.stowers-institute.org>

Lincoln,
 
Ah, yes, round-tripping GFF, the holy grail....
 
Unfortunately, I don't really have a baseline to go against for an
example that roundtrips successfully now.  Do you?
 
For example, after loading test data: 
 
> bp_seqfeature_load.PLS  bioperl-live/t/data/biodbgff/test.gff3
 
the Contig1 portion of which looks like this:
 
##gff-version 3
## sequence-region Contig1 1 37450
Contig1 confirmed transcript 1001 2000 42 + . ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . ID=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 ID=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 ID=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 ID=Transcript:trans-1
Contig1 est similarity 1001 1100 96 . . Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Target=EST:CEESC13F 201 250 +
Contig1 tc1 transposon 5001 6000 . + . ID=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . ID=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . ID=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . ID=Transcript:trans-2
 
 
and then generating output with
 
> bp_seqfeature_gff3.PLS --gff=1 -- seq_id Contig1
 
(using a script I just committed; I hope you like it.  Note: gff=1 => recurse)
 
we get output gff with problems such as:
 
    1 IDs get turned into Aliases
    2 the seqid of a Target attribute gets copied into the feature's
      Name attribute
    3 suppression of parents of homogeneous subfeatures doesn't work when
      the parent has subfeatures of types other than its own (i.e. the
      transcript feature also has exon subfeatures)
 
look:
 
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed transcript 1001 2000 42 + . Parent=2;Alias=Transcript:trans-1;Note=function+unknown;Gene=abc-1,xyz-2
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed transcript 30001 31000 . - . Parent=9;Alias=Transcript:trans-2;Note=Terribly+interesting;Gene=xyz-2
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
With my new version of gff3_string (not yet committed), only the 3rd
problem is addressed, generating
 
bp_seqfeature_gff3.PLS --gff 1  -- seq_id Contig1
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
 
I had to make another change to get this output though, since I had to
change the behaviour to
 
  # provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have at least one subfeature with the same type as the
  # feature itself (thus redefining Lincoln's "homogenous
  # parent/child" case, which previously required all children to have
  # the same type as parent)
 
 
I think you will agree this is the more desirable behaviour.
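
To spell out the difference between the two rules, here is a small
stand-alone sketch. The sub names all_same_type and any_same_type are
invented for this message, and plain type strings stand in for the
feature objects; the list of subfeature types mimics what the database
returns for trans-1 above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original rule: the parent counts as "homogeneous" only when ALL of
# its subfeatures share its type.
sub all_same_type {
    my ($parent_type, @sub_types) = @_;
    return 0 unless @sub_types;
    my $different = grep { $_ ne $parent_type } @sub_types;
    return $different ? 0 : 1;
}

# Relaxed rule: ONE subfeature of the parent's type is enough.
sub any_same_type {
    my ($parent_type, @sub_types) = @_;
    my $same = grep { $_ eq $parent_type } @sub_types;
    return $same ? 1 : 0;
}

# The transcript has a same-typed child plus exon and CDS children.
my @subs = qw(transcript exon exon exon CDS CDS CDS);

printf "all-same rule: %d, any-same rule: %d\n",
    all_same_type('transcript', @subs),
    any_same_type('transcript', @subs);
```

Under the original rule the transcript is not treated as homogeneous, so
the duplicate transcript line survives; under the relaxed rule it is
suppressed.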
 
I would be happy to test any other GFF you suggest might be (more or
less) roundtripped.
 
What think you?
 
--Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com]
On Behalf Of Lincoln Stein
	Sent: Thursday, May 03, 2007 9:46 AM
	To: Cook, Malcolm
	Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Malcolm,
	
	For me, the major use case is that GFF3 files round-trip
correctly through the database. Do any of your use cases cover that?
	
	Lincoln
	
	


From cjfields at uiuc.edu  Thu May  3 16:57:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 3 May 2007 15:57:43 -0500
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
In-Reply-To: <922386.19570.qm@web36808.mail.mud.yahoo.com>
References: <922386.19570.qm@web36808.mail.mud.yahoo.com>
Message-ID: <2930F3F1-2BFB-4320-9A2C-50DFE6F808A1@uiuc.edu>

I would update to BioPerl 1.5.2.  Version 1.4 is three years old, and
there have been many changes to both sequence retrieval and the parsers.

We can't predict when a new 'stable' release will be available but  
1.5.2 works well for most purposes.

chris

On May 3, 2007, at 3:09 PM, Anand Venkatraman wrote:

> Hi
>
> I am using Bioperl 1.4 and I am trying to obtain protein sequences  
> for specific Uniprot records.
> ...
>  Is there something I need to fix.
>
> Thanks in advance for the help.
>
>  Anand
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thiago.venancio at gmail.com  Thu May  3 17:12:35 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 3 May 2007 18:12:35 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
	<54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
Message-ID: <44255ea80705031412n7abef247je70d2681bb3cc7ed@mail.gmail.com>

Hi all,

Just for the record: I am getting good results extracting CDS from
protein-vs-DNA alignments using the following procedure:

- BLASTX to identify the hits for each DNA sequence (if you want to process
sequences for further multiple sequence alignment, it is important to record
the frames);

- FASTX/Y to refine the alignment between the protein and the DNA. FASTX/Y
is quite good because it copes well with frameshifts and allows better
identification of premature stop codons. In addition, the alignment (and
the CDS prediction) is better.

This is worth noting because it avoids analysing "phantom" mRNAs, i.e.
sequences that contain stops; merely looking at the BLAST output can
sometimes give misleading results.
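Between the two steps, the hit region has to be cut out of the DNA sequence, usually with some flanking sequence ("slop", as suggested earlier in this thread), before it is handed to FASTX/Y. A minimal plain-Perl sketch of that step follows; the function name, the 1-based inclusive coordinates and the example values are assumptions for illustration, not part of any BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical helper: extend a hit region by $slop bases on each side,
# clamp it to the sequence, and return (from, to, subsequence).
# Coordinates are assumed to be 1-based and inclusive.
sub region_with_slop {
    my ($seq, $start, $end, $slop) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # reverse-strand hits

    my $from = $start - $slop;
    $from = 1 if $from < 1;
    my $to = $end + $slop;
    $to = length($seq) if $to > length($seq);

    return ($from, $to, substr($seq, $from - 1, $to - $from + 1));
}

my ($from, $to, $sub) = region_with_slop("ACGTACGTAC", 4, 6, 2);
print "$from-$to $sub\n";
```

The frame recorded from the BLASTX hit is still needed afterwards, since cutting out a region shifts the coordinates by $from - 1.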

Best.

Thiago


On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Hi -
> There are some tools that do this for you -- I've listed a few from a
> google search or from what I remember reading.  It would be great if you
> (and others!) were willing to contribute a little info on what you find
> works for you to the wiki.  A little HOWTO would be cool - here or on
> openwetware.org.
>
> Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
> EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2
>
> Ewan Birney's estwise as part of wise package also can help if you have a
> likely protein from BLAST you want to align to the est - estwise can handle
> frameshifts, but can be too slow for some people.  Exonerate's protein2dna
> model may also work here, but I haven't tried it.
>
> -jason
> On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:
>
> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed
> for that, like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>


-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From lstein at cshl.edu  Thu May  3 17:35:57 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 17:35:57 -0400
Subject: [Bioperl-l] CSHL is hiring
Message-ID: <6dce9a0b0705031435r3bc2d2ddlfca5ac02844b4ef0@mail.gmail.com>

Hi Folks,

Sorry for the spam. My group at CSHL is looking for a scientific programmer
with good software development credentials and some experience in
bioinformatics. Experience in object-oriented Perl programming is a strict
requirement.

This is to work on user interface development for several projects
including:

   - BioMart (data warehouse) project (www.biomart.org)
   - GBrowse genome browser (www.gmod.org/GBrowse)
   - Reactome pathways database (www.reactome.org)

I can offer salaries in the 60-80K range, depending on level of experience.
Please reply to lstein at cshl.edu.

Best,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From MEC at stowers-institute.org  Tue May  8 12:59:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 8 May 2007 11:59:10 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start
	and stop coordinates??
Message-ID: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>

Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
coordinates, 

as in:
  ($start,$stop) = ($stop,$start)
    if defined($start) && defined($stop) && $start > $stop;

I thought it is not legal for a feature to be so composed.  

Anyone know?

Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue May  8 13:12:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 12:12:45 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>

I believe all seqfeature location coordinates are designed to have  
start < stop for consistency; in cases where the strand matters (CDS,  
gene, etc.) then the strand is set to 1 or -1.  When start > stop,  
the two are reversed and the strand is flipped; at least that's the  
way locations are set up in BioPerl.
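As a rough illustration of that convention, here is a plain-Perl sketch. The helper name is hypothetical, and treating a missing strand as +1 is an assumption made for the example; this is not the code BioPerl itself uses.

```perl
use strict;
use warnings;

# Hypothetical helper: return (start, stop, strand) with start <= stop.
sub normalize_coords {
    my ($start, $stop, $strand) = @_;
    $strand = 1 unless defined $strand;        # assumption: no strand means +1
    if (defined $start && defined $stop && $start > $stop) {
        ($start, $stop) = ($stop, $start);     # same swap as in gff3_string
        $strand = -$strand;                    # reversed input implies the other strand
    }
    return ($start, $stop, $strand);
}

print join(',', normalize_coords(500, 100, 1)), "\n";
```

Note that the line quoted from gff3_string only performs the swap; whether the strand should also change depends on what the reversed coordinates meant in the original data source.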

chris

On May 8, 2007, at 11:59 AM, Cook, Malcolm wrote:

> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Tue May  8 14:37:20 2007
From: juheymann at yahoo.com (Bohr)
Date: Tue, 8 May 2007 11:37:20 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
Message-ID: <10381379.post@talk.nabble.com>


Hi,

I installed bioperl under OSX Tiger via Fink. I tested the installation
using the test tutorial via: perl -w bptutorial.pl 5

The script failed, indicating that the file to retrieve was missing. To
identify the problem, I wrote a script that uses 'get_sequence' to retrieve
an entry from 'genbank' or 'embl'; both succeeded. If I replace the database
with 'swiss' or 'swissprot' and use the same ID as in the tutorial, I
reproduce the problem found with bptutorial.pl. Other IDs do the same.

Any pointers on the origin of this finding would be greatly appreciated.
-- 
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10381379
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Tue May  8 17:53:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 16:53:04 -0500
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <10381379.post@talk.nabble.com>
References: <10381379.post@talk.nabble.com>
Message-ID: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>

The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
etc) accessed via bioperl.

As a note, the bptutorial.pl has been moved to the bioperl wiki:

http://www.bioperl.org/wiki/Bptutorial

chris

On May 8, 2007, at 1:37 PM, Bohr wrote:

>
> Hi,
>
> I installed bioperl under OSX Tiger via Fink. I tested the  
> installation
> using the test tutorial via: perl -w bptutorial.pl 5
>
> The script failed indicating that the file to retrieve was missing. To
> identify the problem, I used a script using 'get_sequence' that will
> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
> replace it
> with 'swiss' or 'swissprot' and substitute the ID with the  
> identical ID as
> in the tutorial, I am recreating the problem found with  
> bptutorial.pl. Other
> ID's do the same.
>
> Any pointers on the origin of this finding would be greatly  
> appreciated.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Wed May  9 18:17:27 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:17:27 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10403903.post@talk.nabble.com>


Thank you for the feedback and the suggestion.

I installed 1.5.2 via Build.PL and the results were the same, i.e. embl and
genbank worked fine, but swissprot failed.

Here is the output:

MSG: acc (CALX_YEAST) does not exist
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not provide a valid Bio::PrimarySeqI object
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SeqIO::fasta::write_seq /sw/lib/perl5/5.8.6/Bio/SeqIO/fasta.pm:181

Before speculating too much, here is my question: how do I verify the update
to 1.5.2? (I ran ./Build test and it passed.) And what else could have gone
wrong here?

What might be a clever way to troubleshoot this?
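One way to check which copy of BioPerl a script actually loads is to print the module's version and the file it was loaded from. The stack trace above still points into /sw/lib/perl5/5.8.6, which is Fink's tree, so an older copy may be found first in @INC. The sketch below is a generic helper (the name module_info is made up); it assumes that Bio::Root::Version, the module that carries the BioPerl version number, is the right one to ask.

```perl
use strict;
use warnings;

# Hypothetical helper: return (version, path) for a module, or nothing
# if the module cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;    # may be undef if the module sets none
    return ($version, $INC{$file});
}

for my $module (qw(Bio::Root::Version Bio::Perl)) {
    my ($version, $path) = module_info($module);
    if (defined $path) {
        printf "%s %s loaded from %s\n", $module,
            defined $version ? $version : '(no version)', $path;
    }
    else {
        print "$module could not be loaded\n";
    }
}
```

If the reported path is still under the Fink directory, the new installation is being shadowed; adjusting PERL5LIB or adding a "use lib" line for the new install location would be the next thing to try.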


---------------------------------------------------------------------------

Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris
> 
> On May 8, 2007, at 1:37 PM, Bohr wrote:
> 
>>
>> Hi,
>>
>> I installed bioperl under OSX Tiger via Fink. I tested the  
>> installation
>> using the test tutorial via: perl -w bptutorial.pl 5
>>
>> The script failed indicating that the file to retrieve was missing. To
>> identify the problem, I used a script using 'get_sequence' that will
>> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
>> replace it
>> with 'swiss' or 'swissprot' and substitute the ID with the  
>> identical ID as
>> in the tutorial, I am recreating the problem found with  
>> bptutorial.pl. Other
>> ID's do the same.
>>
>> Any pointers on the origin of this finding would be greatly  
>> appreciated.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 



From ursula_cox at btinternet.com  Wed May  9 18:12:26 2007
From: ursula_cox at btinternet.com (Ursula at BT)
Date: Wed, 9 May 2007 23:12:26 +0100
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
Message-ID: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>

Dear BioPerl List,

 
I'm new to BioPerl (and Perl for that matter). I have an array of enzyme
names, and a larger collection of enzymes (guaranteed to be a superset by
the way it's constructed). I need to make a new collection containing just
the enzymes corresponding to the names I have in the array.

 
I was hoping that something like:

 
my $all_rebase =
Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');

my $all_rebase_collection = $all_rebase->read();

 
my @enzymes =
('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I',
'AccB7I','AccI');

 
my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

foreach $enzyme (all_rebase_collection)

            {

            $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes;

            }

 
would work, but I get a syntax error near "$new_collection(".

 
Any clues much appreciated,

 
Ursula Cox


From juheymann at yahoo.com  Wed May  9 18:38:42 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:38:42 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10404211.post@talk.nabble.com>


Thank you for pointing that out! I installed 1.5.2 via Build.pl. The scripts
work as expected now.


Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris
> 
> On May 8, 2007, at 1:37 PM, Bohr wrote:
> 
>>
>> Hi,
>>
>> I installed bioperl under OSX Tiger via Fink. I tested the  
>> installation
>> using the test tutorial via: perl -w bptutorial.pl 5
>>
>> The script failed indicating that the file to retrieve was missing. To
>> identify the problem, I used a script using 'get_sequence' that will
>> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
>> replace it
>> with 'swiss' or 'swissprot' and substitute the ID with the  
>> identical ID as
>> in the tutorial, I am recreating the problem found with  
>> bptutorial.pl. Other
>> ID's do the same.
>>
>> Any pointers on the origin of this finding would be greatly  
>> appreciated.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 



From cjfields at uiuc.edu  Wed May  9 19:37:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 May 2007 18:37:33 -0500
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
In-Reply-To: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
References: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
Message-ID: <E4472E55-AADB-4697-8C4D-2EC231923F0B@uiuc.edu>


On May 9, 2007, at 5:12 PM, Ursula at BT wrote:

> Dear BioPerl List,
>
>
>
> I'm new to BioPerl (and Perl for that matter). I have an array of  
> enzyme
> names, and a larger collection of enzymes (guaranteed to be a  
> superset by
> the way it's constructed). I need to make a new collection  
> containing just
> the enzymes corresponding to the names I have in the array.

First, prior to using BioPerl you should really brush up on perl  
itself (Learning Perl, or James Tisdall's Perl for Bioinformatics  
books, the former preferred).  Though there are several scripts  
available to get you started with Bioperl, much of the code is  
written with the expectation that you can write and debug a basic  
perl script (and there is some expectation that you are somewhat  
familiar with OO Perl).

Saying that, let's see what's wrong...

> I was hoping that something like:
>
>
>
> my $all_rebase =
> Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');
>
> my $all_rebase_collection = $all_rebase->read();

The 'bionet' format is not supported; only 'withrefm', 'itype2',  
'bairoch' are (the latter only experimentally).  See 'perldoc  
Bio::Restriction::IO'.

> my @enzymes =
> ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I',
> 'AccB7I','AccI');
>
>
>
> my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

Missing a new() constructor here.

> foreach $enzyme (all_rebase_collection)

Not sure what this is.  No '$' sigil for $all_rebase_collection will  
make the compiler look for (and fail to find) the sub  
all_rebase_collection().

>
>             {
>
>             $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes;
>
>             }
>
>
>
> would work, but I get a syntax error near "$new_collection(".

Yep.  You don't have your grep sub block in brackets {}, hence the  
error.  See 'perldoc -f grep'.
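For reference, a sketch of the filtering step with the points above applied. Note that "grep EXPR, LIST" without braces is itself valid Perl; the parser stops at "$new_collection(" because a scalar followed by parentheses is not a method call. The helper select_by_name is a made-up name, and the BioPerl calls in the trailing comment (each_enzyme, enzymes, the 'withrefm' file name) are shown as an untested assumption about how it would be wired up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the objects whose ->name appears in
# @$wanted_names, keeping the order of @$objects.
sub select_by_name {
    my ($objects, $wanted_names) = @_;
    my %wanted = map { $_ => 1 } @$wanted_names;    # hash lookup instead of nested grep
    return grep { $wanted{ $_->name } } @$objects;
}

# Assumed usage with BioPerl (untested sketch, file name is a placeholder):
#   my $all    = Bio::Restriction::IO->new(-file   => 'withrefm.704',
#                                          -format => 'withrefm')->read;
#   my $subset = Bio::Restriction::EnzymeCollection->new(-empty => 1);
#   $subset->enzymes( select_by_name([ $all->each_enzyme ], \@enzymes) );
```

Building a hash of the wanted names once avoids scanning the name list for every enzyme in the larger collection.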

> Any clues much appreciated,
>
>
>
> Ursula Cox

No prob, but again you might want to brush up on perl.

chris


From darin.london at duke.edu  Thu May 10 12:17:38 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Thu, 10 May 2007 12:17:38 -0400
Subject: [Bioperl-l] BOSC 2007 Second Call For Papers
Message-ID: <200705101617.l4AGHceI002463@tenero.duhs.duke.edu>


The BOSC Organizing Committee is proud to announce BOSC 2007, taking place
in Vienna, Austria on July 19th and 20th.  The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions.   Thank you, and we hope to see you in Vienna.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From lstein at cshl.edu  Thu May 10 13:13:09 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 10 May 2007 13:12:09 -0401
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0705101013w1923c173l5ec5d9288c67c9a2@mail.gmail.com>

It's a workaround for some broken data sources. It should "never happen."

Lincoln

On 5/8/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Bank.Beszteri at awi.de  Thu May 10 12:13:00 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Thu, 10 May 2007 18:13:00 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
Message-ID: <4643448C.4000807@awi.de>

Dear Bioperl folks,

I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees, 
but in some things it did not behave as I expected it to, so I had to 
look inside a bit.
In particular, I had problems with mixed up bootstrap values after 
re-rooting. After looking into the Bio::Tree::Tree data structures, it 
seems that

a) bootstrap values are stored as attributes of nodes of the tree [to my 
understanding, they should rather be attributes of branches but 
Bio::Tree::Tree apparently tries to simplify away branches]; each node 
stores the bootstrap value belonging to the branch that connects it to 
its ancestor node (I'm reading in trees from Newick strings, and 
bootstrap values arrive in the id fields of internal branches)

b) when re-rooting a tree, bootstrap values stay with the same node 
where they were before. Because the node that used to be the ancestor of 
a particular node in the original tree might have become its descendant 
after re-rooting, the bootstrap values are being mixed up.

Can you confirm my conclusion? Whether yes or no, have you got an easy 
workaround or alternative solution to re-rooting trees (without having 
to touch the reroot method) or any other hints that could be useful for 
me to deal with this issue?
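To illustrate point (b), here is a plain-Perl sketch of a re-rooting step that moves each support value along with its branch. It uses simple hashes rather than Bio::Tree objects, so the data layout and the function name are assumptions made for the example, not the Bio::Tree::Tree API.

```perl
use strict;
use warnings;

# $parent->{node}  = name of the node's parent (no entry for the root)
# $support->{node} = bootstrap value of the branch between node and its parent
sub reroot_with_support {
    my ($parent, $support, $new_root) = @_;
    my %new_parent  = %$parent;
    my %new_support = %$support;

    # Collect the path from the new root up to the old root.
    my @path = ($new_root);
    push @path, $parent->{ $path[-1] } while defined $parent->{ $path[-1] };

    # Reverse every branch on that path; the support value moves to the
    # node that is now the child end of the branch.
    for my $i (0 .. $#path - 1) {
        my ($old_child, $old_parent) = @path[$i, $i + 1];
        $new_parent{$old_parent} = $old_child;
        if (exists $support->{$old_child}) {
            $new_support{$old_parent} = $support->{$old_child};
        }
        else {
            delete $new_support{$old_parent};
        }
    }
    delete $new_parent{$new_root};
    delete $new_support{$new_root};
    return (\%new_parent, \%new_support);
}

# Small example tree: root -> b -> a -> R, with x under b and y under root.
my %parent  = (b => 'root', a => 'b', R => 'a', x => 'b', y => 'root');
my %support = (b => 90, a => 80, R => 60, x => 70);
my ($p, $s) = reroot_with_support(\%parent, \%support, 'a');
print "$_: parent=$p->{$_} support=", (defined $s->{$_} ? $s->{$_} : '-'), "\n"
    for sort keys %$p;
```

Only the nodes on the path between the old and the new root change; everything else keeps its value. The sketch does not remove the old root if it ends up with a single child.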

Cheers,

Bank


--
Dr. Bánk Beszteri
Alfred Wegener Institute for Polar and Marine Research


From dmessina at wustl.edu  Thu May 10 16:16:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 10 May 2007 15:16:48 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
Message-ID: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>

Hi everyone,

Shin Leong here at the Wash U GSC has written SearchIO-compliant  
cross_match parsing and result modules. Specifically,  
Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.

To my knowledge this functionality doesn't exist in BioPerl. Any  
comments or objections before I commit these to CVS?

Thanks,
Dave


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From aperezp at uma.es  Thu May 10 13:58:32 2007
From: aperezp at uma.es (=?ISO-8859-1?Q?=22Antonio_J=2E_P=E9rez=22?=)
Date: Thu, 10 May 2007 19:58:32 +0200
Subject: [Bioperl-l] Get Swiss Entry
Message-ID: <46435D48.4020309@uma.es>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/ca4e893e/attachment-0003.html>

From jason at bioperl.org  Thu May 10 16:53:28 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 10 May 2007 13:53:28 -0700
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <FDBE1855-6252-4902-B32B-E984EC6B22E9@bioperl.org>

Awesome!
On May 10, 2007, at 1:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From cjfields at uiuc.edu  Fri May 11 00:55:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 10 May 2007 23:55:05 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>

Sounds good to me!  Any tests to be added?

chris

On May 10, 2007, at 3:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Fri May 11 01:42:53 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 11 May 2007 00:42:53 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
Message-ID: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>

> Sounds good to me!  Any tests to be added?

No tests right now as far as I can tell. I'm swamped personally, but  
perhaps I can persuade Mark Johnson over here to crank out a few.


From cjfields at uiuc.edu  Fri May 11 11:25:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 10:25:34 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
	<57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
Message-ID: <B654B314-FE39-4DB2-9B2F-5C812CF3E257@uiuc.edu>

Thanks Mark!  I don't think you'll need to add a ton of tests; just  
enough to demo anything that you feel is necessary or specific to the  
parser.  These could go into SearchIO.t or their own test suite.

chris

On May 11, 2007, at 10:14 AM, Mark Johnson wrote:

>>> Sounds good to me!  Any tests to be added?
>>
>> No tests right now as far as I can tell. I'm swamped personally, but
>> perhaps I can persuade Mark Johnson over here to crank out a few.
>
> I'll see what I can do.  I just had to open my mouth about getting  
> this
> contributed back after I noticed it, so I suppose this is appropriate
> retribution.  8)
>
>


From mjohnson at watson.wustl.edu  Fri May 11 11:14:56 2007
From: mjohnson at watson.wustl.edu (Mark Johnson)
Date: Fri, 11 May 2007 10:14:56 -0500 (CDT)
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
Message-ID: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>

>> Sounds good to me!  Any tests to be added?
>
> No tests right now as far as I can tell. I'm swamped personally, but
> perhaps I can persuade Mark Johnson over here to crank out a few.

I'll see what I can do.  I just had to open my mouth about getting this
contributed back after I noticed it, so I suppose this is appropriate
retribution.  8)


From golharam at umdnj.edu  Fri May 11 16:20:41 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 16:20:41 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
Message-ID: <000501c79409$d8c03480$f6028a0a@PICO>

I'm running a large series of clustalw alignments.  After a large number of
alignments, my perl script dies, reporting that too many links are open.  I
checked my /tmp directory (while the script is running) and noticed that the
temp directories created for ClustalW are not removed until after the script
exits.
How can I force the cleanup of these directories after I am done with the
alignment?

My code is essentially this:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
$aa_aln = $aln_factory->align(\@aa_seqs);
open(STDOUT, ">&OLDOUT");
$dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);


Ryan


From jason at bioperl.org  Fri May 11 16:53:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 11 May 2007 13:53:19 -0700
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>

Did you try adding this after the calls that get the CDS alignment?

$aln_factory->cleanup();


-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri May 11 16:57:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 15:57:23 -0500
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <41E91E58-48A5-4E29-B6BA-E9417BF17513@uiuc.edu>

cleanup() is supposed to clean up temp directory stuff; it's  
inherited from Bio::Tools::Run::WrapperBase.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From golharam at umdnj.edu  Fri May 11 18:11:47 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 18:11:47 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
 after itself
In-Reply-To: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>
Message-ID: <001301c79419$5e794e90$f6028a0a@PICO>

No, I didn't, but I will now.  Thanks.  Interestingly enough ClustalW
removes the files from within the temp directory, but not the temp directory
itself.
 
 


From goshng at gmail.com  Sat May 12 11:21:59 2007
From: goshng at gmail.com (Sang Chul Choi)
Date: Sat, 12 May 2007 11:21:59 -0400
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
Message-ID: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>

Hi,

I have a Bio::Seq object whose sequence is "ACGT", and I want it to hold
"ACGA" by changing the fourth letter from T to A. I thought I could do
this by reading the sequence string through the seq() method, changing
the string with Perl's general string functions, and generating another
Bio::Seq object with the new string. This seems a little bit silly.

Is there a simpler way to do this? Or is there a Bio::Seq method that
changes one letter at a particular position, or, additionally, changes
the letters within some range?

Thank you,

Sang Chul


From jason at bioperl.org  Sat May 12 12:50:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 09:50:10 -0700
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
In-Reply-To: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
References: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
Message-ID: <22C99635-C22D-4F51-AADD-5CCF595222DF@bioperl.org>

You can get/set the seq data via the seq() method.

use Bio::Seq;
my $seq = Bio::Seq->new(-seq => 'ACGT');

my $str = $seq->seq;
print $str, "\n";

substr($str,3,1,'A');
$seq->seq($str);
print $seq->seq, "\n";
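
For the second part of the question (changing a range of letters), the
same substr() idea extends naturally. The following is a minimal sketch
that works on a plain string, assuming 1-based inclusive coordinates;
replace_range is a hypothetical helper name, not a Bio::Seq method.

```perl
use strict;
use warnings;

# Replace the letters at 1-based positions $start..$end of $str with
# $replacement and return the edited copy. replace_range is a hypothetical
# helper name used only for this illustration.
sub replace_range {
    my ($str, $start, $end, $replacement) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length $str;
    # 4-argument substr() edits the local copy in place (0-based offset).
    substr($str, $start - 1, $end - $start + 1, $replacement);
    return $str;
}

print replace_range('ACGT', 4, 4, 'A'), "\n";        # ACGA
print replace_range('ACGTACGT', 2, 4, 'TTT'), "\n";  # ATTTACGT
```

With a sequence object, the edited string would then go back through the
same getter/setter shown above, e.g. $seq->seq(replace_range($seq->seq, 4, 4, 'A')).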

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Sat May 12 18:12:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 15:12:56 -0700
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <4643448C.4000807@awi.de>
References: <4643448C.4000807@awi.de>
Message-ID: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>


On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:

> Dear Bioperl folks,
>
> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees,
> but in some things it did not behave as I expected it to, so I had to
> look inside a bit.
> In particular, I had problems with mixed up bootstrap values after
> re-rooting. After looking into the Bio::Tree::Tree data structures, it
> seems that
>
> a) bootstrap values are stored as attributes of nodes of the tree [to my
> understanding, they should rather be attributes of branches but
> Bio::Tree::Tree apparently tries to simplify away branches]; each node
> stores the bootstrap value belonging to the branch that connects it to
> its ancestor node (I'm reading in trees from Newick strings, and
> bootstrap values arrive in the id fields of internal branches)

Please feel free to suggest an alternative implementation if you
don't agree with the object model.  It has worked quite well in our
hands, so I'd be all ears for someone wanting to get in and do some
more work on it.

We have answered the question as to why bootstrap values are internal
ids many times on this list and I believe on the wiki -- the parser
can't tell the difference between a node id and a bootstrap value
because nexus uses the same slot for both.  If you know you have
bootstrap values in the internal nodes it is trivial to process your
tree and copy the values over.


for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) {
  $node->bootstrap($node->id);
  $node->id('');
}

I just added this as a method to TreeFunctionsI so that it can be
easily called now to help satisfy everyone who hopes that the toolkit
can guess whether the internal nodes are bootstraps or identifiers.


>
> b) when re-rooting a tree, bootstrap values stay with the same node
> where they were before. Because the node that used to be the ancestor of
> a particular node in the original tree might have become its descendant
> after re-rooting, the bootstrap values are being mixed up.
>
> Can you confirm my conclusion? Whether yes or no, have you got an easy
> workaround or alternative solution to re-rooting trees (without having
> to touch the reroot method) or any other hints that could be useful for
> me to deal with this issue?
>

I think you are right, but I am not clear on what the value should be
for the internal node attached to the root now.

Note that it is always helpful to provide example code illustrating your
problem.  Here is an example which I think illustrates your problem.

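use strict;
use warnings;
use Bio::TreeIO;

# Read Newick trees from the DATA section below and write them to STDOUT.
my $in = Bio::TreeIO->new(-format => 'newick',
                          -fh     => \*DATA);
my $out = Bio::TreeIO->new(-format => 'newick');
while( my $t = $in->next_tree ){
     # $a is reserved for sort(), so use a different name for the node.
     my ($node_a) = $t->find_node(-id => "A");
     $out->write_tree($t);   # tree as read
     $t->reroot($node_a);
     $out->write_tree($t);   # same tree after rerooting
}
__DATA__
(((A:5,B:5)90:2,C:4)25:3,D:10);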


> Cheers,
>
> Bank
>
>
>
> --
> Dr. Bánk Beszteri
> Alfred Wegener Institute for Polar and Marine Research

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From darin.london at duke.edu  Mon May 14 10:44:56 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Mon, 14 May 2007 10:44:56 -0400
Subject: [Bioperl-l] BOSC 2007 Abstract Submission Deadline Extended
Message-ID: <200705141444.l4EEium2026969@tenero.duhs.duke.edu>


Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st.  The announcement day will remain the same so that it remains before the Early Discount Date.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From thiago.venancio at gmail.com  Mon May 14 14:54:44 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 14 May 2007 15:54:44 -0300
Subject: [Bioperl-l] get regions
Message-ID: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>

Hi all,

Using Bio::Seq, is there any easy way to get the coordinates where a
regular expression matches or should I build a sliding window?

For example, looking for a given promoter region in a FASTA file. If
the region is found, I would like to recover exactly the coordinates
where it matches.

Thanks in advance.

Thiago
-- 
"Doubt is not a pleasant condition, but certainty is absurd."
            Voltaire

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From jason at bioperl.org  Mon May 14 15:06:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 12:06:11 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>

I assume you are doing the matches on the string with =~, so I don't
think Bio::Seq really helps you here.
See the $` (prematch) variable in Perl: its length gives the position
where a regexp matched.

-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Mon May 14 15:15:09 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 14 May 2007 12:15:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>

I do this in perl with the pos() function.  This requires the match
operator (m) with the /g modifier, like

if ($gene =~ m/$pattern/gi)
{
	$start = pos($gene) - length($pattern) + 1;
}

pos() returns the offset where the regex left off after finding a match.
I subtract the length of my pattern (which is just a string with a few
placeholder (.) wildcards, so I know how long the match will always be).



From Bank.Beszteri at awi.de  Mon May 14 09:20:07 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Mon, 14 May 2007 15:20:07 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
References: <4643448C.4000807@awi.de>
	<1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
Message-ID: <46486207.60304@awi.de>

Dear Jason,

thanks for your answer! Sorry about having been ambiguous - it is clear 
that bootstrap values are parsed as ids from newick files, I had no 
problem with that, it was only the first step of the explanation of my 
problem, which was the rerooting issue.

Thanks for your example code as well, it is indeed really useful to 
illustrate the problem. I modified the original tree a bit to make my 
point clearer:

In your example, there are two internal node ids in a four-taxon tree. 
This is not a realistic situation for bootstrap values, because 
bootstrap values are attached to bipartitions of terminal nodes, i.e., 
edges / branches of a tree (in what proportion of the bootstrap 
replicates was a particular bipartition recovered - an alternative 
representation of bootstraps, like produced e.g. by PAUP, is indeed a 
"taxon bipartition table"). This means that in a four taxon tree, we can 
have at most one bootstrap value - corresponding to the single 
non-trivial bipartition (all other bipartitions are trivial, i.e., they 
separate a terminal node from the rest).

So here is an example 4-taxon tree with a bootstrap value:

(A:52,(B:46,C:50)68:11,D:70);

After rerooting at node B (using your example code) it looks like

((B:46,C:50,(A:52,D:70):11)68);

Now there are two problems:
    1) this seems to be a small problem with TreeIO rather than with 
rerooting: there is an extra pair of parentheses around the whole tree;

but more importantly: 
    2) the bootstrap value appears at the root node, which is not 
sensible according to the convention that "each node stores the 
bootstrap value belonging to the branch linking it to its ancestor". You 
would like the bootstrap value to appear at the node connecting A & D in 
this situation, which would look like

(B:46,C:50,(A:52,D:70)68:11);

because in  this new situation, this position would correspond to the 
same bipartition as in the original tree [which is (A,D)(B,C)].

In the meantime, I got a mail showing me the solution (thx Daniel!),
which is in fact pretty simple: all that has to be done is to go through
the nodes on the path from the old to the new root after rerooting, and
for each node, take the bootstrap value from its ancestor (and remove
it from the ancestor). This leaves the root node without a bootstrap
value, which is exactly what you want (because it has no branch
connecting it to its ancestor, there is no sensible bootstrap value
attached to a root node).

So this exercise tells me that bootstraps and "real" node ids should be 
handled in different manners when rerooting: real ids should of course 
stay with the nodes, whereas bootstrap values on the path between the 
new and old root should move over to the other end of the corresponding 
branch.
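
The procedure can be sketched on a toy tree. This is only an illustration
under stated assumptions: nodes are plain hashes with hypothetical
"parent" and "bootstrap" keys rather than Bio::Tree::Node objects, and
shift_bootstraps is a made-up name. The topology is the rerooted
four-taxon example, with the node joining B and C as the new root and the
node joining A and D as the old root.

```perl
use strict;
use warnings;

# Walk from the old root up to the new root (following the parent links as
# they stand AFTER rerooting) and move each support value from the node's
# new ancestor down to the node itself. The new root ends up with none.
sub shift_bootstraps {
    my ($old_root) = @_;
    my $node = $old_root;
    while (my $parent = $node->{parent}) {
        # delete() returns the removed value, or undef if there was none.
        $node->{bootstrap} = delete $parent->{bootstrap};
        $node = $parent;
    }
    return;
}

# State right after rerooting: the value 68 is still stuck on the new root.
my %new_root = (name => 'BC', bootstrap => 68);
my %old_root = (name => 'AD', parent => \%new_root);

shift_bootstraps(\%old_root);

printf "%s: %s\n", $_->{name}, $_->{bootstrap} // 'none'
    for \%old_root, \%new_root;
# AD: 68
# BC: none
```

Starting at the old root matters: each value has to be pulled down before
the node it came from receives its own replacement from one step further
up the path.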

Best wishes,

Bank



From basu at pharm.sunysb.edu  Mon May 14 15:10:33 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Mon, 14 May 2007 15:10:33 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <4648B429.2030907@pharm.sunysb.edu>

Thiago Venancio wrote:
> Hi all,
> 
> Using Bio::Seq, is there any easy way to get the coordinates where a
> regular expression matches or should I build a sliding window?
The perl core function "pos" should help you in this case. Do a 'perldoc
-f pos' for details.

-sidd


> 
> For example, looking for a given promoter region in a FASTA file. If
> the region is found, I would like to recover exactly the coordinates
> where it matches.
> 
> Thanks in advance.
> 
> Thiago


From cjfields at uiuc.edu  Mon May 14 16:48:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 May 2007 15:48:36 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>

I use pos() with m{}g; the match variables ($`, $& and $') tend to slow
things down for me.

Ah, I see Kevin's answered that...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon May 14 17:50:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 14:50:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
Message-ID: <A5BECADC-6516-41FF-A5DB-EE865AD63842@bioperl.org>

Yep, you are right: pos() is much better and faster for getting the position.

-j

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From sac at bioperl.org  Mon May 14 21:46:55 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 14 May 2007 18:46:55 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
Message-ID: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>

On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> I do this in perl with the pos() function.  This requires the use of the
> match operator (m) like
>
> if ($gene =~ m/$pattern/gi)
> {
>         $start = pos($gene) - length($pattern) + 1;
> }
>
> pos() returns the location of the pointer where the regex left off after
> finding a match.

Cool. I hadn't known that was possible.

> I remove the length of my pattern (which is just a
> string with a few placeholder (.) wildcards, so I know how long the
> match will always be).

To generalize your code so that it will work for any pattern, such as
one that can match strings of variable length like "A{5,10}", just
subtract the length of the actual string that was matched:

if ($gene =~ m/$pattern/gi)
{
    $start = pos($gene) - length($&) + 1;
 }
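
A variation on the same idea, offered as a sketch rather than a
recommendation: Perl's built-in @- and @+ arrays hold the start and end
offsets of the last successful match, so the coordinates can be read
without $& or length(), and a while loop with /g collects every match
instead of only the first. The function name match_coords and the test
sequence are made up for illustration.

```perl
use strict;
use warnings;

# Return [start, end, matched text] for every match of $pattern in
# $seq_string, using 1-based inclusive coordinates.
sub match_coords {
    my ($seq_string, $pattern) = @_;
    my @hits;
    while ($seq_string =~ /$pattern/gi) {
        # $-[0] is the 0-based offset of the match start; $+[0] is the
        # offset just past its end (the value pos() reports at this point).
        push @hits, [ $-[0] + 1, $+[0],
                      substr($seq_string, $-[0], $+[0] - $-[0]) ];
    }
    return @hits;
}

my @hits = match_coords('GGTATAAAGGCCTATATAAGG', 'TATA[AT]A');
printf "%d-%d %s\n", @$_ for @hits;
# 3-8 TATAAA
# 13-18 TATATA
```

For a Bio::Seq object the string argument would be $seq->seq. Note that
/g resumes after the end of the previous match, so overlapping
occurrences are not reported by this loop.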

Steve



From shameer at ncbs.res.in  Mon May 14 23:03:57 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 15 May 2007 08:33:57 +0530 (IST)
Subject: [Bioperl-l] How to produce Bio::Graphics images using PROSITE
	output ?
In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
Message-ID: <49697.192.168.1.1.1179198237.squirrel@mail.ncbs.res.in>

Dear All,

Thanks a lot for all your inputs [Help : Imagemaps using Bio::Graphics ].
I am still working on the other part of this project. Now, I am sure that
I can implement it using Bio::Graphics. I will come back to imagemaps
within a week or two.

Meanwhile, I need to parse PROSITE output and present it as a
Bio::Graphics image. Has anyone tried using Bio::Graphics to create images
from PROSITE output? I looked in the HOWTO but couldn't find anything
related to PROSITE.

My output looks like this :
    >Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
          75 - 78  NGSM
    >Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
         41 - 43  SpK
    >Sequence : PS00008 MYRISTYL N-myristoylation site.
           6 - 11  GTitNQ
    >Sequence : PS00009 AMIDATION Amidation site.
          78 - 81  mGKR

I need to implement an image like the one from the blast-parser example.
Thanks for any inputs/pointers.
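
A minimal sketch of the parsing half of this, assuming the report always
alternates a ">Sequence : ACCESSION ID description" header with one or more
"start - end  match" lines, as in the sample above. The hash keys and the
variable names are illustrative only. Each record could then be turned into
a Bio::SeqFeature::Generic (-start, -end, -display_name) and added to a
Bio::Graphics::Panel track, much as the blast example does for its hits.

```perl
use strict;
use warnings;

# Sample text copied from the message; a real script would read a file.
my $report = <<'END';
>Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
      75 - 78  NGSM
>Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
     41 - 43  SpK
>Sequence : PS00008 MYRISTYL N-myristoylation site.
       6 - 11  GTitNQ
>Sequence : PS00009 AMIDATION Amidation site.
      78 - 81  mGKR
END

my (@sites, $current);
for my $line (split /\n/, $report) {
    # Header line: sequence name, PROSITE accession, short id, description
    if ($line =~ /^\s*>\s*(\S+)\s*:\s*(PS\d+)\s+(\S+)\s+(.+?)\s*$/) {
        $current = { seq => $1, accession => $2, id => $3, description => $4 };
    }
    # Hit line: 1-based start and end, then the matched residues
    elsif ($current && $line =~ /^\s*(\d+)\s*-\s*(\d+)\s+(\S+)/) {
        push @sites, { %$current, start => $1, end => $2, match => $3 };
    }
}

printf "%-8s %-18s %3d - %-3d %s\n",
    @{$_}{qw(accession id start end match)} for @sites;
```

A header may be followed by several hit lines; each one becomes its own
record carrying a copy of the header fields.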

> The width of the image is determined by the -width attribute and is given
> in
> pixels. You cannot control the height of the image as it is computed
> dynamically based on the number of features and bumping options.
>
> Lincoln
>
> On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>>
>> Dear Scot,
>>
>> > There is a fair amount of documentation in the perldoc for
>> > Bio::Graphics::Panel under the section called 'Creating Imagemaps';
>> have
>> > you read that?
>>
>> I agreed, but I couldnt the exact information I needed :( (may be I
>> missed
>> something important).
>>
>> >  Also, for changing the scale, that should happen
>> > automatically--have you tried yet?
>>
>> I tried by changing the Lincoln's program eg: blast3.pl
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>> to my
>> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>>
>> But it had given me a smaller scale of length upto 300. I was looking
>> for
>> an option where I need same width and height of given image and a
>> dynamic
>> start and end values depending on length of my sequence. Since I couldnt
>> accomplish, I thought of getting some help from you guys. I think I need
>> to play a little bit with the value for reformat the scale to accomodate
>> my hits as well.
>>
>> Thanks a lot for your inputs,
>> --
>> Shameer Khadar


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From bix at sendu.me.uk  Tue May 15 04:23:52 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 15 May 2007 09:23:52 +0100
Subject: [Bioperl-l] New Blast parser
Message-ID: <46496E18.1000809@sendu.me.uk>

Back in August of last year I introduced Bio::PullParserI, a module that 
aids in the creation of fast SearchIO and Search modules. I've finally 
gotten around to implementing a Blast parser using the interface, which 
I've called Bio::SearchIO::blast_pull.

To use it you say:

my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");

or in the near future (when I've committed StandAloneBlast changes):

my $sab = Bio::Tools::Run::StandAloneBlast->new(-_READMETHOD => 
"blast_pull");


Currently the parser is incomplete: I've only tested it with NCBI BLASTN 
and BLASTP. However, results are promising. In one particular real-world 
usage-case involving running and parsing multiple Blast jobs via 
StandAloneBlast (amongst other things), changing only the _READMETHOD 
from 'blast' to 'blast_pull' in the code dropped run time from 20223s to 
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40% 
less).

Please try it out and report any bugs you discover.


Cheers,
Sendu.


From aaron.j.mackey at gsk.com  Tue May 15 10:30:13 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 15 May 2007 10:30:13 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <OFFA3F7652.5382601A-ON852572DC.004F5B2A-852572DC.004FAF72@gsk.com>

Or, use a zero-width, positive look ahead assertion, and don't incur the 
penalty of either $` or $&:

  if ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
  }

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/14/2007 09:46:55 PM:

> On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> > I do this in perl with the pos() function.  This requires the use of the
> > match operator (m) like
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >         $start = pos($gene) - length($pattern) + 1;
> > }
> >
> > pos() returns the location of the pointer where the regex left off after
> > finding a match.
> 
> Cool. I hadn't known that was possible.
> 
> > I remove the length of my pattern (which is just a
> > string with a few placeholder (.) wildcards, so I know how long the
> > match will always be).
> 
> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
> 
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
> 
> Steve
> 
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > > Jason Stajich
> > > Sent: Monday, May 14, 2007 12:06 PM
> > > To: Thiago Venancio
> > > Cc: bioperl-l list
> > > Subject: Re: [Bioperl-l] get regions
> > >
> > > I assume you are doing the matches on the string with =~ so
> > > Bio::Seq doesn't really help you here I don't think.
> > > See the $` variable in Perl for how to capture the position
> > > of where a regexp matches.
> > >
> > > -jason
> > > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote:
> > >
> > > > Hi all,
> > > >
> > > > Using Bio::Seq, is there any easy way to get the
> > > coordinates where a
> > > > regular expression matches or should I build a sliding window?
> > > >
> > > > For example, looking for a given promoter region in a FASTA
> > > file. If
> > > > the region is found, I would like to recover exactly the
> > > coordinates
> > > > where it matches.
> > > >
> > > > Thanks in advance.
> > > >
> > > > Thiago
> > > > --
> > > > "Doubt is not a pleasant condition, but certainty is absurd."
> > > >             Voltaire
> > > >
> > > > ========================
> > > > Thiago Motta Venancio, MSc
> > > > PhD student in Bioinformatics
> > > > University of Sao Paulo
> > > > ========================
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > jason at bioperl.org
> > > http://jason.open-bio.org/
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From diogoat at gmail.com  Tue May 15 18:44:59 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 19:44:59 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
Message-ID: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>

Dear All,

I need to download a lot of Leishmania major sequences in GenBank
format.
But I can't download them from the NCBI web page, because the files come
out corrupted when I use a browser to download these sequences.
So I looked for a script to download the files and found
something like this:


#########################################################
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>teste6.gb');
$out->write_seq($seqio);
#########################################################

And the system returns these errors:
[diogo1 at genome perl]$ perl teste6.pl

-------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.

Any idea?

Thanks

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From diogoat at gmail.com  Tue May 15 19:27:05 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 20:27:05 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
Message-ID: <638512560705151627t2e25f17cg7f820f3097a67748@mail.gmail.com>

Thanks for your help, Barry!

It works fine and I'm using the script as you suggested.
The error was in the print step, is that right?
I need to use a while loop to print all the sequences.

Thanks a lot

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

2007/5/15, Barry Moore <barry.moore at genetics.utah.edu>:
>
> Diogo-
>
> write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO
> object.  Try this
>
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                  (-query   => 'Leishmania major [Organism]',
>                                   -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                            -file => '>>teste6.gb');
> while (my $seq = $seqio->next_seq) {
>          $out->write_seq($seq);
> }
>
> Barry
>
> On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:
>
> > Dear All,
> >
> > I need to download a lot of sequence of Leishmania major in genbank
> > format...
> > But i can't download on the page of NCBI, because the downloaded
> > file are
> > corrupted... when i use a browser to download this sequences
> > And them i looking for some script to download that`s file and fink
> > something like that:
> >
> >
> > #########################################################
> > use strict;
> > use warnings;
> >
> > use Bio::Seq;
> > use Bio::SeqIO;
> > use Bio::DB::GenBank;
> >
> > my $query = Bio::DB::Query::GenBank->new
> >                                 (-query   =>'Leishmania major
> > [Organism]',
> >                                 -db      => 'nucleotide');
> > my $gb = new Bio::DB::GenBank;
> > my $seqio = $gb->get_Stream_by_query($query);
> >
> > my $out = Bio::SeqIO->new(-format => 'genbank',
> >                           -file => '>>teste6.gb');
> > $out->write_seq($seqio);
> > #########################################################
> >
> > And the system return me this erros
> > [diogo1 at genome perl]$ perl teste6.pl
> >
> > -------------------- WARNING ---------------------
> > MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant
> > module.
> > Attempting to dump, but may fail!
> > ---------------------------------------------------
> > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
> >
> > Any Ideia?
> >
> > Thank`s
> >
> > Diogo Tschoeke
> > Laboratory of Molecular Biology of Trypanosomatides
> > Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> > http://biowebdb.org
> >


From barry.moore at genetics.utah.edu  Tue May 15 19:17:39 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 15 May 2007 17:17:39 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>

Diogo-

write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO  
object.  Try this

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                 (-query   => 'Leishmania major [Organism]',
                                  -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                           -file => '>>teste6.gb');
while (my $seq = $seqio->next_seq) {
         $out->write_seq($seq);
}

Barry

On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:

> Dear All,
>
> I need to download a lot of sequence of Leishmania major in genbank
> format...
> But i can't download on the page of NCBI, because the downloaded  
> file are
> corrupted... when i use a browser to download this sequences
> And them i looking for some script to download that`s file and fink
> something like that:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major  
> [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system return me this erros
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant  
> module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any Ideia?
>
> Thank`s
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org


From cjfields at uiuc.edu  Tue May 15 22:44:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 May 2007 21:44:43 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>


On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
...

> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
>
> Steve

Right, but $& (as well as $` and $') inflict a significant penalty  
for their use, as Aaron alludes to.  Their use, even indirectly via a  
library module, can cause a significant performance hit.

chris


From sac at bioperl.org  Wed May 16 04:16:38 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 01:16:38 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
Message-ID: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>

On 5/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
> ...
>
> > To generalize your code so that it will work for any pattern, such as
> > one that can match strings of variable length like "A{5,10}", just
> > subtract the length of the actual string that was matched:
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >     $start = pos($gene) - length($&) + 1;
> >  }
> >
> > Steve
>
> Right, but $& (as well as $` and $') inflict a significant penalty
> for their use, as Aaron alludes to.  Their use, even indirectly via a
> library module, can cause a significant performance hit.
>
> chris

Yes. I had forgotten how poisonous $&, $` and $' were to regex
performance. Please forgive me. We might consider regularly auditing
the bioperl module tree for use of these in committed code.

But regarding the use of the look ahead assertion, there's a problem
if you want to find *all* occurrences of the pattern in a target
string and the pattern can have variable length hits: it may report
overlapping hits because it only collects the starting points of the
match, and does not determine how long each match would be. For
example:

$gene = 'TTTAAAAAAAAGG';
$pattern="A{5,10}";
while ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
    print ++$hit, " hit starts at $start\n";
}

Generates:
1 hit starts at 4
2 hit starts at 5
3 hit starts at 6
4 hit starts at 7

You could get around this by imposing a constraint to avoid trivial
overlaps. OK if you know the length of the pattern, but not so good
for more complex patterns. If there were a way to get the look ahead to
match the longest string possible for a variable length pattern, then
this approach could work, but I'm not sure if that is possible.
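
One possible way to do that, offered as a sketch rather than a tested
recipe: put capturing parentheses around the pattern inside the look ahead.
The capture then holds the text the pattern would consume at that position,
so its length is known and starts that fall inside the previous hit can be
skipped. The sub name lookahead_hits and the $next_free bookkeeping are
illustrative, and the added parentheses carry the usual capture cost.

```perl
use strict;
use warnings;

# Return non-overlapping [start, end] pairs (1-based, inclusive).
sub lookahead_hits {
    my ($seq, $pattern) = @_;
    my @hits;
    my $next_free = 0;    # 0-based offset where the next hit may begin
    while ($seq =~ m/(?=($pattern))/gi) {
        my $start = $-[0];              # look ahead is zero-width: hit start
        next if $start < $next_free;    # still inside the previous hit
        my $len = length $1;            # what the pattern would consume here
        next if $len == 0;              # ignore empty matches
        push @hits, [ $start + 1, $start + $len ];
        $next_free = $start + $len;
    }
    return @hits;
}

my @hits = lookahead_hits('TTTAAAAAAAAGGGGAAAAAAGGGGG', 'A{5,10}');
printf "%d hit at: %2d - %d\n", $_ + 1, @{ $hits[$_] } for 0 .. $#hits;
```

If $pattern has parentheses of its own, they become groups 2 and up; $1
is still the whole candidate match because its parenthesis opens first.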

Here's a solution I think does the job of reporting the extent of each
match without a performance hit and works for patterns of any
complexity, taking advantage of the special arrays containing hit
indexes, @- and @+:

$gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
while ($gene =~ m/$pattern/gi){
    $hit++;
    printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
}

Generates:
1 hit at:  4 - 11
2 hit at: 16 - 21

You can also use this approach to report the locations of any internal
back references, if the pattern contains any parentheses, via $-[1],
$+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
patterns, but patterns not containing parens won't be penalized.
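
As an illustration of the group offsets, here is a small sketch. The
two-group pattern and the @report structure are made up for the example;
only @-, @+ and $#+ come from Perl itself.

```perl
use strict;
use warnings;

my $gene    = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
my $pattern = '(A{5,10})(G+)';    # hypothetical two-group pattern

my @report;    # one entry per hit: list of [label, start, end], 1-based
while ($gene =~ m/$pattern/gi) {
    my @spans = ( [ 'whole', $-[0] + 1, $+[0] ] );
    for my $n (1 .. $#+) {             # $#+ is the number of groups
        next unless defined $-[$n];    # group took no part in this match
        push @spans, [ "group $n", $-[$n] + 1, $+[$n] ];
    }
    push @report, \@spans;
}

for my $i (0 .. $#report) {
    printf "hit %d %-7s %2d - %d\n", $i + 1, @$_ for @{ $report[$i] };
}
```

The defined() check matters for patterns with alternation or optional
groups, where a group can be left unset in a successful match.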

Steve


From georg.otto at tuebingen.mpg.de  Wed May 16 05:19:06 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 16 May 2007 11:19:06 +0200
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <m17ir9m8hh.fsf@tuebingen.mpg.de>


Dear all,

I have a problem that also has to do with downloading data from GenBank,
so I am putting it in this thread.

I am trying to get all entries for the organism Danio rerio using something
like this:


use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Danio rerio[ORGN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
					       -query => $query);
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);


while (my $seq_obj = $stream_obj->next_seq) {
  my $out = Bio::SeqIO->new(-format => 'fasta',
			    -file => '>>output.fas');
  $out->write_seq($seq_obj);
}


However, the download process aborts after a few thousand entries. I
do not think that this is due to the request itself or problems with
specific entries, since the number of transferred sequences varies
before the stop. It might rather have to do with GenBank terminating
the connection.

Does anybody have a suggestion for a better strategy to achieve what I want
(e.g. a different kind of query, a method to resume the download at
the point where it terminated, etc.)?

Best,

Georg


"Diogo Tschoeke" <diogoat at gmail.com> writes:

> Dear All,
>
> I need to download a lot of sequence of Leishmania major in genbank
> format...
> But i can't download on the page of NCBI, because the downloaded file are
> corrupted... when i use a browser to download this sequences
> And them i looking for some script to download that`s file and fink
> something like that:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system return me this erros
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any Ideia?
>
> Thank`s
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org


From cjfields at uiuc.edu  Wed May 16 09:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 08:05:59 -0500
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <m17ir9m8hh.fsf@tuebingen.mpg.de>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
Message-ID: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>

It's likely from a timeout issue on the remote server.  One thing  
which will speed things up is to retrieve the remote sequences in  
fasta format to begin with (described in the Bio::DB::GenBank POD):

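# $query_obj is the query object from the script quoted below
my $gb_obj = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                                   -format        => 'fasta');
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# open the output stream once, outside the loop
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file   => '>>output.fas');
while (my $seq_obj = $stream_obj->next_seq) {
   $out->write_seq($seq_obj);
}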

I also suggest using the direct ftp downloads if at all possible  
(i.e. you are downloading WGS or contig sequences).  It's much faster.

chris

On May 16, 2007, at 4:19 AM, Georg Otto wrote:

>
> Dear all,
>
> I have a problem that has to do with downloading data from GenBank as
> well, therefor I put it in this thread.
>
> I try to get all entries from organism Danio rerio using the something
> like this:
>
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query = "Danio rerio[ORGN]";
> my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
> 					       -query => $query);
> my $gb_obj = Bio::DB::GenBank->new;
> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
>
> while (my $seq_obj = $stream_obj->next_seq) {
>   my $out = Bio::SeqIO->new(-format => 'fasta',
> 			    -file => '>>output.fas');
>   $out->write_seq($seq_obj);
> }
>
>
> However, the download process aborts after a few thousand entries. I
> do not think that this is due to the request itself or problems with
> specific entries, since the number of transferred sequences varies
> before the stop. It might rather have to do with GenBank terminating
> the connection.
>
> Has anybody a suggestion of a better strategy to achieve what I want
> (e.g. a different kind of query, a method to reassume the download at
> the point where it terminated etc.)?
>
> Best,
>
> Georg
>
>
> "Diogo Tschoeke" <diogoat at gmail.com> writes:
>
>> Dear All,
>>
>> I need to download a lot of sequence of Leishmania major in genbank
>> format...
>> But i can't download on the page of NCBI, because the downloaded  
>> file are
>> corrupted... when i use a browser to download this sequences
>> And them i looking for some script to download that`s file and fink
>> something like that:
>>
>>
>> #########################################################
>> use strict;
>> use warnings;
>>
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> my $query = Bio::DB::Query::GenBank->new
>>                                 (-query   =>'Leishmania major  
>> [Organism]',
>>                                 -db      => 'nucleotide');
>> my $gb = new Bio::DB::GenBank;
>> my $seqio = $gb->get_Stream_by_query($query);
>>
>> my $out = Bio::SeqIO->new(-format => 'genbank',
>>                           -file => '>>teste6.gb');
>> $out->write_seq($seqio);
>> #########################################################
>>
>> And the system return me this erros
>> [diogo1 at genome perl]$ perl teste6.pl
>>
>> -------------------- WARNING ---------------------
>> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant  
>> module.
>> Attempting to dump, but may fail!
>> ---------------------------------------------------
>> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
>> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>>
>> Any Ideia?
>>
>> Thank`s
>>
>> Diogo Tschoeke
>> Laboratory of Molecular Biology of Trypanosomatides
>> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
>> http://biowebdb.org

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ferraria at gmail.com  Wed May 16 10:38:47 2007
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 16 May 2007 16:38:47 +0200
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
Message-ID: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>

Hi all,

I want to do something relatively simple, and I would like to know how far
BioPerl tools can help me, because I am having trouble getting there.
Here is the pipeline:

"EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
"GeneStructure"

(*) :
Starting from the EntrezGene ID, I want to retrieve the structure of the
gene, which means having the whole genomic sequence and the start and end
positions of each exon, intron, UTR, and so on.

I thought of 2 ways to accomplish that:

  -  use 'efetch' to get raw XML or ASN.1 and then parse it to obtain the
desired positions.
     This method should work but would take some time to get right.

  -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I
obtain a Bio::Seq object but I am not able to find any features stored in
it. So it doesn't seem that the get_Seq_by_id function gets all the
information contained in an EntrezGene entry (?).

Can somebody help me make the right choice or show me the right way?

I also saw that some packages intended to deal with gene structure exist,
but I can't work out how to use them properly, or even how to create one
of those objects!
Are those packages currently usable?


Thanks in advance.
Best regards,
tony


From cjfields at uiuc.edu  Wed May 16 12:02:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 11:02:28 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
	<8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
Message-ID: <9C6F4829-4E06-4751-8B10-B2726B5288B9@uiuc.edu>


On May 16, 2007, at 3:16 AM, Steve Chervitz wrote:
...

>>
>> Right, but $& (as well as $` and $') inflict a significant penalty
>> for their use, as Aaron alludes to.  Their use, even indirectly via a
>> library module, can cause a significant performance hit.
>>
>> chris
>
> Yes. I had forgotten how poisonous $&, $` and $' were to regex
> performance. Please forgive me. We might consider regularly auditing
> the bioperl module tree for use of these in committed code.

Already done!  We have run a few audits for gotchas like that:

http://www.bioperl.org/wiki/Auditing

http://www.bioperl.org/wiki/Bioperl_Best_Practices

If there is anything we should be looking for please feel free to add  
as needed.  There shouldn't be any use of the 'naughty' variables in  
CVS, but it might be worth a second look...

> But regarding the use of the look ahead assertion, there's a problem
> if you want to find *all* occurrences of the pattern in a target
> string and the pattern can have variable length hits: it may report
> overlapping hits because it only collects the starting points of the
> match, and does not determine how long each match would be. For
> example:
>
> $gene = 'TTTAAAAAAAAGG';
> $pattern="A{5,10}";
> while ($gene =~ m/(?=$pattern)/gi) {
>     $start = pos($gene) + 1;
>     print ++$hit, " hit starts at $start\n";
> }
>
> Generates:
> 1 hit starts at 4
> 2 hit starts at 5
> 3 hit starts at 6
> 4 hit starts at 7
>
> You could get around this by imposing a constraint to avoid trivial
> overlaps. OK if you know the length of the pattern, but not so good
> for more complex patterns. If there were a way to get the look ahead to
> match the longest string possible for a variable length pattern, then
> this approach could work, but I'm not sure if that is possible.
>
> Here's a solution I think does the job of reporting the extent of each
> match without a performance hit and works for patterns of any
> complexity, taking advantage of the special arrays containing hit
> indexes, @- and @+:
>
> $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
> while ($gene =~ m/$pattern/gi){
>     $hit++;
>     printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
> }
>
> Generates:
> 1 hit at:  4 - 11
> 2 hit at: 16 - 21
>
> You can also use this approach to report the locations of any internal
> back references, if the pattern contains any parentheses, via $-[1],
> $+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
> patterns, but patterns not containing parens won't be penalized.
>
> Steve

Friedl's Regex book has outlined a few ways to get around the  
'naughty' variables $`, $&, and $' using substr() and $-[0], $+[0],  
or both, which makes sense since @+ and @- are arrays of positions  
instead of actual text.

$`  substr(target, 0, $-[0])
$&  substr(target, $-[0], $+[0] - $-[0])
$'  substr(target, $+[0])
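
A short sketch of those three substitutions in use. The target string and
the pattern are arbitrary examples; $pre, $match and $post are just local
names standing in for the three special variables.

```perl
use strict;
use warnings;

my $target = 'TTTAAAAAAAAGGGG';
my ($pre, $match, $post);

if ($target =~ m/A{5,10}/) {
    $pre   = substr($target, 0, $-[0]);                # stands in for $`
    $match = substr($target, $-[0], $+[0] - $-[0]);    # stands in for $&
    $post  = substr($target, $+[0]);                   # stands in for $'
    print "prematch=$pre match=$match postmatch=$post\n";
}
```

The offsets in @- and @+ are only valid until the next successful match,
so the substr() calls belong right after the match they refer to.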

Wonderful book!

chris


From benoit at ebi.ac.uk  Wed May 16 12:35:39 2007
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 16 May 2007 17:35:39 +0100
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
Message-ID: <464B32DB.6080607@ebi.ac.uk>

Hi Tony,

I don't know how simple it is in bioperl, but it is quite simple using 
the ensembl perl API.

Have a look here :

API installation:
http://www.ensembl.org/info/software/api_installation.html
API tutorial :
http://www.ensembl.org/info/software/core/core_tutorial.html
API Perl module Documentation :
http://www.ensembl.org/info/software/Pdoc/ensembl/index.html

so you can do something similar to the example below :

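use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl server (see the API tutorial above);
# adjust host and user if you run a local mirror
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );

# Get the 'COG6' gene from human
my $gene = $gene_adaptor->fetch_by_display_label('COG6');

# gene coordinates
print "GENE ", $gene->stable_id(), " ", $gene->start(), "-", $gene->end(), "\n";

foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
    # transcript coordinates
    print "TRANSCRIPT ", $transcript->stable_id(), " ",
        $transcript->start(), "-", $transcript->end(), "\n";

    foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
        # exon coordinates
        print "EXON ", $exon->stable_id(), " ",
            $exon->start(), "-", $exon->end(), "\n";
    }
}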

Hope this helps

Benoit


Anthony Ferrari wrote:
 > Hi all,
 >
 > I want to do something relatively simple, and I want to know how far
 > BioPerl tools can help me, because I'm having trouble getting there.
 > Here is the pipeline:
 >
 > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
 > "GeneStructure"
 >
 > (*) :
 > From the EntrezGene ID, I want to retrieve the structure of the gene,
 > which means having the whole genomic sequence and the start and end
 > positions of each exon, intron, UTR...
 >
 > I thought of 2 ways to accomplish that:
 >
 >   -  use 'efetch', get raw XML or ASN.1 and then parse it to obtain the
 >      desired positions.
 >      This method should work but would take some time to get right.
 >
 >   -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id" function.
 >      I obtain a Bio::Seq object but I am not able to find any features
 >      stored in it. So it doesn't seem that get_Seq_by_id retrieves all the
 >      information contained in an EntrezGene entry (?).
 >
 > Can somebody help me make the right choice or show me the right way?
 >
 > I also saw that some packages intended to deal with gene structure exist,
 > but I can't work out how to use them properly, or even how to create one
 > of those objects!
 > Are those packages currently usable?
 >
 >
 > Thanks in advance.
 > Best regards,
 > tony
 > _______________________________________________
 > Bioperl-l mailing list
 > Bioperl-l at lists.open-bio.org
 > http://lists.open-bio.org/mailman/listinfo/bioperl-l


From johnsonm at gmail.com  Wed May 16 15:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 16 May 2007 14:11:18 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
Message-ID: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>

On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I believe all seqfeature location coordinates are designed to have
> start < stop for consistency; in cases where the strand matters (CDS,
> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> the two are reversed and the strand is flipped; at least that's the
> way locations are set up in BioPerl.
>
> chris

    Oh yeah?  I always tend to ensure that (start < stop), regardless
of strand, when working with sequence features...the other day, I
caught Glimmer2 emitting a prediction on the plus strand with start >
stop.  I was going to work up a patch for the parser, but I wonder,
should I just force everything to start < stop?  Or only predictions
on the plus strand?  Should all the parsers for all the ab initio
predictors ensure they emit features with coordinates like this?
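
For reference, the convention Chris describes (start <= stop, with the orientation carried by the strand) can be sketched in a few lines of plain Perl. The helper name normalize_coords is hypothetical and this is not BioPerl's own code; whether a parser should apply it to every prediction is exactly the open question above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return (start, end, strand)
# with start <= end, flipping the strand if the coordinates were reversed.
sub normalize_coords {
    my ($start, $end, $strand) = @_;
    $strand = 1 unless defined $strand;    # assume plus strand if not given
    if ($start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -$strand;
    }
    return ($start, $end, $strand);
}

# A plus-strand prediction reported with start > stop
my ($s, $e, $str) = normalize_coords(900, 100, 1);
print "start=$s end=$e strand=$str\n";
```

Features that already have start <= stop pass through unchanged, so the helper can be applied to every feature a parser emits.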


From diogoat at gmail.com  Wed May 16 16:02:44 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Wed, 16 May 2007 17:02:44 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
Message-ID: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>

Dear all,

The script which I wrote with your help is working very well (I have pasted
it at the end of this e-mail).
But I have another problem now: every time I run the script, the output
file has a different size.
Any idea what the problem is? My connection? A problem at NCBI? The script,
maybe?

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

#############################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Trypanosoma cruzi [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);
my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>Trypanosoma_cruzi1.gb');
while (my $seq = $seqio->next_seq){
         $out->write_seq($seq);
                        }
#########################################################


From barry.moore at genetics.utah.edu  Wed May 16 17:13:27 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 16 May 2007 15:13:27 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
Message-ID: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>

Diogo,

I'd guess that this is a result of NCBI terminating the connection, as
Chris suggested previously.  There are a number of approaches you
could use.  Download only fasta if that's all you need.  Download
only the IDs, and then use SeqHound, Batch Entrez or BioPerl to download
those sequences.  Or you could download the GenBank files from the ftp
site, as Chris also suggested, and then run a BioPerl script on each
of those files.  I can see that you are looking at Trypanosomes, so
doing this (on Linux or Mac OS X):

wget ftp://ftp.ncbi.nih.gov/genbank/gbinv*.seq.gz

will get you the 10 files in the invertebrate division from GenBank,  
and you could run a bioperl script  on those 10 files.

Barry

On May 16, 2007, at 2:02 PM, Diogo Tschoeke wrote:

> Dear all,
>
> The script wich i wrote with your helps is working very good ( I  
> paste the
> script in the end of e-mail).
> But I have another problem now, all the times wich I use the script  
> im every
> all the file have a diferent size...
> Any ideia? what is the problem..? My conection? Problem on Ncbi?  
> The script
> maybe?
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org
>
> #############################################################
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Trypanosoma cruzi  
> [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>Trypanosoma_cruzi1.gb');
> while (my $seq = $seqio->next_seq){
>          $out->write_seq($seq);
>                         }
> #########################################################
>


From sac at bioperl.org  Wed May 16 18:29:16 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 15:29:16 -0700
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <464B32DB.6080607@ebi.ac.uk>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
	<464B32DB.6080607@ebi.ac.uk>
Message-ID: <8f200b4c0705161529h26e7c44fk54082a1156201861@mail.gmail.com>

Another option is to use DAS ( http://biodas.org ), which was designed
precisely to solve this sort of problem.

A DAS genome query is a URL that specifies the genome assembly version
on which the returned coordinates should be based. For example, get
all features and their coordinates associated with the human actin
gene on hg17:

http://das.biopackages.net/das/genome/human/17/feature?name=ACTA1
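
A query URL of this shape is easy to build in a script. The sketch below is only an illustration: das_feature_url is a made-up helper name, the base URL is the one quoted above, and the fetch step is off by default because the server may not be reachable from where you run it.

```perl
use strict;
use warnings;

# Build a DAS feature query URL of the form shown above.
# das_feature_url is a made-up helper name for this sketch.
sub das_feature_url {
    my ($base, $name) = @_;
    # percent-encode anything outside the unreserved URL characters
    (my $escaped = $name) =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return "$base/feature?name=$escaped";
}

my $url = das_feature_url('http://das.biopackages.net/das/genome/human/17',
                          'ACTA1');
print "$url\n";

# Fetch only when asked to on the command line.
if (@ARGV && $ARGV[0] eq '--fetch') {
    require HTTP::Tiny;                      # core module since Perl 5.14
    my $res = HTTP::Tiny->new->get($url);
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    print $res->{content};                   # XML response, format depends on server
}
```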

Ensembl, UCSC, and  other sites also provide DAS servers for genomic
features, but these serve up a different XML response format (DAS/1.x)
from what biopackages.net is serving (DAS/2). Here are some links to
these servers, both DAS/1 and DAS/2:

http://www.biodas.org/wiki/DAS/1#Servers
http://www.biodas.org/wiki/DAS/2#Servers

By default, a DAS/2 server will return data in DAS2XML format, but you
can specify alternative formats if a server supports them. This is one
advantage of the DAS/2 retrieval spec, which is stable and is
described here:

http://biodas.org/documents/das2/das2_get.html

You may not be able to use an Entrez gene ID directly in the query.
It depends on whether these IDs are available on the given server.
Accessions and gene names should be OK. You can always map your Entrez
ids to accessions or gene names using this file
ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz .
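
The mapping step could look roughly like the sketch below. It assumes that gene2refseq is tab-delimited with the GeneID in the second column and the RNA accession.version in the fourth, and that '-' marks a missing value; please check the header line of the file you download before relying on these positions. The subroutine name accessions_for_genes is made up.

```perl
use strict;
use warnings;

# Collect RNA accessions for the wanted Entrez Gene IDs from an open
# gene2refseq filehandle. Column positions are an assumption (see above).
sub accessions_for_genes {
    my ($fh, @gene_ids) = @_;
    my %wanted = map { $_ => 1 } @gene_ids;
    my %acc;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;               # skip the header line
        chomp $line;
        my @col = split /\t/, $line;
        my ($gene_id, $rna_acc) = @col[1, 3];
        next unless defined $rna_acc && $wanted{$gene_id};
        next if $rna_acc eq '-';             # '-' marks a missing value
        $acc{$gene_id}{$rna_acc} = 1;        # hash keys remove duplicates
    }
    my %result;
    $result{$_} = [ sort keys %{ $acc{$_} } ] for keys %acc;
    return %result;
}

# Usage: gunzip -c gene2refseq.gz | perl map_ids.pl GENEID [GENEID ...]
if (@ARGV) {
    my %map = accessions_for_genes(\*STDIN, @ARGV);
    print "$_\t@{ $map{$_} }\n" for sort { $a <=> $b } keys %map;
}
```

Reading from STDIN keeps the script independent of how the file is decompressed.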

Steve

On 5/16/07, Benoit Ballester <benoit at ebi.ac.uk> wrote:
> Hi Tony,
>
> I don't know how simple it is in bioperl, but it is quite simple using
> the ensembl perl API.
>
> Have a look here :
>
> API instalation:
> http://www.ensembl.org/info/software/api_installation.html
> API tutorial :
> http://www.ensembl.org/info/software/core/core_tutorial.html
> API Perl module Documentation :
> http://www.ensembl.org/info/software/Pdoc/ensembl/index.html
>
> so you can do something similar to the example below :
>
> # Get the 'COG6' gene from human
>
> my $gene = $gene_adaptor->fetch_by_display_label('COG6');
>
> print "GENE ", $gene->stable_id(), "\n";
> # here you get gene coordinate
>
> foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
>      print "TRANSCRIPT ", $transcript->stable_id(), "\n";;
>      #print transcript coordinates
>
>         foreach my $exon ( @{ $transcript->get_all_exons() } ) {
>         #print the exon coordinates
>
>         }
>      }
> }
>
> Hope this helps
>
> Benoit
>
>
> Anthony Ferrari wrote:
>  > Hi all,
>  >
>  > I want to do something relatively simple and I want to know how far
> Bioperl
>  > tools could help me because I'm having troubles to get to the point.
>  > Here is the pipeline :
>  >
>  > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
>  > "GeneStructure"
>  >
>  > (*) :
>  >>From the EntrezGene ID, I want to retrieve the structure of the gene
> which
>  > means having the whole genomic sequence and having the start and end
>  > positions of each exons, introns, UTR'....
>  >
>  > I thought of 2 ways to accomplish that :
>  >
>  >   -  use 'efetch', get raw xml or asn1 and then parse it to obtain the
>  > desired positions.
>  >      this method should work but would take a little time to be ok.
>  >
>  >   -  use Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I
>  > obtain a Bio::Seq object but I am not able to find any features stored in
>  > it. So it doesn't seem that the get_Seq_by_id function get all
> information
>  > contained in a EntrezGene entry (?) .
>  >
>  > Can somebody help me to make the right choice or show me the right way?
>  >
>  > I also saw that some packages detinated to deal with  gene structure
> exist
>  > but I don't manage to know how to use it properly and even how to
> create one
>  > of those objects !
>  > Are those packages currently usable ?
>  >
>  >
>  > Thanks in advance.
>  > Best regards,
>  > tony


From heikki at sanbi.ac.za  Thu May 17 02:46:44 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 17 May 2007 08:46:44 +0200
Subject: [Bioperl-l] Writing OBO fiies
Message-ID: <200705170846.44641.heikki@sanbi.ac.za>


I've started putting together Bio::OntologyIO::obo::write_ontology().
The current parser ignores a number of fields in common obo files.
If anyone knows any issues regarding adding more information into obo ontology 
object, shout now.

I need to start parsing at least "xref_analog" and "subset" to get a 
reasonable roundtrip of obo files representing cell ontology and sequence 
ontology.

I am not aiming at extending the existing ontology interfaces but simply 
patching obo parsing, but I am open to suggestions.
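
To make the round-trip target concrete, here is a sketch of what writing a single [Term] stanza with the "subset" and "xref_analog" tags might look like. The subroutine format_obo_term, the hash layout and the identifiers are all hypothetical; this is not the Bio::OntologyIO::obo interface, only an illustration of the tag-value lines involved.

```perl
use strict;
use warnings;

# Format one [Term] stanza from a plain hash. format_obo_term and the hash
# layout are hypothetical; they are not the Bio::OntologyIO::obo interface.
sub format_obo_term {
    my ($term) = @_;
    my @lines = ('[Term]', "id: $term->{id}", "name: $term->{name}");
    # multi-valued tags: one "tag: value" line per value
    for my $tag (qw(subset xref_analog)) {
        push @lines, map "$tag: $_", @{ $term->{$tag} || [] };
    }
    return join("\n", @lines) . "\n";
}

print format_obo_term({
    id          => 'EX:0000001',          # made-up identifier
    name        => 'example term',
    subset      => ['example_subset'],
    xref_analog => ['EXDB:12345'],
});
```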

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bernd.web at gmail.com  Thu May 17 06:48:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 17 May 2007 12:48:07 +0200
Subject: [Bioperl-l] (Simple)Align
Message-ID: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>

Hi,

I am playing with alignment and would like to insert strings at
certain columns (so in all sequences in the alignment). I know about
the slice and remove_columns.
Is there already an insert_columns type of functionality?
Otherwise I'll just iterate over the sequences similar to
remove_columns (and give it a try to implement add_columns like
remove_columns).


Regards
Bernd


From Kevin.M.Brown at asu.edu  Thu May 17 11:17:04 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 17 May 2007 08:17:04 -0700
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B403284273@EX02.asurite.ad.asu.edu>

> I am playing with alignment and would like to insert strings 
> at certain columns (so in all sequences in the alignment). I 
> know about the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to 
> remove_columns (and give it a try to implement add_columns 
> like remove_columns).

Try the Deobfuscator to see all the methods available to a
Bio::SimpleAlign object.
http://bioperl.org/cgi-bin/deob_interface.cgi
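
The "iterate over the sequences" approach Bernd mentions can be sketched on plain strings as follows. This is only an illustration: insert_columns is a made-up name, and the alignment here is a simple hash of id => gapped string rather than a Bio::SimpleAlign object.

```perl
use strict;
use warnings;

# Insert $insert before 1-based column $col in every aligned sequence string.
# Works on a plain hash of id => string, not on a Bio::SimpleAlign object.
sub insert_columns {
    my ($aln, $col, $insert) = @_;
    my %out;
    for my $id (keys %$aln) {
        my $s = $aln->{$id};
        die "column $col is outside sequence $id\n"
            if $col < 1 || $col > length($s) + 1;
        substr($s, $col - 1, 0, $insert);    # 4-argument substr splices in place
        $out{$id} = $s;
    }
    return \%out;
}

my $aln = { seq1 => 'ACGT-A', seq2 => 'AC-TTA' };
my $new = insert_columns($aln, 3, '--');
print "$_ $new->{$_}\n" for sort keys %$new;
```

With real alignment objects the same splice would be applied to each sequence's string, and the stored end coordinates would probably need updating when the inserted text contains residues rather than gaps.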


From diogoat at gmail.com  Thu May 17 14:14:14 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Thu, 17 May 2007 15:14:14 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
Message-ID: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>

Hi Barry, thanks for all your help.

I chose to download the invertebrate division of GenBank to my machine,
but I don't have a script to get the sequences from the local file and
I don't know how to write one.
I tried changing, in the script,
-db => 'nucleotide' to -db => 'local-gbdi.gb',
as shown below:

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major',
                                -db     => '>local-gbdi.gb');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

but it didn't work, because Bio::DB::Query::GenBank is a Perl module which
connects to NCBI to run my query, and my database is now local.

I need the genomes of Trypanosoma cruzi, Trypanosoma brucei, Leishmania
major, Entamoeba and Plasmodium falciparum in GenBank format.
Any suggestions? Does somebody have such a script?
Help!
And thanks for the help!

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From barry.moore at genetics.utah.edu  Thu May 17 14:19:46 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 17 May 2007 12:19:46 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
	<638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
Message-ID: <F5104D8D-030D-4F01-884C-623B5F2D63CC@genetics.utah.edu>

Diogo-

Look at the bioperl documentation - there you will find a HowTo on  
SeqIO.  This will help you learn how to write scripts to load genbank  
flat files and you can then iterate over those files and check the  
organism to see if it's one that you want.  You should be able to  
find everything that you need in the documentation.
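
A minimal sketch of that approach is below. It assumes the division files have already been decompressed, and that matching on the start of the organism's binomial name is good enough; the subroutine names and the output file name are made up. Bio::SeqIO, next_seq, write_seq, species and binomial are the standard BioPerl calls.

```perl
use strict;
use warnings;

# True if the organism name starts with one of the wanted names.
sub is_wanted {
    my ($organism, @wanted) = @_;
    return scalar grep { index(lc $organism, lc $_) == 0 } @wanted;
}

# Copy records for the wanted organisms from local GenBank flat files.
sub extract_records {
    my ($out_file, $wanted, @files) = @_;
    require Bio::SeqIO;    # loaded here so is_wanted can be tried without BioPerl
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');
    for my $file (@files) {
        my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
        while (my $seq = $in->next_seq) {
            my $species = $seq->species or next;   # skip records without one
            $out->write_seq($seq)
                if is_wanted($species->binomial, @$wanted);
        }
    }
}

# Usage: perl extract.pl selected.gb gbinv1.seq gbinv2.seq ...
if (@ARGV >= 2) {
    my ($out_file, @files) = @ARGV;
    my @wanted = ('Trypanosoma cruzi', 'Trypanosoma brucei',
                  'Leishmania major', 'Entamoeba', 'Plasmodium falciparum');
    extract_records($out_file, \@wanted, @files);
}
```

Because everything is read from local files, rerunning the script does not depend on the NCBI connection staying open.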

B

On May 17, 2007, at 12:14 PM, Diogo Tschoeke wrote:

> Hi Barry thank's for all your help,
>
> I choose download the Invertebrates division of NCBI to machine...
> but the I don't have thus script to get the sequences of the local  
> file and I know how to write...
> i tried choose change in the script
> the -db => 'nucleotide' for -db => 'local-gbdi.gb'
> like I wrote below
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major',
>                                 -db     => '>local-gbdi.gb );
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> but didn't work because de Bio:DB::Query::GenBank is a perl module  
> wich conect at Ncbi to do my query and my Database is now local.
>
>  I need the genomes of Trypanosoma cruzi, Trypanosoma brucei,  
> Leishmania major, Entamoeba and Plasmodium falciparum in the  
> genbank format file.
> Any Sugestion? Somebody have this script?
> Help!
> And thank's for the help!
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org


From torsten.seemann at infotech.monash.edu.au  Fri May 18 04:13:38 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 18 May 2007 18:13:38 +1000
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <46496E18.1000809@sendu.me.uk>
References: <46496E18.1000809@sendu.me.uk>
Message-ID: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>

Sendu,

> Back in August of last year I introduced Bio::PullParserI, a module that
> aids in the creation of fast SearchIO and Search modules. I've finally
> gotten around to implementing a Blast parser using the interface, which
> I've called Bio::SearchIO::blast_pull.
> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");
> Please try it out and feed-back any bugs you discover.

This is very cool!
Here's hoping NCBI don't change the default output format too much.

You should be able to add "rpsblast -p T" support as this is identical
to "blastall -p blastp" except for first line:
BLASTP 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]

The only problem is the (rarely used) "rpsblast -p F" mode which
looks/behaves like a "blastall -p tblastn", ie. has hit summaries with
"Frame"

 Score = 29.6 bits (65), Expect = 0.26
 Identities = 10/26 (38%), Positives = 12/26 (46%)
 Frame = -1

BUT has the same header line, so you can't know -p F was used until
you see a "Frame = ??" in a hit (what were they thinking???).

TBLASTN 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]    # should be RPS-TBLASTN perhaps...
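
To illustrate the problem, here is a sketch of the kind of check a parser would need. It is not code from Bio::SearchIO::blast_pull; classify_report is a made-up name, and the only inputs it relies on are the header and "Frame =" lines quoted above.

```perl
use strict;
use warnings;

# Guess the report flavour from the header line plus, for RPS-BLAST, the
# presence of a "Frame =" line in any hit. classify_report is a made-up name.
sub classify_report {
    my ($text) = @_;
    my ($program) = $text =~ /^\s*(\S*BLAST\S*)\s+\d/m
        or return 'unknown';
    return $program unless $program eq 'RPS-BLAST';
    # same header for both modes, so only a Frame line reveals -p F
    return $text =~ /^\s*Frame\s*=/m ? 'RPS-BLAST (translated, -p F)'
                                     : 'RPS-BLAST (protein, -p T)';
}

my $report = <<'END';
RPS-BLAST 2.2.16 [Mar-25-2007]

 Score = 29.6 bits (65), Expect = 0.26
 Identities = 10/26 (38%), Positives = 12/26 (46%)
 Frame = -1
END
print classify_report($report), "\n";
```

The drawback is the one described above: the header alone is not enough, so the decision has to wait until at least the first hit has been read.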

Thanks for the good work. Shame I converted most of our systems to blastxml :-(

--Torsten


From cjfields at uiuc.edu  Fri May 18 09:39:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 08:39:05 -0500
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
References: <46496E18.1000809@sendu.me.uk>
	<a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
Message-ID: <2219EED8-F721-4586-B029-EF6CD9C32246@uiuc.edu>

I'll be looking at cleaning up SearchIO::blastxml soon myself.  It  
needs to be more memory-friendly with large XML files and PSI-BLAST  
iterations need to be addressed (nope, I haven't forgot about that!).

There is a XML::LibXML pull parser interface (XML::LibXML::Reader) we  
could look into...

chris

On May 18, 2007, at 3:13 AM, Torsten Seemann wrote:

> Sendu,
>
>> Back in August of last year I introduced Bio::PullParserI, a  
>> module that
>> aids in the creation of fast SearchIO and Search modules. I've  
>> finally
>> gotten around to implementing a Blast parser using the interface,  
>> which
>> I've called Bio::SearchIO::blast_pull.
>> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file =>  
>> "file");
>> Please try it out and feed-back any bugs you discover.
>
> This is very cool!
> Here's hoping NCBI don't change the default output format too much.
>
> You should be able to add "rpsblast -p T" support as this is identical
> to "blastall -p blastp" except for first line:
> BLASTP 2.2.16 [Mar-25-2007]
> RPS-BLAST 2.2.16 [Mar-25-2007]
>
> The only problem is the (rarely used) "rpsblast -p F" mode which
> looks/behaves like a "blastall -p tblastn", ie. has hit summaries with
> "Frame"
>
>  Score = 29.6 bits (65), Expect = 0.26
>  Identities = 10/26 (38%), Positives = 12/26 (46%)
>  Frame = -1
>
> BUT has the same header line, so you can't know -p F was used until
> you see a "Frame = ??" in a hit (what were they thinking???).
>
> TBLASTN 2.2.16 [Mar-25-2007]
> RPS-BLAST 2.2.16 [Mar-25-2007]    # should be RPS-TBLASTN perhaps...
>
> Thanks for the good work. Shame I converted most of our systems to  
> blastxml :-(
>
> --Torsten

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri May 18 10:00:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 09:00:38 -0500
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <239FDEF1-38D4-47B8-AC71-514B61BDF9E0@uiuc.edu>

Sounds great to me!  Sohel Merchant might have some ideas...

chris

On May 17, 2007, at 1:46 AM, Heikki Lehvaslaiho wrote:

>
> I've started putting together Bio::OntologyIO::obo::write_ontology().
> The current parser ignores a number of fields in common obo files.
> If anyone knows any issues regarding adding more information into  
> obo ontology
> object, shout now.
>
> I need to start parsing at least "xref_analog" and "subset" to get a
> reasonable roundtrip of obo files representing cell ontology and  
> sequence
> ontology.
>
> I am not aiming at extending the existing ontology interfaces but  
> simply
> patching obo parsing, but I am open to suggestions.
>
> 	-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat May 19 20:54:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 20:54:11 -0400
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <221DB1CF-2F4E-47D4-80A8-D8D8BD777423@gmx.net>

Sounds great to me! -hilmar

On May 17, 2007, at 2:46 AM, Heikki Lehvaslaiho wrote:

>
> I've started putting together Bio::OntologyIO::obo::write_ontology().
> The current parser ignores a number of fields in common obo files.
> If anyone knows any issues regarding adding more information into  
> obo ontology
> object, shout now.
>
> I need to start parsing at least "xref_analog" and "subset" to get a
> reasonable roundtrip of obo files representing cell ontology and  
> sequence
> ontology.
>
> I am not aiming at extending the existing ontology interfaces but  
> simply
> patching obo parsing, but I am open to suggestions.
>
> 	-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat May 19 21:36:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 21:36:49 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
Message-ID: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>

FYI. Is it worth thinking about implementing a remote access  
interface to the CIPRES tree inference tools, similar to what we have  
for RemoteBlast?

	-hilmar

Begin forwarded message:

From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
Date: May 16, 2007 6:48:49 AM EDT
Subject: FW: release of cipres portal for tree inference

The CIPRES Central Resource team is pleased to announce the first public
release of the CIPRES portal for Tree Inference.

The portal is based on capabilities exposed by the Cipres software
libraries, which were constructed as a Joint Effort between Mark Holder
at Florida State University and the SDSC SW engineering team led by
Terri Liebowitz.

It currently presents Parsimony (PAUP) and Likelihood (GARLI and RAxML)
tools with or without boosting from RecIDCM3 created by Usman Roshan and
co-workers. Nexus and Phylip files are currently supported.

The site is available to all, and is underwritten by the CIPRES cluster
at SDSC.

The portal is fully supported by the SDSC team, with contributions and
new features introduced by the team in collaboration with Mark Holder
and Rutger Vos. At present weekly releases are made with improvements
and new features.

You can visit the portal at the Cipres Web Site.

http://www.phylo.org/sub_sections/portal.htm

Please forward this information to anyone you feel may find the
portal useful.

On behalf of the whole CIPRES team,

Mark

Mark A. Miller, PhD
Principal Investigator, Biology
San Diego Supercomputer Center
University of California, San Diego
La Jolla, CA, 92093-0505
Tel: 858-822-0866
Fax: 858-822-3610

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat May 19 22:10:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 19 May 2007 21:10:53 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
Message-ID: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>

I think it would be worthwhile.  Would we place it in bioperl-run?

chris

On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:

> FYI. Is it worth thinking about implementing a remote access
> interface to the CIPRES tree inference tools, similar to what we have
> for RemoteBlast?
>
> 	-hilmar
>
> Begin forwarded message:
>
> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
> Date: May 16, 2007 6:48:49 AM EDT
> Subject: FW: release of cipres portal for tree inference
>
> The CIPRES Central Resource team is pleased to announce the first  
> public
> release of the CIPRES portal for Tree Inference.
>
> The portal is based on capabilities exposed by the Cipres software
> libraries, which were constructed as a Joint Effort between Mark  
> Holder
> at Florida State University and the SDSC SW engineering team led by
> Terri Liebowitz.
>
> It currently presents Parsimony (PAUP) and Likelihood (GARLI and  
> RAxML)
> tools with or without boosting from RecIDCM3 created by Usman  
> Roshan and
> co-workers. Nexus and Phylip files are currently supported.
>
> The site is available to all, and is underwritten by the CIPRES  
> cluster
> at SDSC.
>
> The portal is fully supported by the SDSC team, with contributions and
> new features introduced by the team in collaboration with Mark Holder
> and Rutger Vos. At present weekly releases are made with improvements
> and new features.
>
> You can visit the portal at the Cipres Web Site.
>
> http://www.phylo.org/sub_sections/portal.htm
>
> Please forward this information to anyone you feel may find the
> portal useful.
>
> On behalf of the whole CIPRES team,
>
> Mark
>
> Mark A. Miller, PhD
> Principal Investigator, Biology
> San Diego Supercomputer Center
> University of California, San Diego
> La Jolla, CA, 92093-0505
> Tel: 858-822-0866
> Fax: 858-822-3610
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat May 19 22:19:47 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 22:19:47 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
Message-ID: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>

I guess so. That's where RemoteBlast is too, if I'm not mistaken?

What sucks about the UI from a programming perspective is that it  
goes through multiple screens. There may be a lot of screen-scraping.

	-hilmar

On May 19, 2007, at 10:10 PM, Chris Fields wrote:

> I think it would be worthwhile.  Would we place it in bioperl-run?
>
> chris
>
> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>
>> FYI. Is it worth thinking about implementing a remote access
>> interface to the CIPRES tree inference tools, similar to what we have
>> for RemoteBlast?
>>
>> 	-hilmar
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sun May 20 01:06:53 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 19 May 2007 22:06:53 -0700
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
Message-ID: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>

Technically RemoteBlast is in bioperl-live, for historical and
ease-of-install reasons: so many people want to use BLAST out of the
box that we kept it in bioperl-live rather than force them to install
bioperl-run.

I think it would be great to have the interface. Can we do it all via
HTTP, or will it require installing client software and/or CORBA?

-jason
On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:

> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>
> What sucks about the UI from a programming perspective is that it
> goes through multiple screens. There may be a lot of screen-scraping.
>
> 	-hilmar
>
> On May 19, 2007, at 10:10 PM, Chris Fields wrote:
>
>> I think it would be worthwhile.  Would we place it in bioperl-run?
>>
>> chris
>>
>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>>
>>> FYI. Is it worth thinking about implementing a remote access
>>> interface to the CIPRES tree inference tools, similar to what we  
>>> have
>>> for RemoteBlast?
>>>
>>> 	-hilmar
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From bernd.web at gmail.com  Sun May 20 10:56:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 20 May 2007 16:56:07 +0200
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
	<C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
Message-ID: <716af09c0705200756h46bf2134x3d6841d2a98744c0@mail.gmail.com>

Hi

I have made a simple add_columns function in SimpleAlign along the
lines of remove_columns. I only need to insert characters that are the
same for all sequences:

=head2 add_columns

 Title     : add_columns
 Usage     : $aln2 = $aln->add_columns([0, 10, '.'], [12, 15])
 Function  : Creates an alignment with columns added by specifying
             the columns by number and, optionally, the character to
             insert in all sequences. The default character is gap_char.
 Returns   : Bio::SimpleAlign object
 Args      : Array refs, each containing a pair of integers that
             specify a column range and, optionally, the character
             to insert. The first column is 0.

=cut

The functionality could be extended:
- possibility to supply a string to insert (for all sequences)
- possibility to define the string to insert on a per sequence basis
(although this may be more transparent to do outside SimpleAlign).

After some final checks I could supply it (e.g. via bugzilla).
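
For readers who want to see the idea in isolation, here is a minimal
sketch of the column-insertion logic described above. It is written in
Python on plain strings so that it has no BioPerl dependency, and it is
not the SimpleAlign patch itself. The name add_columns mirrors the POD;
everything else is an assumption made for illustration: ranges are
0-based and inclusive, they are interpreted against the resulting
alignment (after the call, columns start..end hold the filler), and the
default filler is '-'.

```python
def add_columns(seqs, *ranges, gap_char="-"):
    """Insert filler columns into every sequence of an alignment.

    seqs   : list of equal-length aligned strings
    ranges : (start, end) or (start, end, char), 0-based and inclusive
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    result = list(seqs)
    # Apply ranges left to right so that later coordinates already
    # account for the columns inserted by earlier ranges.
    for rng in sorted(ranges, key=lambda r: r[0]):
        start, end = rng[0], rng[1]
        char = rng[2] if len(rng) > 2 else gap_char
        width = len(result[0]) if result else 0
        if start < 0 or end < start or start > width:
            raise ValueError(f"invalid column range: {rng!r}")
        filler = char * (end - start + 1)
        result = [s[:start] + filler + s[start:] for s in result]
    return result


if __name__ == "__main__":
    for row in add_columns(["ACGT", "A-GT"], (0, 1, "."), (4, 4)):
        print(row)
```

Supporting a per-sequence insert string, as suggested in the list
above, would mean passing one filler per sequence instead of a single
character.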


Regards,
Bernd


On 5/17/07, Jason Stajich <jason at bioperl.org> wrote:
> not yet - when I did this to insert intron positions I just manipulated the
> sequence strings outside of SimpleAlign, but I think it would be nice to
> have an insert function.
>
> -jason
>
> On May 17, 2007, at 3:48 AM, Bernd Web wrote:
>
> Hi,
>
> I am playing with alignment and would like to insert strings at
> certain columns (so in all sequences in the alignment). I know about
> the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to
> remove_columns (and give it a try to implement add_columns like
> remove_columns).
>
>
> Regards
> Bernd
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


From hlapp at gmx.net  Sun May 20 11:59:03 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 20 May 2007 11:59:03 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
Message-ID: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>

Just HTTP, no CORBA or other stuff needed client-side.

Ultimately it would of course be nice if they offered a more SOA  
compliant interface too, to obviate the screen-scraping need.  
However, if I understand the UI correctly the screen scraping is - if  
at all - only needed for walking through the steps, and for  
extracting the location of the result. The result itself is in NEXUS  
format, as a separate file.

	-hilmar

On May 20, 2007, at 1:06 AM, Jason Stajich wrote:

> technically remoteblast is in bioperl-live, but for historical/ease  
> of user-install purposes (i.e. so many people want to use blast out  
> of the box, we kept it in bioperl-live to not force them to install  
> bioperl-run).
>
> I think it would be great to have the interface - can we do it all  
> via HTTP or will it require some installation of client software  
> and/or CORBA?
>
> -jason
> On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:
>
>> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>>
>> What sucks about the UI from a programming perspective is that it
>> goes through multiple screens. There may be a lot of screen-scraping.
>>
>> 	-hilmar
>>
>> On May 19, 2007, at 10:10 PM, Chris Fields wrote:
>>
>>> I think it would be worthwhile.  Would we place it in bioperl-run?
>>>
>>> chris
>>>
>>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>>>
>>>> FYI. Is it worth thinking about implementing a remote access
>>>> interface to the CIPRES tree inference tools, similar to what we  
>>>> have
>>>> for RemoteBlast?
>>>>
>>>> 	-hilmar
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Mon May 21 11:19:56 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 10:19:56 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
Message-ID: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>

Sounds like time to bust out WWW::Mechanize.  I didn't step through
the whole process, but the first screen/step looks okay.  Plain HTML
form with plain buttons.  Looks like the Javascript is only getting
involved for client-side sanity checking.  Should be easy to automate
(Don't look at me, I've bitten off a bit too much as it is).

On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> Just HTTP, no CORBA or other stuff needed client-side.
>
> Ultimately it would of course be nice if they offered a more SOA
> compliant interface too, to obviate the screen-scraping need.
> However, if I understand the UI correctly the screen scraping is - if
> at all - only needed for walking through the steps, and for
> extracting the location of the result. The result itself is in NEXUS
> format, as a separate file.
>
>         -hilmar


From cjfields at uiuc.edu  Mon May 21 16:11:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:11:36 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
	<ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
Message-ID: <61E0D74B-77F7-499B-A0B7-B1E5106964E6@uiuc.edu>

It would be nice to have a generalized interface (SOAP, CGI,  
anything), as Hilmar states.  I agree WWW::Mechanize is prob. the way  
to go for now.  Don't know who wants to take it up...

chris

On May 21, 2007, at 10:19 AM, Mark Johnson wrote:

> Sounds like time to bust out WWW::Mechanize.  I didn't step through
> the whole process, but the first screen/step looks okay.  Plain HTML
> form with plain buttons.  Looks like the Javascript is only getting
> involved for client-side sanity checking.  Should be easy to automate
> (Don't look at me, I've bitten off a bit too much as it is).
>
> On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> Just HTTP, no CORBA or other stuff needed client-side.
>>
>> Ultimately it would of course be nice if they offered a more SOA
>> compliant interface too, to obviate the screen-scraping need.
>> However, if I understand the UI correctly the screen scraping is - if
>> at all - only needed for walking through the steps, and for
>> extracting the location of the result. The result itself is in NEXUS
>> format, as a separate file.
>>
>>         -hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 16:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:35:41 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
Message-ID: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>

On May 16, 2007, at 2:11 PM, Mark Johnson wrote:

> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> I believe all seqfeature location coordinates are designed to have
>> start < stop for consistency; in cases where the strand matters (CDS,
>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>> the two are reversed and the strand is flipped; at least that's the
>> way locations are set up in BioPerl.
>>
>> chris
>
>     Oh yeah?  I always tend to ensure that (start < stop), regardless
> of strand, when working with sequence features...the other day, I
> caught Glimmer2 emitting a prediction on the plus strand with start >
> stop.  I was going to work up a patch for the parser, but I wonder,
> should I just force everything to start < stop?  Or only predictions
> on the plus strand?  Should all the parsers for all the ab initio
> predictors ensure they emit features with coordinates like this?

Odd that it would predict a start > stop on the plus strand, though  
it may be corrected in Glimmer3.  Does the same prediction show up in  
Glimmer3?

chris


From johnsonm at gmail.com  Mon May 21 16:48:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 15:48:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>

Check the test data for Glimmer2 and Glimmer3.  They both predict one
large gene, I'd guess covering most of the sequence, in frame +1.
That's probably a bogus prediction, but that's not up to the parser to
decide.  I hadn't noticed it until recently.

I sent a patch via bugzilla to swap the coordinates if start > end and
strand > 0.

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
> > On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> I believe all seqfeature location coordinates are designed to have
> >> start < stop for consistency; in cases where the strand matters (CDS,
> >> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> >> the two are reversed and the strand is flipped; at least that's the
> >> way locations are set up in BioPerl.
> >>
> >> chris
> >
> >     Oh yeah?  I always tend to ensure that (start < stop), regardless
> > of strand, when working with sequence features...the other day, I
> > caught Glimmer2 emitting a prediction on the plus strand with start >
> > stop.  I was going to work up a patch for the parser, but I wonder,
> > should I just force everything to start < stop?  Or only predictions
> > on the plus strand?  Should all the parsers for all the ab initio
> > predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris
>


From cjfields at uiuc.edu  Mon May 21 16:56:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:56:50 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <6186D928-A47E-4EED-B06A-50E25A4893CC@uiuc.edu>

On May 21, 2007, at 3:35 PM, Chris Fields wrote:

> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
>> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> I believe all seqfeature location coordinates are designed to have
>>> start < stop for consistency; in cases where the strand matters  
>>> (CDS,
>>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>>> the two are reversed and the strand is flipped; at least that's the
>>> way locations are set up in BioPerl.
>>>
>>> chris
>>
>>     Oh yeah?  I always tend to ensure that (start < stop), regardless
>> of strand, when working with sequence features...the other day, I
>> caught Glimmer2 emitting a prediction on the plus strand with start >
>> stop.  I was going to work up a patch for the parser, but I wonder,
>> should I just force everything to start < stop?  Or only predictions
>> on the plus strand?  Should all the parsers for all the ab initio
>> predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris

... and I see that it does (per your bug report).  The next thing to  
ask is how often these odd Glimmer hits occur and whether others have  
seen the same thing.  Maybe there's an explanation (bug, etc) but I  
can't immediately think of anything that makes sense unless it's  
running the reverse of the + strand as a control for some reason.

chris


From cjfields at uiuc.edu  Mon May 21 17:17:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 16:17:37 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
Message-ID: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>


On May 21, 2007, at 3:48 PM, Mark Johnson wrote:

> Check the test data for Glimmer2 and Glimmer3.  They both predict one
> large gene, I'd guess covering most of the sequence, in frame +1.
> That's probably a bogus prediction, but that's not up to the parser to
> decide.  I hadn't noticed it until recently.
>
> I sent a patch via bugzilla to swap the coordinates if start > end and
> strand > 0.

I think I know what it is.  If you mean these predictions:

Glimmer2:

    27    29263        6  [+1 L= 684 r=-1.187]

Glimmer3:

orf00001    29263        9  +1     9.60

Glimmer2/3 are predicting a gene on a circular chromosome that starts
at 29263 and ends at +9 (+6 for Glimmer2, which leaves off the stop
codon).  Note that in the Glimmer2 detailed output the end is 29946 and
the length of the sequence is 29940, so Glimmer2 artificially extends
the end of the sequence with part of the start.

This is handled as a split location in bioperl and in most GenBank
files; the above would be a location string like
'join(29263..29940,1..9)'.  If you switched the start and stop, the
location would be '9..29263', which wouldn't be correct (and would be
a huge gene).
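
To make the arithmetic concrete, here is a small sketch of turning a
plus-strand prediction that wraps around the origin into the two
segments of a split location. It is plain Python, not the BioPerl
location classes, and the function names are invented for illustration.
It assumes 1-based inclusive coordinates and a known sequence length.

```python
def wraparound_to_split(start, end, seq_length):
    """Return 1-based inclusive (start, end) segments for a plus-strand
    feature on a circular sequence of length seq_length."""
    if not (1 <= start <= seq_length and 1 <= end <= seq_length):
        raise ValueError("coordinates must lie within the sequence")
    if start <= end:
        # Ordinary feature, no wraparound.
        return [(start, end)]
    # Feature runs off the end of the sequence and continues at base 1.
    return [(start, seq_length), (1, end)]


def location_string(segments):
    """Format segments in GenBank style, using join() when split."""
    parts = ",".join(f"{s}..{e}" for s, e in segments)
    return parts if len(segments) == 1 else f"join({parts})"


if __name__ == "__main__":
    segs = wraparound_to_split(29263, 9, 29940)
    print(location_string(segs))
    print(sum(e - s + 1 for s, e in segs))
```

With the Glimmer2 end of 6 instead of 9, the two segments add up to
684 bases, which matches the L= 684 in the Glimmer2 line above.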

chris


From johnsonm at gmail.com  Mon May 21 17:21:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 16:21:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
Message-ID: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>

    That makes sense.  Is that behavior documented anywhere?  I'll
feel like less of an idiot if it's not.  8)  Either way, if you're
sure that's what's going on, I'll fix up the parser to handle that as a
split location.

> I think I know what it is.  If you mean these predictions:
>
> Glimmer2:
>
>     27    29263        6  [+1 L= 684 r=-1.187]
>
> Glimmer3:
>
> orf00001    29263        9  +1     9.60
>
> Glimmer2/3 are predicting a gene for a circular chromosome that
> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> and the length of the sequence is 29940, so Glimmer2 artificially
> extends the end of the sequence with part of the start.
>
> This is handled as a split location in bioperl and in most GenBank
> files; the above would be a location string like 'join
> (29263..29940,1..9)'.  If you switched the start and stop the
> location would be '9..29263' which wouldn't be correct (and would be
> a huge gene).
>
> chris
>


From cjfields at uiuc.edu  Mon May 21 19:13:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 18:13:24 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
Message-ID: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>

Glimmer2/3 both assume the genome is circular by default (I'm
assuming because Glimmer2/3 are used for bacterial genomes).  According
to the Glimmer3 release notes, the detail file has the information in
the header; from the Glimmer3 data used for tests:

Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA  
Glimmer3.icm Glimmer3

Sequence file = ../BCTDNA
ICM model file = Glimmer3.icm
Excluded regions file = none
List of orfs file = none
Truncated orfs = false
Circular genome = true
...

There are options available for glimmer3 (-l, -X) that specify a
linear sequence or allow ORFs to extend past the end of the sequence
analyzed (the latter assumes a linear sequence).

chris

On May 21, 2007, at 4:21 PM, Mark Johnson wrote:

>     That makes sense.  Is that behavior documented anywhere?  I'll
> feel like less of an idiot if it's not.  8)  Either way, if you're
> sure that's whats going on, I'll fix up the parser to handle that as a
> split location.
>
>> I think I know what it is.  If you mean these predictions:
>>
>> Glimmer2:
>>
>>     27    29263        6  [+1 L= 684 r=-1.187]
>>
>> Glimmer3:
>>
>> orf00001    29263        9  +1     9.60
>>
>> Glimmer2/3 are predicting a gene for a circular chromosome that
>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>> and the length of the sequence is 29940, so Glimmer2 artificially
>> extends the end of the sequence with part of the start.
>>
>> This is handled as a split location in bioperl and in most GenBank
>> files; the above would be a location string like 'join
>> (29263..29940,1..9)'.  If you switched the start and stop the
>> location would be '9..29263' which wouldn't be correct (and would be
>> a huge gene).
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnsonm at gmail.com  Mon May 21 19:57:03 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 18:57:03 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>

    Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
this for a fix?  For plus strand predictions with start > end, use a
split location.  For minus strand predictions with start < end, use a
split location.  Without knowing the length of the sequence, that's
the best that can be done, I think.
    Unless there are objections, I'll go code that up.  Close that bug
out as 'requester is an idiot'.  8)
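
For reference, a minimal sketch of the rule proposed above, in Python
rather than as actual BioPerl parser code.  The function name
wraps_origin is hypothetical; the coordinates are the Glimmer3
prediction quoted in this thread.

```python
def wraps_origin(start: int, end: int, strand: int) -> bool:
    """Return True if a Glimmer prediction appears to cross the origin.

    Glimmer reports 'start' as the 5' end of the gene, so a plus-strand
    gene normally has start < end and a minus-strand gene start > end.
    The opposite ordering suggests a wrap-around on a circular sequence.
    """
    if strand > 0:
        return start > end
    if strand < 0:
        return start < end
    raise ValueError("strand must be non-zero")


# The Glimmer3 prediction from this thread: orf00001 29263 9 +1
print(wraps_origin(29263, 9, +1))
```

This needs no sequence length, which is the point of the proposal; it
only flags candidates and cannot say where the split should fall.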

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:
>
> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3
>
> Sequence file = ../BCTDNA
> ICM model file = Glimmer3.icm
> Excluded regions file = none
> List of orfs file = none
> Truncated orfs = false
> Circular genome = true
> ...
>
> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).
>
> chris
>
> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>
> >     That makes sense.  Is that behavior documented anywhere?  I'll
> > feel like less of an idiot if it's not.  8)  Either way, if you're
> > sure that's whats going on, I'll fix up the parser to handle that as a
> > split location.
> >
> >> I think I know what it is.  If you mean these predictions:
> >>
> >> Glimmer2:
> >>
> >>     27    29263        6  [+1 L= 684 r=-1.187]
> >>
> >> Glimmer3:
> >>
> >> orf00001    29263        9  +1     9.60
> >>
> >> Glimmer2/3 are predicting a gene for a circular chromosome that
> >> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> >> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> >> and the length of the sequence is 29940, so Glimmer2 artificially
> >> extends the end of the sequence with part of the start.
> >>
> >> This is handled as a split location in bioperl and in most GenBank
> >> files; the above would be a location string like 'join
> >> (29263..29940,1..9)'.  If you switched the start and stop the
> >> location would be '9..29263' which wouldn't be correct (and would be
> >> a huge gene).
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From torsten.seemann at infotech.monash.edu.au  Mon May 21 20:29:47 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 22 May 2007 10:29:47 +1000
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>

> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:

You beat me to the reply Chris - yes, Glimmer2/3 assume circular
chromosome by default. I had forgotten about this in earlier
discussions of the new Glimmer parsers as I normally run it in
--linear / -L mode (even if I know it is circular) because it is
easier to handle, and our sequencer/assembler team usually gets the
origin of replication right.

> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3

I did a double-take here - that's the path to my Glimmer3
installation! It took me a couple of minutes to realise that you got
it from the bioperl test data I created. D'oh! :-)

> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).

If the -L mode should produce Bio::Location::Split objects, I guess if
-X is used it should produce Bio::Location::Fuzzy objects too...

--Torsten


From cjfields at uiuc.edu  Mon May 21 20:59:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 19:59:20 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
Message-ID: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>

You can add the necessary patch to the bug report when it's ready; no  
need to close it out.

The most complete file format to parse seems to be the details file;  
it contains the sequence length:

 >BCTDNA
Sequence length = 29940

which can be used for the split location.  As Torsten points out, use  
of -X could also potentially produce fuzzy locations.
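
A rough sketch of pulling that length out of the header text, in Python
and not taken from any existing parser.  The function name
sequence_length is hypothetical, and the sketch assumes the header line
looks exactly like the one shown above.

```python
import re


def sequence_length(detail_text: str):
    """Return the integer from a 'Sequence length = N' line, or None."""
    match = re.search(r"^Sequence length = (\d+)\s*$", detail_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Header fragment as quoted in this thread.
header = ">BCTDNA\nSequence length = 29940\n"
print(sequence_length(header))
```

Returning None when the line is absent leaves it to the caller to decide
whether to warn or to fall back to a user-supplied length.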

Since the parser currently only parses predict files, you could  
optionally supply the parser with the seq length and emit a warning  
if seqfeatures requiring it are produced, such as the sporadic ones  
which wrap around.  If one were using the bioperl-run module this  
could be automated a bit by passing the seq length in to the parser  
object by adding the seq length to the constructor argument list.

chris

On May 21, 2007, at 6:57 PM, Mark Johnson wrote:

>     Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
> this for a fix?  For plus strand predictions with start > end, use a
> split location.  For minus strand predictions with start < end, use a
> split location.  Without knowing the length of the sequence, that's
> the best that can be done, I think.
>     Unless there are objections, I'll go code that up.  Close that bug
> out as 'requester is an idiot'.  8)
>
> On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>>
>> Sequence file = ../BCTDNA
>> ICM model file = Glimmer3.icm
>> Excluded regions file = none
>> List of orfs file = none
>> Truncated orfs = false
>> Circular genome = true
>> ...
>>
>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>>
>> chris
>>
>> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>>
>>>     That makes sense.  Is that behavior documented anywhere?  I'll
>>> feel like less of an idiot if it's not.  8)  Either way, if you're
>>> sure that's whats going on, I'll fix up the parser to handle that  
>>> as a
>>> split location.
>>>
>>>> I think I know what it is.  If you mean these predictions:
>>>>
>>>> Glimmer2:
>>>>
>>>>     27    29263        6  [+1 L= 684 r=-1.187]
>>>>
>>>> Glimmer3:
>>>>
>>>> orf00001    29263        9  +1     9.60
>>>>
>>>> Glimmer2/3 are predicting a gene for a circular chromosome that
>>>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>>>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>>>> and the length of the sequence is 29940, so Glimmer2 artificially
>>>> extends the end of the sequence with part of the start.
>>>>
>>>> This is handled as a split location in bioperl and in most GenBank
>>>> files; the above would be a location string like 'join
>>>> (29263..29940,1..9)'.  If you switched the start and stop the
>>>> location would be '9..29263' which wouldn't be correct (and  
>>>> would be
>>>> a huge gene).
>>>>
>>>> chris
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 21:00:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 20:00:58 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
Message-ID: <E22A8442-E00D-4732-9D80-EE61C75732B7@uiuc.edu>


On May 21, 2007, at 7:29 PM, Torsten Seemann wrote:

>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>
> You beat me to the reply Chris - yes, Glimmer2/3 assume circular
> chromosome by default. I had forgotten about this in earlier
> discussions of the new Glimmer parsers as I normally run it in
> --linear / -L mode (even if I know it is circular) because it is
> easier to handle, and our sequencer/assembler team usually gets the
> origin of replication right.
>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>
> I did a double-take here - that's the path to my Glimmer3
> installation! It took me a couple of minutes to realise that you got
> it from the bioperl test data I created. D'oh! :-)

Yep, I forgot about that!

>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>
> If the -L mode should produce Bio::Location::Split objects, I guess if
> -X is used
> it should produce Bio::Location::Fuzzy objects too...
>
> --Torsten

True, didn't think about that one.  Def. something to consider adding  
in.

chris


From johnsonm at gmail.com  Tue May 22 14:04:31 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 22 May 2007 13:04:31 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
	<A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
Message-ID: <ebf5eb170705221104s486ff488u1d8c0b87dd193861@mail.gmail.com>

Yes, Glimmer3 outputs the length of the input sequence.  I don't
believe Glimmer2 does.

> The most complete file format to parse seems to be the details file;
> it contains the sequence length:
>
>  >BCTDNA
> Sequence length = 29940

> Since the parser currently only parses predict files, you could
> optionally supply the parser with the seq length and emit a warning
> if seqfeatures requiring it are produced, such as the sporadic ones
> which wrap around.  If one were using the bioperl-run module this
> could be automated a bit by passing the seq length in to the parser
> object by adding the seq length to the constructor argument list.

I think we can spot wrap-around genes easily enough without knowing
the length of the input sequence.  Having it just means we can perform
a sanity check or two, such as making sure 'wraparound' genes are
within N bases of the end of the input sequence.  Any suggestions on a
good default value for N?
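
To make the sanity check concrete, here is a small Python sketch (not
BioPerl code).  Both function names are hypothetical, and the 1000 used
for N below is an arbitrary placeholder, not a recommended default.

```python
def split_location(start: int, end: int, seq_length: int) -> str:
    """Build a GenBank-style join string for a plus-strand wrap-around gene."""
    if not (1 <= end < start <= seq_length):
        raise ValueError("expected 1 <= end < start <= seq_length")
    return f"join({start}..{seq_length},1..{end})"


def starts_near_end(start: int, seq_length: int, max_distance: int) -> bool:
    """Sanity check: the gene must start within max_distance bases of the end."""
    return 0 <= seq_length - start <= max_distance


# Glimmer3 prediction and sequence length quoted earlier in the thread.
print(split_location(29263, 9, 29940))
print(starts_near_end(29263, 29940, max_distance=1000))  # N is a placeholder
```

With the length known, the first function reproduces the location string
Chris gave earlier; the second only rejects predictions that start too
far from the end to be plausible wrap-arounds.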

Parsing both output files for glimmer3 will be a little tricky.  The
constructor for Bio::Tools::Glimmer calls $class->SUPER::new(@args);,
which hits the constructor for Bio::Tools::AnalysisResult, which does
the same thing.  It all ends up in Bio::Root::IO::_initialize_io,
which grabs the -file arg and opens it.  So, either let Bio::Root::IO
handle -file and have Bio::Tools::Glimmer handle, say, -detail file, or
have Bio::Tools::Glimmer just implement _initialize_io() itself and
hopefully that will fly.


From ClarkeW at AGR.GC.CA  Tue May 22 17:10:08 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Tue, 22 May 2007 15:10:08 -0600
Subject: [Bioperl-l] TextResultWriter
Message-ID: <C278B850.1002%ClarkeW@AGR.GC.CA>

Hi, 

I am interested in becoming a bioperl developer as I have recently found a
bug in TextResultWriter. I know that I should submit bug fixes using the
protocol outlined in the HOWTO, but I haven't been able to log in to CVS
anonymously to check out the code. However, I have checked that the bug still
exists in the most recent version of the code using the web interface to the
CVS repositories. The bug is between lines 433 and 443, and deals with the
reporting of the number of letters in the database and the number of entries
in the database: the two values are passed in the wrong order. My fix would
be to change the existing code block:

from:

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_entries()),
        &_numwithcommas($result->database_letters()),
        $result->get_parameter('matrix') || '');

to: 

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_letters()),
        &_numwithcommas($result->database_entries()),
        $result->get_parameter('matrix') || '');

I believe that this is a simple enough modification that it does not require
any new test cases.
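
The underlying issue is that positional format arguments must follow the
order of the placeholders in the template.  A small Python illustration
of the same idea (not the TextResultWriter code; the function name and
the numbers are made up for the example):

```python
TEMPLATE = (
    "    Number of letters in database: {}\n"
    "    Number of sequences in database: {}\n"
)


def footer(letters: int, entries: int) -> str:
    """Fill the template; argument order must match placeholder order."""
    return TEMPLATE.format(f"{letters:,}", f"{entries:,}")


# Hypothetical database size, for illustration only.
print(footer(letters=1_000_000, entries=2_500), end="")
```

Swapping the two arguments in the format call would still run without
any error, which is why this kind of bug is easy to miss.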

Cheers, Wayne


From dmessina at wustl.edu  Wed May 23 02:06:52 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 23 May 2007 01:06:52 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C278B850.1002%ClarkeW@AGR.GC.CA>
References: <C278B850.1002%ClarkeW@AGR.GC.CA>
Message-ID: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>

Hi Wayne,

I submitted the bug report on your behalf

	http://bugzilla.open-bio.org/show_bug.cgi?id=2300

and committed your patch. Thanks for reporting this, and thanks even  
more for including a patch!

Regarding your trouble checking out the repository via anonymous CVS,  
could you post the transcript of your attempt so we can get a better  
look at what's going wrong?

Dave


From ClarkeW at AGR.GC.CA  Wed May 23 10:39:17 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Wed, 23 May 2007 08:39:17 -0600
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>
Message-ID: <C279AE35.1008%ClarkeW@AGR.GC.CA>

With regards to not being able to connect, I have discovered that the reason
I cannot connect is that our firewall is blocking my access. It appears that
I am not the first person to have this problem but that the people in charge
are firm in their position to block the anonymous access port. However, if I
obtain a developer account I will be able to access the CVS.

Cheers, Wayne


On 5/23/07 12:06 AM, "David Messina" <dmessina at wustl.edu> wrote:

> Hi Wayne,
> 
> I submitted the bug report on your behalf
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2300
> 
> and committed your patch. Thanks for reporting this, and thanks even
> more for including a patch!
> 
> Regarding your trouble checking out the repository via anonymous CVS,
> could you post the transcript of your attempt so we can get a better
> look at what's going wrong?
> 
> Dave
> 
> 


From cjfields at uiuc.edu  Wed May 23 12:16:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 23 May 2007 11:16:32 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C279AE35.1008%ClarkeW@AGR.GC.CA>
References: <C279AE35.1008%ClarkeW@AGR.GC.CA>
Message-ID: <7077B4AB-A3B5-4EAE-9994-0EF629D2DE2B@uiuc.edu>

You can always use the browsable CVS link to download a tarball if  
that works for you.

http://www.bioperl.org/wiki/Using_CVS
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/?cvsroot=bioperl

The link to download is at the bottom of the page.

chris

On May 23, 2007, at 9:39 AM, ClarkeW wrote:

> With regards to not being able to connect, I have discovered that  
> the reason
> I cannot connect is that our firewall is blocking my access. It  
> appears that
> I am not the first person to have this problem but that the people  
> in charge
> are firm in their position to block the anonymous access port.  
> However, if I
> obtain a developer account I will be able to access the CVS.
>
> Cheers, Wayne
>
>
> On 5/23/07 12:06 AM, "David Messina" <dmessina at wustl.edu> wrote:
>
>> Hi Wayne,
>>
>> I submitted the bug report on your behalf
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2300
>>
>> and committed your patch. Thanks for reporting this, and thanks even
>> more for including a patch!
>>
>> Regarding your trouble checking out the repository via anonymous CVS,
>> could you post the transcript of your attempt so we can get a better
>> look at what's going wrong?
>>
>> Dave
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Xianjun.Dong at bccs.uib.no  Tue May 29 07:57:39 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 13:57:39 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
Message-ID: <465C1533.6070900@ii.uib.no>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kaks_methods.pl
Type: application/x-perl
Size: 2732 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment-0003.bin>

From avilella at gmail.com  Tue May 29 09:02:44 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:02:44 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>

codeml in PAML can give different results when the optimization
reaches different local maxima depending on the starting point of
each run (seed values). So, at least for some methods and options, this
instability is inherent to the underlying algorithm.

Moreover, for some methods and options, the PAML documentation
recommends running the same data more than once to see whether the results
are the same, which would be a good indication that the model is robust given
the data.

Maybe PAML's author can give a more specific answer for your data at:

http://www.rannala.org/gsf/viewforum.php?f=1

Cheers,

    Albert.

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  HI, dear all, //sorry for duplicated msg for *Jason* and *Neil*
>
> I'm bothered by two problems when I use the PAML module to calculate Ka/Ks
> for my sequences. Could you help me?
>
> 1.  Codeml can produce different Ka/Ks values if I run it twice. I checked
> this both on the command line and with the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml.
>
> The input sequences are:
> >seq1
> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
> >seq2
>
> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
>
> For the command-line program, I used Codeml in PAML3.14, with these settings
> in codeml.ctl (runmode = -2, seqtype = 1). I ran the program four
> times.  The outputs are shown below (from the output file). They
> differ from each other. They should be the same or only slightly
> different, right? But they are NOT.  Weird!
>
> ----------------------------------------------------------------------------------------------------------------------------------
> t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
> t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
> t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
> t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>
> ----------------------------------------------------------------------------------------------------------------------------------
> I found the same problem when I use the Perl Wrapper of
> Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script here,
> similar to the one in BioPerl HOWTO).
>
> 2. Another strange thing is, if I switch to use program YN00 in the
> package of PAML, the output are stable. However, it's much different from
> Codeml. (see below)
>
> ----------------------------------------------------------------------------------------------------------------------------------
> seq. seq.     S       N        t   kappa    omega      dN +- SE
> dS +- SE
>    2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265
> 2.1300 +- 1.2272
>
> ----------------------------------------------------------------------------------------------------------------------------------
> Why like this? Which one I should believe?
>
>
> Is there any guy who would kindly help me to run the perl script (twice to
> check whether they are different)? or help to run the codeml in command
> line?
> I don't know whether there is anyone noticed this before, or because of
> the wrong version of PAML.
>
> Regards,
>
> Xianjun
>
>
>
> Himanshu Ardawatia wrote:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
>
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::Tools::Run::Alignment::Clustalw;
>
> # for projecting alignments from protein to R/DNA space
> use Bio::Align::Utilities qw(aa_to_dna_aln);
>
> # for input of the sequence data
> use Bio::SeqIO;
> use Bio::AlignIO;
>
> my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>
> #my $seqdata = 'chuck.fa';
> my $seqdata = 'xianjun.fa';
>
> my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>                            -format => 'fasta');
> my %seqs;
> my @prots;
>
> my $output;
> # process each sequence
> while( my $seq = $seqIO->next_seq ) {
>     $seqs{$seq->display_id} = $seq;
>     # translate them into protein
>     my $protein = $seq->translate();
>     my $pseq = $protein->seq();
>     if( $pseq =~ /\*/ &&
>     $pseq !~ /\*$/ ) {
>     warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>     exit(0);
>     }
>     # Tcoffee can't handle '*' even if it is trailing
>     $pseq =~ s/\*//g;
>     $protein->seq($pseq);
>     push @prots, $protein;
> }
>
> if( @prots < 2 ) {
>     warn("Need at least 2 cDNA sequences to proceed");
>     exit(0);
> }
>
> open(OUT, ">align_output.txt") ||
>       die("cannot open output align_output.txt for writing: $!");
> # Align the sequences with clustalw
>
> my $aa_aln = $aln_factory->align(\@prots);
>
> # project the protein alignment back to cDNA coordinates
> my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>
> my @each = $dna_aln->each_seq();
>
> my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>                   ( -params => { 'runmode' => -2,
>                          'seqtype' => 1,
>                  'model' => 1,
>                 }
>               );
>
> # set the alignment object
> $kaks_factory->alignment($dna_aln);
>
> # run the KaKs analysis
> my ($rc,$parser) = $kaks_factory->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
>
> my @otus = $result->get_seqs();
> # this gives us a mapping from the PAML order of sequences back to
> # the input order (since names get truncated)
> my @pos = map {
>     my $c= 1;
>     foreach my $s ( @each ) {
>     last if( $s->display_id eq $_->display_id );
>     $c++;
>     }
>     $c;
> } @otus;
>
> print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
> CDNA_PERCENTID)), "\n";
> for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>     for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>     print OUT join("\t",
>                $otus[$i]->display_id,
>                $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>                $MLmatrix->[$i]->[$j]->{'dS'},
>                $MLmatrix->[$i]->[$j]->{'omega'},
>                sprintf("%.2f",$sub_aa_aln->percentage_identity),
>                sprintf("%.2f",$sub_dna_aln->percentage_identity),
>                ), "\n";
>     }
> }
>
>
> On 5/29/07, Himanshu Ardawatia <himanshu.ardawatia at bccs.uib.no > wrote:
> >
> > Hi Xianjun,
> >
> > I recognize this script. But it was a bit cumbersome to use, as many
> > things are done in the script (like multiple alignment, aa to dna alignment
> > and ka/ks calculation), so one does not have real control over these
> > different aspects.
> > I do not remember getting different Ka/Ks in different runs though. But I
> > remember that once I ran the script with different versions of clustalw and
> > it REALLY gave different results !! So please make sure that the clustalw
> > versions are the same in all your runs. Best is to use the latest version.
> >
> > Finally I wrote my simple script which would generate a codeml.ctl file
> > for each set of sequences and run the codeml based on that and then more on.
> > Disadvantage of this can be that some files keep getting over-written (like
> > the one which have their names hard-coded in codeml program) and if one
> > needs those files as well then one needs to run the codeml cycles for each
> > set of sequences in different directories.
> >
> > One advantage of this kind of script is that you can use whichever
> > alignment program you want to use and so on....But then its also extra steps
> > of yourself doing multiple alignment and aa to dna alignment etc....
> >
> > Does it make sense? If you still get different outputs with same version
> > of clustalw then I can sit with you and look at things together. Or else try
> > the script method which I mentioned.
> >
> > Cheers  and Fu
> > Himanshu
> >
> > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote:
> > >
> > > HI, Himanshu
> > >
> > > I am sure you did some work in Ka/Ks calculation. Here I have a
> > > question bothering me: the output of
> > > Bio::Tools::Run::Phylo::PAML::Codeml is not stable (different for
> > > each run), and also different from the output of the module
> > > Bio::Tools::Run::Phylo::PAML::Yn00.
> > >
> > > Here I attached the script. Could you help to have a look and try to
> > > run the script? What is your way to calculate the Ka/Ks ratio?
> > >
> > > Thanks
> > >
> > > --
> > > ---------------------------
> > > Sterding (Xianjun) Dong
> > > PhD student, Boris Lenhard's group
> > > Bergen Center of Computational Science
> > > Bergen University, Norway
> > > Mobile: 0047-47361688
> > > Telephone: 0047-55276381
> > > Skype: xianjun.dong
> > >
> > >
> > >
> >
>
> --
> ---------------------------
> Sterding (Xianjun) Dong
> PhD student, Boris Lenhard's group
> Bergen Center of Computational Science
> Bergen University, Norway
> Mobile: 0047-47361688
> Telephone: 0047-55276381
> Skype: xianjun.dong
>
>
>
>


From Xianjun.Dong at bccs.uib.no  Tue May 29 09:30:09 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 15:30:09 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>	
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
Message-ID: <465C2AE1.30101@ii.uib.no>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/532a333d/attachment-0003.html>

From avilella at gmail.com  Tue May 29 09:45:28 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:45:28 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2AE1.30101@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no>
Message-ID: <358f4d650705290645s65f596cbp37715f12064a5ced@mail.gmail.com>

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  Thanks for information, Albert.
>
> But still in two questions:
> Albert Vilella wrote:
>
> codeml in PAML can give different results in cases where the optimization
> reaches different local maxima depending on the different starting points of
> each run (seed values). So, at least for some methods and options, this
> instability is inherent to the underlying algorithm.
>
> 1. How to set the initial value in order to get a reasonable estimation?
> Do you have some experience for that?
>

People usually change the initial omega in the control file (codeml.ctl).
For example, 3 runs with initial omega values of 0.001, 1 and 5.
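
As a rough sketch of that idea, the snippet below writes out one control
file text per starting value. The file names (aln.phy, mlc_omega_*) and the
helper codeml_ctl are made up for illustration, and only the options
discussed in this thread are shown; a real codeml.ctl will usually set more.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the text of a codeml control file for a pairwise run.
# The file names passed in are placeholders chosen for this sketch.
sub codeml_ctl {
    my (%arg) = @_;
    return join "\n",
        "seqfile = $arg{seqfile}",
        "outfile = $arg{outfile}",
        "runmode = -2",          # pairwise comparison, as in the thread
        "seqtype = 1",           # codon sequences
        "fix_omega = 0",         # estimate omega rather than fixing it
        "omega = $arg{omega}",   # initial omega (starting point)
        "";
}

# One control file text per starting value suggested above.
my @initial_omegas = (0.001, 1, 5);
my %ctl_for;
for my $omega (@initial_omegas) {
    $ctl_for{$omega} = codeml_ctl(
        seqfile => 'aln.phy',            # hypothetical alignment file
        outfile => "mlc_omega_$omega",   # hypothetical output name
        omega   => $omega,
    );
}

print $ctl_for{$_}, "\n" for @initial_omegas;
```

Since codeml overwrites some fixed-name files, it is probably safest to
save each control file in its own directory and run codeml there, then
compare dN, dS and the log-likelihood across the three runs.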

> Even more, for some methods and options, it is even recommended in PAML
> documentation to run the same data more than once, to see if the results are
> the same, which would be a good indication that the model is robust given
> the data.
>
> 2. Is there a recommended way to test the significance if the results are
> different? For example, in my case, dS ranged from 10.1852 to 14.9961
> over the four runs. If that means the model is not robust (how to check
> this?), should I switch to another model?
>

I would prefer PAML's author to answer this question :)

> How can YN00 reach a stable result? (Is it because YN00 does not require an
> initial value for optimization?) Why does YN00 produce such a different result
> from Codeml? (for YN00, dS=2.1300 with SE=1.2272; for Codeml,
> dS=10.1852-14.9961)
>

I think Yn00 is less prone to give different local maxima than some codeml
models, but then, codeml is better in giving true positives in cases where
yn00 will give false negatives...

> Maybe PAML's author can give a more specific answer for your data at:
> http://www.rannala.org/gsf/viewforum.php?f=1
>
>
> Actually I already post my question in the author's forum. Let's wait and
> see.
>

Yes, I would wait for his answers, which should be way more reliable than
mine :)

> Cheers,
>
>     Albert.
>
> On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
> >
> > HI, dear all, //sorry for duplicated msg for *Jason* and *Neil*
> >
> > I'm bothered by two problems when I use the PAML module to calculate Ka/Ks
> > for my sequences. Could you help me?
> >
> > 1.  Codeml can produce different Ka/Ks values if I run it twice. I
> > checked this both on the command line and in the Perl wrapper,
> > Bio::Tools::Run::Phylo::PAML::Codeml.
> >
> > The input sequences are:
> > >seq1
> > TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
> > >seq2
> >
> > TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
> >
> > For command-line program, I used Codeml in PAML3.14, with specifications
> > in codeml.ctl (runmode = -2, seqtype = 1). I tried to run the program
> > four times.  The output are like below (from the output file). We could see
> > that they are different from each other. they should be same or slightly
> > different. Right? But they are NOT.  Weird!
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
> > t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
> > t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
> > t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > I found the same problem when I use the Perl Wrapper of
> > Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script here,
> > similar to the one in BioPerl HOWTO).
> >
> > 2. Another strange thing is, if I switch to use program YN00 in the
> > package of PAML, the output are stable. However, it's much different from
> > Codeml. (see below)
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > seq. seq.     S       N        t   kappa    omega      dN +- SE        dS +- SE
> >    2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265  2.1300 +- 1.2272
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > Why like this? Which one I should believe?
> >
> >
> > Is there any guy who would kindly help me to run the perl script (twice
> > to check whether they are different)? or help to run the codeml in command
> > line?
> > I don't know whether there is anyone noticed this before, or because of
> > the wrong version of PAML.
> >
> > Regards,
> >
> > Xianjun
> >
> >
> >
> > Himanshu Ardawatia wrote:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> >
> > use Bio::Tools::Run::Phylo::PAML::Codeml;
> > use Bio::Tools::Run::Alignment::Clustalw;
> >
> > # for projecting alignments from protein to R/DNA space
> > use Bio::Align::Utilities qw(aa_to_dna_aln);
> >
> > # for input of the sequence data
> > use Bio::SeqIO;
> > use Bio::AlignIO;
> >
> > my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
> >
> > #my $seqdata = 'chuck.fa';
> > my $seqdata = 'xianjun.fa';
> >
> > my $seqIO = new Bio::SeqIO(-file   => $seqdata,
> >                            -format => 'fasta');
> > my %seqs;
> > my @prots;
> >
> > my $output;
> > # process each sequence
> > while( my $seq = $seqIO->next_seq ) {
> >     $seqs{$seq->display_id} = $seq;
> >     # translate them into protein
> >     my $protein = $seq->translate();
> >     my $pseq = $protein->seq();
> >     if( $pseq =~ /\*/ &&
> >     $pseq !~ /\*$/ ) {
> >     warn("provided a cDNA sequence with a stop codon, PAML will choke!");
> >     exit(0);
> >     }
> >     # Tcoffee can't handle '*' even if it is trailing
> >     $pseq =~ s/\*//g;
> >     $protein->seq($pseq);
> >     push @prots, $protein;
> > }
> >
> > if( @prots < 2 ) {
> >     warn("Need at least 2 cDNA sequences to proceed");
> >     exit(0);
> > }
> >
> > open(OUT, ">align_output.txt") ||
> >       die("cannot open align_output.txt for writing: $!");
> > # Align the sequences with clustalw
> >
> > my $aa_aln = $aln_factory->align(\@prots);
> >
> > # project the protein alignment back to cDNA coordinates
> > my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
> >
> > my @each = $dna_aln->each_seq();
> >
> > my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
> >                   ( -params => { 'runmode' => -2,
> >                          'seqtype' => 1,
> >                  'model' => 1,
> >                 }
> >               );
> >
> > # set the alignment object
> > $kaks_factory->alignment($dna_aln);
> >
> > # run the KaKs analysis
> > my ($rc,$parser) = $kaks_factory->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> >
> > my @otus = $result->get_seqs();
> > # this gives us a mapping from the PAML order of sequences back to
> > # the input order (since names get truncated)
> > my @pos = map {
> >     my $c= 1;
> >     foreach my $s ( @each ) {
> >     last if( $s->display_id eq $_->display_id );
> >     $c++;
> >     }
> >     $c;
> > } @otus;
> >
> > print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
> > CDNA_PERCENTID)), "\n";
> > for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
> >     for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
> >     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
> >     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
> >     print OUT join("\t",
> >                $otus[$i]->display_id,
> >                $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
> >                $MLmatrix->[$i]->[$j]->{'dS'},
> >                $MLmatrix->[$i]->[$j]->{'omega'},
> >                sprintf("%.2f",$sub_aa_aln->percentage_identity),
> >                sprintf("%.2f",$sub_dna_aln->percentage_identity),
> >                ), "\n";
> >     }
> > }
> >
> >
> > On 5/29/07, Himanshu Ardawatia <himanshu.ardawatia at bccs.uib.no > wrote:
> > >
> > > Hi Xianjun,
> > >
> > > I recognize this script. But it was a bit cumbersome to use, as
> > > many things are done in the script (like multiple alignment, aa to dna
> > > alignment and ka/ks calculation), so one does not have real control over these
> > > different aspects.
> > > I do not remember getting different Ka/Ks in different runs though. But
> > > I remember that once I ran the script with different versions of clustalw and
> > > it REALLY gave different results !! So please make sure that the clustalw
> > > versions are the same in all your runs. Best is to use the latest version.
> > >
> > > Finally I wrote my simple script which would generate a codeml.ctl
> > > file for each set of sequences and run the codeml based on that and then
> > > more on. Disadvantage of this can be that some files keep getting
> > > over-written (like the one which have their names hard-coded in codeml
> > > program) and if one needs those files as well then one needs to run the
> > > codeml cycles for each set of sequences in different directories.
> > >
> > > One advantage of this kind of script is that you can use whichever
> > > alignment program you want to use and so on....But then its also extra steps
> > > of yourself doing multiple alignment and aa to dna alignment etc....
> > >
> > > Does it make sense? If you still get different outputs with same
> > > version of clustalw then I can sit with you and look at things together. Or
> > > else try the script method which I mentioned.
> > >
> > > Cheers  and Fu
> > > Himanshu
> > > \\
> > > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote:
> > > >
> > > > HI, Himanshu
> > > >
> > > > I am sure you did some work in Ka/Ks calculation. Here I have a
> > > > question
> > > > bothering me; the output for Bio::Tools::Run::Phylo::PAML::Codeml is
> > > > not
> > > > stable(different for each runtime), and also different from the
> > > > output
> > > > of the Bio::Tools::Run::Phylo::PAML::Yn00 module.
> > > >
> > > > Here I attached the script. Could you help to have a look and try to
> > > > run
> > > > the script? How is your way to calculate the Kaks ratio?
> > > >
> > > > Thanks
> > > >
> > > > --
> > > > ---------------------------
> > > > Sterding (Xianjun) Dong
> > > > PhD student, Boris Lenhard's group
> > > > Bergen Center of Computational Science
> > > > Bergen University, Norway
> > > > Mobile: 0047-47361688
> > > > Telephone: 0047-55276381
> > > > Skype: xianjun.dong
> > > >
> > > >
> > > >
> > >
> >
> > --
> > ---------------------------
> > Sterding (Xianjun) Dong
> > PhD student, Boris Lenhard's group
> > Bergen Center of Computational Science
> > Bergen University, Norway
> > Mobile: 0047-47361688
> > Telephone: 0047-55276381
> >
> > Skype: xianjun.dong
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> --
> ---------------------------
> Sterding (Xianjun) Dong
> PhD student, Boris Lenhard's group
> Bergen Center of Computational Science
> Bergen University, Norway
> Mobile: 0047-47361688
> Telephone: 0047-55276381
> Skype: xianjun.dong
>
>


From roy at colibase.bham.ac.uk  Tue May 29 10:05:12 2007
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Tue, 29 May 2007 15:05:12 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <465C3318.5080201@colibase.bham.ac.uk>

Hi Xianjun,

I'm not sure if it is the cause of your problem, but your sequences seem
to be quite short. This paper:
http://mbe.oxfordjournals.org/cgi/content/full/21/12/2290

suggests that the codeml method of calculating Ka and Ks may be
unreliable for sequences shorter than 300 codons.
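
A quick way to check this before running codeml is to count the complete
codons in each coding sequence. The sketch below uses toy sequences (not
the ones from this thread), and the helper name codon_count is just
something made up for the example.

```perl
use strict;
use warnings;

# Number of complete codons in a coding sequence, ignoring gap characters.
sub codon_count {
    my ($dna) = @_;
    (my $ungapped = $dna) =~ s/[-.\s]//g;
    return int(length($ungapped) / 3);
}

my $min_codons = 300;    # threshold mentioned above
my %cds = (              # toy sequences for illustration only
    short => 'ATGGCC---AAATGA',
    long  => 'ATG' . ('GCT' x 300) . 'TGA',
);

for my $id (sort keys %cds) {
    my $n = codon_count($cds{$id});
    printf "%s: %d codons%s\n", $id, $n,
        $n < $min_codons ? " (below $min_codons, treat estimates with caution)" : '';
}
```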

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.


From gbr0wn at comcast.net  Wed May 30 11:44:13 2007
From: gbr0wn at comcast.net (gbr0wn at comcast.net)
Date: Wed, 30 May 2007 15:44:13 +0000
Subject: [Bioperl-l] getting started in windows
Message-ID: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070530/2f640e16/attachment-0002.pl>

From golharam at umdnj.edu  Wed May 30 11:40:28 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 30 May 2007 11:40:28 -0400
Subject: [Bioperl-l] ClustalW Score?
Message-ID: <00c201c7a2d0$d971f550$2d01a8c0@PICO>

How do I get the clustalw score from a clustalw alignment?  I'm using the
following code to align my sequences:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();

$seq[0] = ...
$seq[1] = ...
$seq[2] = ...
$seq[3] = ...

$aln = $aln_factory->align(\@seq);

I can get the percentage identity from the Bio::SimpleAlign object, but
there is no score.  I looked into it further and it doesn't look like the
score is being captured anywhere.  So, how does one get the score from
ClustalW using this method?

Ryan


From barry.moore at genetics.utah.edu  Wed May 30 12:21:16 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 30 May 2007 10:21:16 -0600
Subject: [Bioperl-l] getting started in windows
In-Reply-To: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
References: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
Message-ID: <CA090066-0624-4C52-8306-E783278484B0@genetics.utah.edu>

Try opening up a terminal window (I think you'll find that under
accessories).  Change to the directory where your code is and run it
from the command line.

B

On May 30, 2007, at 9:44 AM, gbr0wn at comcast.net wrote:

> I am a perl novice trying to run perl 5.8.8 on windows xp system.   
> I have used 'wordpad' to paste tutorial code into an executable  
> file and when I double click the icon for the file a window opens  
> up briefly with output and/or error message but closes too fast for  
> me to read.  Any idea why this might be happening?
> Thanks, Greg Brown - gbr0wn at comcast.net
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Wed May 30 13:16:49 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 10:16:49 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
References: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
Message-ID: <1A4207F8295607498283FE9E93B775B403349DAB@EX02.asurite.ad.asu.edu>

> How do I get the clustalw score from a clustalw alignment?  
> I'm using the following code to align my sequences:
> 
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> 
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
> 
> $aln = $aln_factory->align(\@seq);
> 
> I can get the percentage identity from the Bio::SimpleAlign 
> object, but there is no score.  I looked into it further and 
> it doesn't look like the score is being captured anywhere.  
> So, how does one get the score from ClustalW using this method?


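        # Save the real STDOUT, then point STDOUT at a temporary log file
        # so that the output printed by clustalw can be read back.
        open(OUTCOPY, ">&STDOUT")  or die "Couldn't dup STDOUT: $!";
        open(STDOUT,  ">log.test") or die "Couldn't open log.test: $!";
        my $aln = $aln_factory->align(\@seq);
        close STDOUT;
        # Restore the original STDOUT
        open(STDOUT, ">&OUTCOPY")  or die "Couldn't restore STDOUT: $!";
        close OUTCOPY;
        open(TEMP,   "log.test")   or die "Couldn't read log.test: $!";
        while (<TEMP>)
        {
                # allow optional whitespace between "Score:" and the number
                if ($_ =~ /Score:\s*(\d+)/)
                {
                        $aln->score($1);
                        print "Found score of $1\n";
                }
        }
        close TEMP;
        unlink("log.test");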
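
If it helps, the parsing step can be kept separate from the capturing
step, so it can be tried without clustalw installed. This is only a
sketch: the sample log lines are an assumed format for illustration, and
scores_from_log is a made-up helper name, not part of BioPerl.

```perl
use strict;
use warnings;

# Collect every integer that follows "Score:" in captured program output.
sub scores_from_log {
    my ($text) = @_;
    my @scores;
    while ($text =~ /Score:\s*(\d+)/g) {
        push @scores, $1;
    }
    return @scores;
}

# Assumed sample output, for illustration only.
my $log = join '',
    "Sequences (1:2) Aligned. Score:  87\n",
    "Sequences (1:3) Aligned. Score:  45\n",
    "Sequences (2:3) Aligned. Score:90\n";

my @scores = scores_from_log($log);
print "Found scores: @scores\n";
```

Keeping all the matches, rather than only the last one, makes it easier to
decide afterwards which value should go into the score field.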


From jason at bioperl.org  Wed May 30 14:54:20 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 May 2007 11:54:20 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
Message-ID: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>

You can do it without redirecting STDOUT or creating a new file; just
change the system call in the module.

Here is the code that runs the program in _run in the module:
    my $commandstring = $self->executable."$instring"."$param_string";
    $self->debug( "clustal command = $commandstring");
    my $status = system($commandstring);
    unless( $status == 0 ) {
        $self->warn( "Clustalw call ($commandstring) crashed: $? \n");
        return undef;
    }

Do something like:

my $fh;
open($fh, "$commandstring |")
    or $self->throw("Cannot run clustalw ($commandstring): $!");
my $score;
while(<$fh>) {
   $score = $1 if ($_ =~ /Score:(\d+)/);
}
close($fh);

... then at the bottom after the alignment is created do:

$aln->score($score);


Some more debugging may be needed, because if you invoke the quiet => 1
parameter there may be an automatic ">& /dev/null" appended to the
end of the parameter string, which you'll need to figure out how to
override.
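
The pipe-reading idea above can be tried on its own, outside the module.
In the sketch below the current perl interpreter stands in for clustalw
and prints one made-up line; last_score_from_command is a hypothetical
helper, and it uses the list form of open rather than the single command
string that the module builds.

```perl
use strict;
use warnings;

# Run a command, read its STDOUT through a pipe, and return the last
# integer that follows "Score:". Dies if the command cannot be started
# or exits with a non-zero status.
sub last_score_from_command {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run @cmd: $!";
    my $score;
    while (my $line = <$fh>) {
        $score = $1 if $line =~ /Score:\s*(\d+)/;
    }
    close($fh) or die "Command failed: @cmd (status $?)";
    return $score;
}

# Stand-in for the clustalw call: the current perl prints one fake line.
my @fake = ($^X, '-e', 'print "Alignment done. Score: 4321\n"');
my $score = last_score_from_command(@fake);
print "score = $score\n";
```

Checking the return value of close on a piped handle is what replaces the
$status test from the system() version.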

Sorry I don't have more time to help; I hope this gets you started.

-jason
On May 30, 2007, at 10:18 AM, Ryan Golhar wrote:

> Did you see Kevin's response?  That's one possible solution that  
> could be
> implemented...
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Wednesday, May 30, 2007 12:05 PM
> To: golharam at umdnj.edu
> Subject: Re: [Bioperl-l] ClustalW Score?
>
>
> Nope it isn't parsed since it is part of the STDOUT from the  
> program not the
> alignment.  If you want to add parsing of the STDOUT from Clustalw  
> someone
> will need to refactor how the program is run and capture and parse the
> STDOUT. The score can be added to the score field of the  
> SimpleAlign object,
> but again since there is nowhere for it to be stored in a clustalw
> alignment file it won't be round tripped anywhere. I think  
> stockholm will
> manage it for you though.
>
> Do you know what the score represents - can it be computed from the
> alignment itself?
>
> -jason
>
> On May 30, 2007, at 8:40 AM, Ryan Golhar wrote:
>
>
> How do I get the clustalw score from a clustalw alignment?  I'm  
> using the
> following code to align my sequences:
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
>
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
>
> $aln = $aln_factory->align(\@seq);
>
> I can get the percentage identity from the Bio::SimpleAlign object,  
> but
> there is no score.  I looked into it further and it doesn't look  
> like the
> score is being captured anywhere.  So, how does one get the score from
> ClustalW using this method?
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Wed May 30 15:52:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 12:52:01 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403349E4D@EX02.asurite.ad.asu.edu>

> You can do it without redirecting STDOUT or creating a new 
> file, just change the system call to:
> 
> Here is the code for running in _run in the module:
>     my $commandstring = $self->executable."$instring"."$param_string";
>      $self->debug( "clustal command = $commandstring");
>      my $status = system($commandstring);
>       unless( $status == 0 ) {
>            $self->warn( "Clustalw call ($commandstring) crashed: $?  
> \n");
>            return undef;
>       }
> 
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet 
> => 1 parameter there may be an automatic ">& /dev/null" 
> appended to the end of the parameter string that you'll need 
> to figure out how to override.
> 
> Sorry I don't have more time to help; I hope this gets you started.

I did it my way as I was doing it without modifying the Bioperl code (in
case I later updated to a new version and forgot about the changes I had
put into it).  So that code just sits in my perl script where it calls
the Bioperl module to create the Clustal alignment object.


From Xianjun.Dong at bccs.uib.no  Tue May 29 11:02:21 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 17:02:21 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2F8E.2070309@ed.ac.uk>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>		<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>		<465C1533.6070900@ii.uib.no>	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no> <465C2F8E.2070309@ed.ac.uk>
Message-ID: <465C407D.608@ii.uib.no>

HI, Darren

The sequences are from human and zebrafish. I currently use two
sequences. I just want to see what the substitution pattern is.
But your comment makes me wonder whether I should include other species,
such as mouse and chicken.

BTW, what do you mean by 'per codon, not per site'? Do you mean that the
Ds (Ks) from Codeml is per codon, and the one from Yn00 is per site?
I think there should be a reasonable way to calculate the
synonymous substitutions, even if the divergence is large. If
Codeml is not a good solution for that case, do you have a better suggestion?

Thanks

Xianjun

Darren Obbard wrote:
> Out of interest, what are the species, and how much sequence are you 
> using?
>
> - Estimating Ds when it is >>1 is very hard anyway, since the 
> substitutions are saturated. i.e. Regardless of the method, there will 
> be some level of divergence for which Ds can no longer be estimated. A 
> Ds of ~14 (for PAML I think this is per codon, not per site) sounds 
> very high to me - higher than I would want to try to estimate Ds.
>
> Dong Xianjun wrote:
>> Thanks for information, Albert.
>>
>> But still in two questions:
>> Albert Vilella wrote:
>>> codeml in PAML can give different results in cases where the 
>>> optimization reaches different local maxima depending on the 
>>> different starting points of each run (seed values). So, at least 
>>> for some methods and options, this instability is inherent to the 
>>> underlying algorithm.
>> 1. How to set the initial value in order to get a reasonable 
>> estimation? Do you have some experience for that?
>>> Even more, for some methods and options, it is even recommended in 
>>> PAML documentation to run the same data more than once, to see if 
>>> the results are the same, which would be a good indication that the 
>>> model is robust given the data.
>> 2. Is there a recommend way to test the significance if the results 
>> are different? For example, in my case, dS could range from 10.1852 
>> to 14.9961 for the four runtime. If that means the model is not 
>> robust(how to check this?), should I change to use another model?
>>
>> How could YN00 reach stable result? (Is it because YN00 does not 
>> require initial value for optimization?) Why could YN00 produce so 
>> different result from Codeml? (for YN00, dS=2.1300 with SE=1.2272; 
>> for Codeml, dS=10.1852-14.9961)
>>> Maybe PAML's author can give a more specific answer for your data at:
>>> http://www.rannala.org/gsf/viewforum.php?f=1
>>
>> Actually I already post my question in the author's forum. Let's wait 
>> and see.
>>>
>>> Cheers,
>>>
>>>     Albert.
>>>
>>> On 5/29/07, *Dong Xianjun* <Xianjun.Dong at bccs.uib.no 
>>> <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>
>>>     HI, dear all, //sorry for duplicated msg for /Jason/ and /Neil/
>>>
>>>     I'm bothering by two problems when I use PAML module to calculate
>>>     Ka/Ks for my sequences. Could you help me?
>>>
>>>     1.  Codeml could produce different Ka/Ks value if I run it twice.
>>>     I check it both in command line and in Perl wrapper of
>>>     Bio::Tools::Run::Phylo::PAML::Codeml;
>>>
>>>     The input sequences are:
>>>     >seq1
>>>     
>>> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG 
>>>
>>>     >seq2
>>>     
>>> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG 
>>>
>>>
>>>     For command-line program, I used Codeml in PAML3.14, with
>>>     specifications in codeml.ctl (runmode = -2, seqtype = 1). I tried
>>>     to run the program four times.  The output are like below (from
>>>     the output file). We could see that they are different from each
>>>     other. they should be same or slightly different. Right? But they
>>>     are NOT.  Weird!
>>>     
>>> ---------------------------------------------------------------------------------------------------------------------------------- 
>>>
>>>     t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
>>>     t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
>>>     t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
>>>     t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>>>     
>>> ---------------------------------------------------------------------------------------------------------------------------------- 
>>>
>>>     I found the same problem when I use the Perl Wrapper of
>>>     Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script
>>>     here, similar to the one in BioPerl HOWTO).
>>>
>>>     2. Another strange thing is, if I switch to use program YN00 in
>>>     the package of PAML, the output are stable. However, it's much
>>>     different from Codeml. (see below)
>>>     
>>> ---------------------------------------------------------------------------------------------------------------------------------- 
>>>
>>>     seq. seq.     S       N        t   kappa    omega      dN +- SE        dS +- SE
>>>        2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265  2.1300 +- 1.2272
>>>     
>>> ---------------------------------------------------------------------------------------------------------------------------------- 
>>>
>>>     Why like this? Which one I should believe?
>>>
>>>
>>>     Is there any guy who would kindly help me to run the perl script
>>>     (twice to check whether they are different)? or help to run the
>>>     codeml in command line?
>>>     I don't know whether there is anyone noticed this before, or
>>>     because of the wrong version of PAML.
>>>
>>>     Regards,
>>>
>>>     Xianjun
>>>
>>>
>>>
>>>     Himanshu Ardawatia wrote:
>>>>     #!/usr/bin/perl
>>>>
>>>>     use strict;
>>>>     use warnings;
>>>>
>>>>
>>>>     use Bio::Tools::Run::Phylo::PAML::Codeml;
>>>>     use Bio::Tools::Run::Alignment::Clustalw;
>>>>
>>>>     # for projecting alignments from protein to R/DNA space
>>>>     use Bio::Align::Utilities qw(aa_to_dna_aln);
>>>>
>>>>     # for input of the sequence data
>>>>     use Bio::SeqIO;
>>>>     use Bio::AlignIO;
>>>>
>>>>     my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>>>>
>>>>     #my $seqdata = 'chuck.fa';
>>>>     my $seqdata = 'xianjun.fa ';
>>>>
>>>>     my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>>>>                                -format => 'fasta');
>>>>     my %seqs;
>>>>     my @prots;
>>>>
>>>>     my $output;
>>>>     # process each sequence
>>>>     while( my $seq = $seqIO->next_seq ) {
>>>>         $seqs{$seq->display_id} = $seq;
>>>>         # translate them into protein
>>>>         my $protein = $seq->translate();
>>>>         my $pseq = $protein->seq();
>>>>         if( $pseq =~ /\*/ &&
>>>>         $pseq !~ /\*$/ ) {
>>>>         warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>>>>         exit(0);
>>>>         }
>>>>         # Tcoffee can't handle '*' even if it is trailing
>>>>         $pseq =~ s/\*//g;
>>>>         $protein->seq($pseq);
>>>>         push @prots, $protein;
>>>>     }
>>>>
>>>>     if( @prots < 2 ) {
>>>>         warn("Need at least 2 cDNA sequences to proceed");
>>>>         exit(0);
>>>>     }
>>>>
>>>>     open(OUT, ">align_output.txt") ||
>>>>           die("cannot open output $output for writing");
>>>>     # Align the sequences with clustalw
>>>>
>>>>     my $aa_aln = $aln_factory->align(\@prots);
>>>>
>>>>     # project the protein alignment back to cDNA coordinates
>>>>     my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>>>>
>>>>     my @each = $dna_aln->each_seq();
>>>>
>>>>     my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>>>>                       ( -params => { 'runmode' => -2,
>>>>                              'seqtype' => 1,
>>>>                      'model' => 1,
>>>>                     }
>>>>                   );
>>>>
>>>>     # set the alignment object
>>>>     $kaks_factory->alignment($dna_aln);
>>>>
>>>>     # run the KaKs analysis
>>>>     my ($rc,$parser) = $kaks_factory->run();
>>>>     my $result = $parser->next_result;
>>>>     my $MLmatrix = $result->get_MLmatrix();
>>>>
>>>>     my @otus = $result->get_seqs();
>>>>     # this gives us a mapping from the PAML order of sequences back to
>>>>     # the input order (since names get truncated)
>>>>     my @pos = map {
>>>>         my $c= 1;
>>>>         foreach my $s ( @each ) {
>>>>         last if( $s->display_id eq $_->display_id );
>>>>         $c++;
>>>>         }
>>>>         $c;
>>>>     } @otus;
>>>>
>>>>     print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
>>>>     CDNA_PERCENTID)), "\n";
>>>>     for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>>>>         for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>>>>         my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         print OUT join("\t",
>>>>                    $otus[$i]->display_id,
>>>>                    $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>>>>                    $MLmatrix->[$i]->[$j]->{'dS'},
>>>>                    $MLmatrix->[$i]->[$j]->{'omega'},
>>>>                    sprintf("%.2f",$sub_aa_aln->percentage_identity),
>>>>                    sprintf("%.2f",$sub_dna_aln->percentage_identity),
>>>>                    ), "\n";
>>>>         }
>>>>     }
>>>>
>>>>
>>>>     On 5/29/07, *Himanshu Ardawatia* <himanshu.ardawatia at bccs.uib.no
>>>>     <mailto:himanshu.ardawatia at bccs.uib.no>> wrote:
>>>>
>>>>         Hi Xianjun,
>>>>
>>>>         I recognize this script. But it was a bit cumbersome to use,
>>>>         as many things are done in the script (like multiple
>>>>         alignment, aa to dna alignment and ka/ks calculation), so one
>>>>         does not have real control over these different aspects.
>>>>         I do not remember getting different Ka/Ks in different runs
>>>>         though. But I remember that once I ran the script with
>>>>         different versions of clustalw and it REALLY gave different
>>>>         results !! So please make sure that the clustalw versions are
>>>>         the same in all your runs. Best is to use the latest version.
>>>>
>>>>         Finally I wrote my simple script which would generate a
>>>>         codeml.ctl file for each set of sequences and run the codeml
>>>>         based on that and then more on. Disadvantage of this can be
>>>>         that some files keep getting over-written (like the one
>>>>         which have their names hard-coded in codeml program) and if
>>>>         one needs those files as well then one needs to run the
>>>>         codeml cycles for each set of sequences in different
>>>>         directories.
>>>>
>>>>         One advantage of this kind of script is that you can use
>>>>         whichever alignment program you want to use and so on....But
>>>>         then its also extra steps of yourself doing multiple
>>>>         alignment and aa to dna alignment etc....
>>>>
>>>>         Does it make sense? If you still get different outputs with
>>>>         same version of clustalw then I can sit with you and look at
>>>>         things together. Or else try the script method which I
>>>>         mentioned.
>>>>
>>>>         Cheers  and Fu
>>>>         Himanshu
>>>>
>>>>         On 5/28/07, *Dong Xianjun* < Xianjun.Dong at bccs.uib.no
>>>>         <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>>
>>>>             Hi, Himanshu
>>>>
>>>>             I am sure you have done some work on Ka/Ks calculation. I
>>>>             have a question that is bothering me: the output of
>>>>             Bio::Tools::Run::Phylo::PAML::Codeml is not
>>>>             stable (it differs on each run), and it also differs
>>>>             from the output of the
>>>>             Bio::Tools::Run::Phylo::PAML::Yn00 module.
>>>>
>>>>             I have attached the script here. Could you have a
>>>>             look and try to run it? How do you calculate the
>>>>             Ka/Ks ratio?
>>>>
>>>>             Thanks
>>>>
>>>>             --
>>>>             ---------------------------
>>>>             Sterding (Xianjun) Dong
>>>>             PhD student, Boris Lenhard's group
>>>>             Bergen Center of Computational Science
>>>>             Bergen University, Norway
>>>>             Mobile: 0047-47361688
>>>>             Telephone: 0047-55276381
>>>>             Skype: xianjun.dong
>>>>
>>>>
>>>>
>>>>
>>>
>>>     --     ---------------------------
>>>     Sterding (Xianjun) Dong
>>>     PhD student, Boris Lenhard's group
>>>     Bergen Center of Computational Science
>>>     Bergen University, Norway
>>>     Mobile: 0047-47361688
>>>     Telephone: 0047-55276381
>>>
>>>     Skype: xianjun.dong
>>>        
>>>
>>>     _______________________________________________
>>>     Bioperl-l mailing list
>>>     Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> -- 
>> ---------------------------
>> Sterding (Xianjun) Dong
>> PhD student, Boris Lenhard's group
>> Bergen Center of Computational Science
>> Bergen University, Norway
>> Mobile: 0047-47361688
>> Telephone: 0047-55276381
>> Skype: xianjun.dong
>>   
>

-- 
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong


From bix at sendu.me.uk  Thu May 31 04:34:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 09:34:38 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E889E.3090304@sendu.me.uk>

Jason Stajich wrote:
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet => 1  
> parameter there may be an automatic ">& /dev/null" appended to the  
> end of the parameter string that you'll need to figure out how to  
> override.

Is there any particular reason for not having something along these 
lines committed to the module? Shall I go ahead and implement?


From bix at sendu.me.uk  Thu May 31 05:54:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 10:54:32 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E9B58.1020403@sendu.me.uk>

Jason Stajich wrote:
>    $score = $1 if ($_ =~ /Score:(\d+)/);

I see that there are lots of lines in the output that match the above 
regex, but there is also a single /Alignment Score (\d+)/ line printed 
at the end. Isn't that the score that should get stored in $aln->score()?
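
A minimal sketch of the distinction, assuming the output really does contain many pairwise "Score:" lines followed by a single final "Alignment Score <n>" line as described above. The sample lines and the helper name final_alignment_score are made up for illustration and are not part of Bio::Tools::Run::Alignment::Clustalw:

```perl
use strict;
use warnings;

# Return the final alignment score from ClustalW-style output lines.
# Pairwise "Score:" lines are ignored; only the last line matching
# "Alignment Score <number>" is kept.
sub final_alignment_score {
    my @lines = @_;
    my $score;
    for my $line (@lines) {
        $score = $1 if $line =~ /Alignment Score\s+(-?\d+)/;
    }
    return $score;    # undef if no such line was seen
}

# Hypothetical sample output, for illustration only.
my @output = (
    "Sequences (1:2) Aligned. Score:87\n",
    "Sequences (1:3) Aligned. Score:42\n",
    "Alignment Score 1234\n",
);

my $score = final_alignment_score(@output);
print defined $score ? "score = $score\n" : "no score found\n";
```

Anchoring the regex on "Alignment Score" rather than "Score:" means the pairwise lines can never overwrite the value, whatever order they arrive in.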


From jason at bioperl.org  Thu May 31 14:08:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 31 May 2007 11:08:19 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <465E9B58.1020403@sendu.me.uk>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
	<465E9B58.1020403@sendu.me.uk>
Message-ID: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>

you're right --- it is not really my code, I was just elaborating  
Kevin's example --- it would probably need to be more specific or  
perhaps the last Score seen is sufficient for what one is trying to  
capture?

-j
On May 31, 2007, at 2:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>    $score = $1 if ($_ =~ /Score:(\d+)/);
>
> I see that there are lots of lines in the output that match the  
> above regex, but there is also a single /Alignment Score (\d+)/  
> line printed at the end. Isn't that the score that should get  
> stored in $aln->score()?
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Thu May 31 14:15:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 31 May 2007 11:15:38 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO><DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org><465E9B58.1020403@sendu.me.uk>
	<49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu>

> you're right --- it is not really my code, I was just 
> elaborating Kevin's example --- it would probably need to be 
> more specific or perhaps the last Score seen is sufficient 
> for what one is trying to capture?

I took that code from a pairwise clustal alignment script that I wrote
to deal with aligning a bunch of short sequences against a long one to
see where they line up at.  When all of them were fed to Clustal the
short sequences all ended up aligned to each other and not well aligned
to the longer sequence.  I only saw one score in the output from the
pairwise, so that is what I used to find a reasonable value.


From torsten.seemann at infotech.monash.edu.au  Tue May  1 00:22:35 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 1 May 2007 10:22:35 +1000
Subject: [Bioperl-l] generate a fasta file from the blast report
In-Reply-To: <10259461.post@talk.nabble.com>
References: <10259461.post@talk.nabble.com>
Message-ID: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>

> if i have the following script working on my blast report, can anyone plz
> tell me how can i generate a fasta format file of just the hits (subject)
> sequence.

Do you want the WHOLE subject sequence, or just the region that hit the query?

The hit is available as $hsp->hit_string();
http://doc.bioperl.org/bioperl-live/Bio/Search/HSP/GenericHSP.html#CODE11

The whole subject sequence would require the original Fasta input file.
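
For the "just the region that hit" case, a rough sketch of turning one aligned hit string into a FASTA record. It is plain Perl so it runs without a report file; the helper name fasta_record and the sample values are hypothetical, and in a real script the id and string would come from $hit->name and $hsp->hit_string while looping over a Bio::SearchIO report:

```perl
use strict;
use warnings;

# Format one FASTA record. Gap characters from the alignment are removed
# and the sequence is wrapped at $width columns (default 60).
sub fasta_record {
    my ($id, $aligned, $width) = @_;
    $width ||= 60;
    (my $seq = $aligned) =~ s/[-\s]//g;    # drop gaps and whitespace
    my $out = ">$id\n";
    $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $out;
}

# Hypothetical values, for illustration only.
print fasta_record('hit_1', 'MKT-AYIAKQR--QISFVK', 10);
```

Note that the aligned string carries gap characters, which is why they are stripped before writing; whether you want them removed depends on what the FASTA file is for.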

By the way, are your questions for work related issues, or is this
your homework or assignment for a study course?

--Torsten


From shameer at ncbs.res.in  Tue May  1 11:36:31 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 17:06:31 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
Message-ID: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>

Dear All,

I am trying to implement a BioPerl-based program to generate a dynamic,
clickable image. Dr. Lincoln Stein's code provided in
example 3 at this URL:
http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
be perfect for my purpose.

I need to make a few modifications to the image. I referred to the Bio::Graphics
HOWTO, the Creating_Imagemaps documents and old bioperl list mails
(maybe I am missing something important?) but I couldn't find a quick
solution, so I thought I would ask the experts for some tips and
tricks.

This is what I am looking for:

1. I need an image of exactly the same size, with the scale (0.1k .. 0.9k)
changed according to the length of the sequence. My sequence length is usually
in the range of 70 - 200.

2. I also need to make the image interactive / clickable, with the various
blue bars acting as different hyperlinks to NCBI / PDB by ID (these IDs will be
used instead of the names of the blast hits).


Many thanks in advance for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From shameer at ncbs.res.in  Tue May  1 16:04:13 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 1 May 2007 21:34:13 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
Message-ID: <42403.192.168.1.1.1178035453.squirrel@mail.ncbs.res.in>

Dear Scott,

> There is a fair amount of documentation in the perldoc for
> Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> you read that?

I agree, but I couldn't find the exact information I needed :( (maybe I missed
something important).

>  Also, for changing the scale, that should happen
> automatically--have you tried yet?

I tried changing Lincoln's program (blast3.pl) from
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
to
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);

But that gave me a shorter scale, of length up to 300. I am looking for
an option that keeps the same width and height as the given image, with dynamic
start and end values depending on the length of my sequence. Since I couldn't
accomplish this, I thought of getting some help from you guys. I think I need
to play a little with the values to reformat the scale to accommodate
my hits as well.

Thanks a lot for your inputs,
-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From cain at cshl.edu  Tue May  1 14:04:09 2007
From: cain at cshl.edu (Scott Cain)
Date: Tue, 01 May 2007 10:04:09 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
Message-ID: <1178028249.2644.13.camel@localhost.localdomain>

Hi Shameer,

There is a fair amount of documentation in the perldoc for
Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
you read that?  Also, for changing the scale, that should happen
automatically--have you tried yet?
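
As a rough illustration of the imagemap side, here is a sketch that turns feature boxes into an HTML client-side map. It assumes boxes shaped like the ones described in the 'Creating Imagemaps' section (a feature plus x1, y1, x2, y2 pixel coordinates, as returned by $panel->boxes), but to stay self-contained it uses plain ID strings in place of feature objects. The helper name image_map_html, the sample coordinates and the URL pattern are all hypothetical:

```perl
use strict;
use warnings;

# Build an HTML client-side image map from feature boxes.
# Each box is [ $id, $x1, $y1, $x2, $y2 ] in pixel coordinates.
# $url_for is a code reference that maps an ID to a link target.
sub image_map_html {
    my ($map_name, $url_for, @boxes) = @_;
    my $html = qq{<map name="$map_name">\n};
    for my $box (@boxes) {
        my ($id, $x1, $y1, $x2, $y2) = @$box;
        my $href = $url_for->($id);
        $html .= qq{  <area shape="rect" coords="$x1,$y1,$x2,$y2" }
               . qq{href="$href" alt="$id" title="$id">\n};
    }
    $html .= "</map>\n";
    return $html;
}

# Hypothetical link builder; the URL pattern is an assumption.
my $pdb_url = sub { 'https://www.rcsb.org/structure/' . $_[0] };

# Hypothetical boxes, for illustration only.
my @boxes = ( [ '1ABC', 40, 60, 220, 70 ], [ '2XYZ', 120, 80, 300, 90 ] );

print image_map_html('hits', $pdb_url, @boxes);
```

The image tag on the page would then point at the map with usemap="#hits". IDs are written into the HTML unescaped here, so anything beyond simple accession-style IDs would need escaping first.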

Scott


On Tue, 2007-05-01 at 17:06 +0530, Shameer Khadar wrote:
> Dear All,
> 
> I am trying to impliment a bioperl based program to generate a dynamic,
> clickable image. I have used Dr. Lincoln Steins's code provided in
> example3 at this URL :
> http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to
> be perfect for my purpose.
> 
> I need to add few modifications to the image. I reffered the Bio::Graphics
> HOWTO,  Creating_Imagemaps documents and other old bio-perl list mails
> (may be am missing something imp.. ? )  but I couldnt get a quick
> solution, Thought I will ask about it to the experts for some tips and
> tricks.
> 
> This is what I am looking for :
> 
> 1. I need image of exactly same size and the scale (0.1k .. 0.9k) to be
> changed according to length of the sequence. My sequence length is usually
> in a range of 70 - 200.
> 
> 2. I also need to make the image interactive / clickable on the various
> blue bar as different hyperlink to NCBI / PDB using ID (This ids will be
> used instead of name of the blast hits)
> 
> 
> Many thanks in advance for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Tue May  1 17:10:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 12:10:10 -0500
Subject: [Bioperl-l] Pb makefile
In-Reply-To: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
References: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>
Message-ID: <D975B11D-1303-4CF4-AE3B-878881964DB9@uiuc.edu>

Is there any reason you want to install bioperl 1.4 (which is over 3  
yrs old)?  The latest is v.1.5.2 (Dec. 2006); man page generation has  
been fixed for that version, which uses Module::Build.

I believe man page generation was turned off prior to 1.4, though I may be
wrong.  Based on the ExtUtils::MakeMaker FAQ you should be able to
prevent man page generation this way:

perl Makefile.PL INSTALLMAN1DIR=none INSTALLMAN3DIR=none

chris

On Apr 30, 2007, at 5:35 AM, Francoise.LECOMTE at biogemma.com wrote:

> Hi
> I am trying to install bioperl 1.4 on a Tru64 platform and I get the message
> "make:line too long" when I run the command make install.
> How can I solve it? How do I disable man page installation in
> Makefile.PL, if
> that can solve this problem?
>
> Best regards
>
> Françoise Lecomte


From cain.cshl at gmail.com  Tue May  1 19:50:42 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 01 May 2007 15:50:42 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <1178049042.2644.36.camel@localhost.localdomain>

Perhaps if you provided some code and sample data we might be able to
help you better.

Scott


On Tue, 2007-05-01 at 21:34 +0530, Shameer Khadar wrote:
> Dear Scot,
> 
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
> 
> I agreed, but I couldnt the exact information I needed :( (may be I missed
> something important).
> 
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
> 
> I tried by changing the Lincoln's program eg: blast3.pl
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>  to my
> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
> 
> But it had given me a smaller scale of length upto 300. I was looking for
> an option where I need same width and height of given image and a dynamic
> start and end values depending on length of my sequence. Since I couldnt
> accomplish, I thought of getting some help from you guys. I think I need
> to play a little bit with the value for reformat the scale to accomodate
> my hits as well.
> 
> Thanks a lot for your inputs,
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From agathman at semo.edu  Tue May  1 23:10:20 2007
From: agathman at semo.edu (Gathman, Allen)
Date: Tue, 1 May 2007 18:10:20 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
Message-ID: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>

Hi, all --
 
I've been using BioPerl 1.4 for a while; recently I installed 1.5.2, and
found that scripts that had been using spliced_seq are now broken.  Any
thoughts on what might be going on? 
 
Here's a sample script:
 
*********************************************
 
#!/usr/bin/perl -w
 
use strict;
use Bio::DB::GFF;

my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
                               -dsn        =>
'dbi:mysql:database=cc;host=localhost',
                               -fasta      => '/gbrowse/databases/cc'
                               );
$db->add_aggregator('transcript{CDS/mRNA}');
my $seg=$db->segment('ccin_Contig120');
my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
 
for my $gene (@genes) {
    my $gid = $gene->display_id;
 
    print STDERR "Gene is $gid\n";
    my $splgene = $gene->spliced_seq();
}

********************************************
The line with "spliced_seq" in it crashes the program.  Here's the
STDERR output:
 
Gene is Jan06m400_GLEAN_11487

-------------------- WARNING ---------------------

MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
absolute set to 1 -- be warned you may not be getting things on the
correct strand

---------------------------------------------------

-------------------- WARNING ---------------------

MSG: seq doesn't validate, mismatch is
::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,(0,881935
,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: Attempting to set the sequence to
[Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)Bio::Prim
arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)Bio::PrimarySeq=HAS
H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH(0x881f4a
4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] which
does not look healthy

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359

STACK: Bio::PrimarySeq::seq
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258

STACK: Bio::PrimarySeq::new
/usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210

STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484

STACK: Bio::SeqFeatureI::spliced_seq
/usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498

STACK: /transfer/testsplice.pl:20

-----------------------------------------------------------

Allen Gathman

http://cstl-csm.semo.edu/gathman

 
From cjfields at uiuc.edu  Wed May  2 00:27:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 1 May 2007 19:27:46 -0500
Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2
In-Reply-To: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
References: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu>
Message-ID: <9F00B020-AFF0-40DB-9694-6061B5A11A73@uiuc.edu>

Can you file a bug on this?  Attach the script and maybe detail what  
data is loaded into your local MySQL database (if possible).

chris

On May 1, 2007, at 6:10 PM, Gathman, Allen wrote:

> Hi, all --
>
> I've been using BioPerl 1.4 for a while; recently I installed  
> 1.5.2, and
> found that scripts that had been using spliced_seq are now broken.   
> Any
> thoughts on what might be going on?
>
> Here's a sample script:
>
> *********************************************
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::GFF;
>
> my $db  = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
>                                -dsn        =>
> 'dbi:mysql:database=cc;host=localhost',
>                                -fasta      => '/gbrowse/databases/cc'
>                                );
> $db->add_aggregator('transcript{CDS/mRNA}');
> my $seg=$db->segment('ccin_Contig120');
> my @genes=$seg->features(-types=>('transcript:GLEAN_alt'));
>
> for my $gene (@genes) {
>     my $gid = $gene->display_id;
>
>     print STDERR "Gene is $gid\n";
>     my $splgene = $gene->spliced_seq();
> }
>
> ********************************************
> The line with "spliced_seq" in it crashes the program.  Here's the
> STDERR output:
>
> Gene is Jan06m400_GLEAN_11487
>
> -------------------- WARNING ---------------------
>
> MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have
> absolute set to 1 -- be warned you may not be getting things on the
> correct strand
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
>
> MSG: seq doesn't validate, mismatch is
> ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::, 
> (0,881935
> ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098)
>
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
>
> MSG: Attempting to set the sequence to
> [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170) 
> Bio::Prim
> arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308) 
> Bio::PrimarySeq=HAS
> H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH 
> (0x881f4a
> 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)]  
> which
> does not look healthy
>
> STACK: Error::throw
>
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359
>
> STACK: Bio::PrimarySeq::seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258
>
> STACK: Bio::PrimarySeq::new
> /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210
>
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484
>
> STACK: Bio::SeqFeatureI::spliced_seq
> /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498
>
> STACK: /transfer/testsplice.pl:20
>
> -----------------------------------------------------------
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From shameer at ncbs.res.in  Wed May  2 03:46:59 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 2 May 2007 09:16:59 +0530 (IST)
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <1178049042.2644.36.camel@localhost.localdomain>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<1178049042.2644.36.camel@localhost.localdomain>
Message-ID: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>

Dear Scott,

Once again, thanks a lot for your inputs.

I am following the same data format as in
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
The only difference is that instead of hits, I will be using PFAMID/PDBID. All the
blue boxes (features) should be clickable, like hot-spots in an image map.
The purpose is to display these results in a web page.

I am using the program in Stein's Bio::Graphics example
http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

I need exactly the same image as in
http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
The only difference is that I need the scale (0.1k - 0.9k) to be a simple range of
1-XXX, where XXX depends on the length of the input sequence.

Many thanks for your help,


> Perhaps if you provided some code and sample data we might be able to
> help you better.
>
> Scott
>

-- 
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed May  2 10:02:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 May 2007 06:02:48 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<1178049042.2644.36.camel@localhost.localdomain>
	<59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in>
Message-ID: <200705020602.48404.sdavis2@mail.nih.gov>

On Tuesday 01 May 2007 23:46, Shameer Khadar wrote:
> Dear Scott,
>
> Once thanks a lot for your inputs.
>
> I am following same  data formats as in
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt
> Only difference is instead of Hits, I will be using PFAMID/PDBID. Allt he
> blue boxes (feature) should be clickable like a hot-spot/imagesmap images.
> The purpose is to display these results in a web page.

Do you have your data loaded into bioperl objects?  What code did you use for 
that (post that code)?

> I am using the program in Stein's Bio::Graphics example
> http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl

Does this example run on your computer?  Have you been able to use the bioperl 
objects you created in the first step in the creation of a graphic?  If not, 
what have you tried (post the code) and any error messages.

> I need exactly same image as in
> http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png
> only difference is I need the scale (0.1k - 0.9k) in a range of simple
> 1-XXX , here XXX depends on the length of the sequence input.

Again, what have you tried?  Posting code is helpful here, also.  

I'm not an expert in bioperl graphics, but it does really help those that know 
to see the code that you have written to know how best to help.  

Sean


From lzlgboy at gmail.com  Wed May  2 13:58:14 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 2 May 2007 21:58:14 +0800
Subject: [Bioperl-l] Extract CDS from CDNA given Protein SEQs
Message-ID: <d78b3d40705020658w1bee4c68s3058a63ef23c62a1@mail.gmail.com>

Hi, everyone

   I have a task to extract CDS sequences from cDNAs, and I have the protein
sequence for each cDNA. What should I do?
   Should I try 3_frame_translate? If so, how?
   Thanks.

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From MEC at stowers-institute.org  Wed May  2 22:38:31 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 2 May 2007 17:38:31 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>

Lincoln, 
 
Here for your comment and review is a heavily reworked version of
Bio::Graphics::FeatureBase->gff3_string.
 
The main difference is that homogenous children get ALL their
attributes except for start/stop from the parent, including the group.
I also provide an option, called $preserveHomegenousParent, for whether
or not to "remove extraneous level of parentage".
 
There is an in-line comment and question for you in the code body.
 
It works well in my hands for my use cases, but I'm not positive it is
in the spirit of your intentions.
 
Cheers,
 
Malcolm
 
 
sub gff3_string {
  my ($self, $recurse, $preserveHomegenousParent,

      # Note: the following parameters, whose names begin with '$_',
      # are intended for recursive calls only.

      $_parent,
      $_self_is_hsf,     # is $self the child in a homogeneous
                         # parent/child relationship?
      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of
                         # the parent
     ) = @_;

  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogenous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segment directive without specifying a
  # distinct -subtype in the call to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e.  Bio::DB::SeqFeature,
  # Bio::Graphics::Feature).  Such homogenous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propagated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THEIR
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).

  # $self's parent, if it is a homogenous child, otherwise $self.
  my $hparentORself = $_self_is_hsf ? $_parent : $self;

  if ($recurse &&  (my @ssf = $self->sub_SeqFeature)) {
    # will be TRUE only if all subfeatures are the same type as $self.
    my $homogenous = ! grep {$_->type ne $self->type} @ssf;
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogenous parent child
      # (will this ever happen in practice???))
      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup :
        $self->format_attributes($_parent);
    return (join("\n",
                 (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()),
                  map {$_->gff3_string($recurse,$preserveHomegenousParent,
                                       $hparentORself,$homogenous,$mygroup)} @ssf)));
  } else {
    my $name  = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup :
      $self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1];
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand.  In particular, why does add_segment flip
    # the strand when start > stop?  I thought this was not allowed!
    # Lincoln - any ideas?
    my $p      = join("\t",
        $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->method||'.',
        $self->start||'.',$self->stop||'.',
        defined($hparentORself->score) ? $hparentORself->score : '.',
        $strand||'.',
        defined($hparentORself->phase) ? $hparentORself->phase : '.',
        $group||'');
  }
}
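
To make the intent of the leaf branch concrete, here is a minimal stand-alone sketch that uses plain hashes instead of Bio::Graphics::FeatureBase objects. The hash keys and the helper names (strand_symbol, segment_line) are hypothetical and not part of BioPerl. Following the PURPOSE comment, every column except start and stop is taken from the parent, including the strand, which is an assumption that differs from the $self->strand lookup in the code above.

```perl
use strict;
use warnings;

# Map a numeric strand (-1, 0, +1) to its GFF symbol using the same
# list lookup as gff3_string; an undefined strand is treated as 0.
sub strand_symbol {
    my ($strand) = @_;
    return ('-', '.', '+')[ ($strand // 0) + 1 ];
}

# Build one GFF3 line for a segment of a discontiguous feature.
# Only start/stop come from the segment; the rest is inherited.
sub segment_line {
    my ($parent, $segment, $group) = @_;
    return join "\t",
        $parent->{ref}    || '.',
        $parent->{source} || '.',
        $parent->{method} || '.',
        $segment->{start} || '.',
        $segment->{stop}  || '.',
        (defined $parent->{score} ? $parent->{score} : '.'),
        strand_symbol($parent->{strand}),
        (defined $parent->{phase} ? $parent->{phase} : '.'),
        $group || '';
}

# Toy parent and segments, loosely modelled on the est similarity data.
my %parent = (ref => 'Contig1', source => 'est', method => 'similarity',
              score => 96, strand => 1);
my $group    = 'ID=match1;Name=EST:CEESC13F';
my @segments = ({ start => 1001, stop => 1100 },
                { start => 1201, stop => 1300 });

print segment_line(\%parent, $_, $group), "\n" for @segments;
```

Both output lines carry the same column 9, which is the "duplicated common group values" behaviour the patch is after.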
 

________________________________

	From: Cook, Malcolm 
	Sent: Friday, April 27, 2007 1:45 PM
	To: 'lincoln.stein at gmail.com'
	Cc: 'lstein at cshl.org'; 'bioperl list'
	Subject: RE: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Lincoln,
	 
	Cool.
	 
	The principle of what I figured out still holds, I think, but the
implementation is slightly broken.  Improved patch forthcoming next week.
	 

	Malcolm Cook
	Database Applications Manager - Bioinformatics
	Stowers Institute for Medical Research - Kansas City, Missouri
	  

________________________________

		From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
		Sent: Friday, April 27, 2007 12:45 PM
		To: Cook, Malcolm
		Cc: lstein at cshl.org; bioperl list
		Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
		
		
		Hi Malcolm,
		
		This is absolutely ok and you can go ahead and commit.
Thanks for figuring this out!
		
		Lincoln
		
		
		On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

			Lincoln, et al,
			
			I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
			from a Bio::DB::SeqFeature::Store that were initially created with
			-segments (i.e. whose location was discontiguous) does not display any
			other attributes in column 9 than "Name".
			
			What do you think of the following patch to Bio::Graphics::FeatureBase,
			whose effect is to "contrive to return (duplicated) common group values"
			(which otherwise get lost when "collapsing" "homogenous" parent/child
			features)?
			
			Another approach would be to copy the attributes from the parent to the
			children when the -segments are first created.
			
			Another approach would be to use Bio::SeqFeature::Generic as the db's
			-seqfeature_class and save with -location being a Bio::Location::Split,
			but this was fraught with other problems.
			
			Any other suggestions?  Do you want me to commit this patch?
			
			Cheers,
			
			Malcolm
			
			Patch follows:
			
			
			Index: FeatureBase.pm
			===================================================================
			RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
			retrieving revision 1.29
			diff -c -r1.29 FeatureBase.pm
			*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
			--- FeatureBase.pm       26 Apr 2007 16:30:23 -0000
			***************
			*** 581,587 ****
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     return join "\n",@children;
			    }
			
			    return join("\n",$p,@children);
			--- 581,589 ----
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     #return join "\n",@children;
			!     # Instead of above, additionally, contrive to return (duplicated) common group values
			!     return(join("$group\n",@children) . $group);
			    }
			
			    return join("\n",$p,@children);
			

		-- 
		Lincoln D. Stein
		Cold Spring Harbor Laboratory
		1 Bungtown Road
		Cold Spring Harbor, NY 11724
		(516) 367-8380 (voice)
		(516) 367-8389 (fax)
		FOR URGENT MESSAGES & SCHEDULING, 
		PLEASE CONTACT MY ASSISTANT, 
		SANDRA MICHELSEN, AT michelse at cshl.edu 


From lstein at cshl.edu  Thu May  3 16:01:38 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 12:01:38 -0400
Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics
In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
Message-ID: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>

The width of the image is determined by the -width attribute and is given in
pixels. You cannot control the height of the image as it is computed
dynamically based on the number of features and bumping options.
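
As a rough sketch of what this means for the blast3.pl example (assuming Bio::Graphics and GD are installed; the sequence length and the output file name panel.png are placeholders): keep -width fixed and set -length to the length of your own sequence, so the ruler adapts while the pixel width stays the same.

```perl
use strict;
use warnings;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $seq_length = 150;   # placeholder: length of your query sequence

my $panel = Bio::Graphics::Panel->new(
    -length => $seq_length,   # sequence units covered by the panel
    -width  => 800,           # width in pixels, independent of -length
);

# Ruler spanning the whole sequence; its tick labels follow -length.
my $full_length = Bio::SeqFeature::Generic->new(
    -start => 1,
    -end   => $seq_length,
);
$panel->add_track(
    $full_length,
    -glyph   => 'arrow',
    -tick    => 2,
    -fgcolor => 'black',
    -double  => 1,
);

# The height is worked out by the panel from its tracks.
my $png = $panel->png;
open my $out, '>', 'panel.png' or die "cannot write panel.png: $!";
binmode $out;
print {$out} $png;
close $out;
```

Changing $seq_length from 150 to 300 should leave the image width at the same number of pixels and only change the scale drawn on the ruler.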

Lincoln

On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>
> Dear Scot,
>
> > There is a fair amount of documentation in the perldoc for
> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have
> > you read that?
>
> I agree, but I couldn't find the exact information I needed :( (maybe I missed
> something important).
>
> >  Also, for changing the scale, that should happen
> > automatically--have you tried yet?
>
> I tried changing Lincoln's program (blast3.pl) from
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> to
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>
> But that gave me a smaller scale, of length up to 300. I was looking for
> an option that keeps the same width and height as the given image, with
> dynamic start and end values depending on the length of my sequence. Since I
> couldn't accomplish this, I thought of getting some help from you guys. I think
> I need to play a little with the values to reformat the scale to accommodate
> my hits as well.
>
> Thanks a lot for your inputs,
> --
> Shameer Khadar
> Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
> T - 91-080-23666001 EXT - 6251
> W - http://www.ncbs.res.in
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bioperlanand at yahoo.com  Thu May  3 20:09:18 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 3 May 2007 13:09:18 -0700 (PDT)
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
Message-ID: <922386.19570.qm@web36808.mail.mud.yahoo.com>

Hi

I am using Bioperl 1.4 and I am trying to obtain protein sequences for specific Uniprot records.

For some records (ROA1_HUMAN), it prints the correct sequence, but  it first prints the warning "Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <STREAM> line 43." 

For other records (BOLA_HAEIN), it prints the correct sequence (without any warnings).

Here is the code:
-------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::DB::SwissProt;

my $sp = new Bio::DB::SwissProt;

#my $seq_object  = $sp->get_Seq_by_id('ROA1_HUMAN');
my $seq_object  = $sp->get_Seq_by_id('BOLA_HAEIN');

my $sequence_as_a_string = $seq_object->seq();
print "$sequence_as_a_string\n";
-------------------------------------------------------------------------------------------

Is there something I need to fix?

Thanks in advance for the help.
 
 Anand

       


From MEC at stowers-institute.org  Thu May  3 20:19:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 3 May 2007 15:19:00 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E5047A12B1@exchkc02.stowers-institute.org>
	<CED81D34E37D5043A1211565277A51E507E2307A@exchkc02.stowers-institute.org>
	<6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E230A6@exchkc02.stowers-institute.org>

Lincoln,
 
Ah, yes, round-tripping GFF, the holy grail....
 
Unfortunately, I don't really have a baseline to go against for an
example that roundtrips successfully now.  Do you?
 
For example, after loading test data: 
 
> bp_seqfeature_load.PLS  bioperl-live/t/data/biodbgff/test.gff3
 
the Contig1 portion of which looks like this:
 
##gff-version 3
## sequence-region Contig1 1 37450
Contig1 confirmed transcript 1001 2000 42 + . ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . ID=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . ID=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 ID=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 ID=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 ID=Transcript:trans-1
Contig1 est similarity 1001 1100 96 . . Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Target=EST:CEESC13F 201 250 +
Contig1 tc1 transposon 5001 6000 . + . ID=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . ID=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . ID=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . ID=Transcript:trans-2
 
 
and then generating output with
 
>bp_seqfeature_gff3.PLS --gff=1 -- seq_id Contig1

(using a script I just committed; I hope you like it.  Note: gff=1 => recurse)
 
we get output gff with problems such as:
 
    1 IDs get turned into Aliases
    2 the seqid of a Target attribute gets copied into the feature's
Name attribute
    3 suppression of parents of homogeneous subfeatures doesn't work when
the parent has other subfeatures than those with its same type (i.e. the
transcript feature also has exon subfeatures)
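
A quick way to see problem 1 for a single feature is to parse column 9 of the input and output lines and compare the tags. The sketch below is plain Perl; parse_attributes is a hypothetical helper (not a BioPerl function), it does no URL-unescaping, and it treats repeated tags and comma-separated values as the same thing.

```perl
use strict;
use warnings;

# Parse a GFF3 column-9 string into a hash of tag => [values].
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;
        push @{ $attr{$tag} }, split(/,/, $value);
    }
    return \%attr;
}

# Column 9 of the first transcript, before loading and after dumping.
my $before = parse_attributes(
    'ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown');
my $after  = parse_attributes(
    'ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown');

# Report every tag whose value list differs between the two lines.
my %tags = (%$before, %$after);
for my $tag (sort keys %tags) {
    my $old = join ',', @{ $before->{$tag} || [] };
    my $new = join ',', @{ $after->{$tag}  || [] };
    print "$tag: '$old' -> '$new'\n" if $old ne $new;
}
```

For this pair only Alias and ID are reported: the original ID has moved to Alias and a database-assigned number has taken its place, while Gene and Note survive the round trip.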
 
look:
 
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed transcript 1001 2000 42 + . Parent=2;Alias=Transcript:trans-1;Note=function+unknown;Gene=abc-1,xyz-2
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed transcript 30001 31000 . - . Parent=9;Alias=Transcript:trans-2;Note=Terribly+interesting;Gene=xyz-2
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
With my new version of gff3_string (not yet committed), only the 3rd
problem is addressed, generating
 
bp_seqfeature_gff3.PLS --gff 1  -- seq_id Contig1
Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 +
Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 +
Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 +
Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown
Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1
Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1
Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1
Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2
Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting
Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown
Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2
Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2
Contig1 . region 1 37450 . . . Name=Contig1;ID=1
 
 
I had to make another change to get this output though, since I had to
change the behaviour to
 
  # provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have at least one subfeature with the same type as the
  # feature itself (thus redefining Lincoln's "homogenous
  # parent/child" case, which previously required all children to have
  # the same type as parent)
 
 
I think you will agree this is the more desirable behaviour.
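
The difference between the two definitions can be shown with type names alone. This is only an illustration in plain Perl with hypothetical helper names (all_same_type, any_same_type); the real code compares $_->type on feature objects.

```perl
use strict;
use warnings;

# Old rule: every subfeature has the parent's type.
sub all_same_type {
    my ($type, @sub_types) = @_;
    return !grep { $_ ne $type } @sub_types;
}

# New rule: at least one subfeature has the parent's type.
sub any_same_type {
    my ($type, @sub_types) = @_;
    return scalar grep { $_ eq $type } @sub_types;
}

# A transcript holding one same-type segment plus exons and CDSs,
# as in the Contig1 output shown earlier in this mail.
my @kids = qw(transcript exon exon exon CDS CDS CDS);
printf "all: %s, any: %s\n",
    (all_same_type('transcript', @kids) ? 'yes' : 'no'),
    (any_same_type('transcript', @kids) ? 'yes' : 'no');
```

For the transcript case the old rule says "no" and the new rule says "yes", which is why the duplicated transcript line only disappears under the new definition.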
 
I would be happy to test any other GFF you suggest might be (more or
less) roundtripped.
 
What think you?
 
--Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com]
On Behalf Of Lincoln Stein
	Sent: Thursday, May 03, 2007 9:46 AM
	To: Cook, Malcolm
	Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Malcolm,
	
	For me, the major use case is that GFF3 files round-trip
correctly through the database. Do any of your use cases cover that?
	
	Lincoln
	
	
	On 5/2/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		...

	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Thu May  3 20:57:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 3 May 2007 15:57:43 -0500
Subject: [Bioperl-l] a query on Obtaining UniProt sequences
In-Reply-To: <922386.19570.qm@web36808.mail.mud.yahoo.com>
References: <922386.19570.qm@web36808.mail.mud.yahoo.com>
Message-ID: <2930F3F1-2BFB-4320-9A2C-50DFE6F808A1@uiuc.edu>

I would update to BioPerl 1.5.2.  v.1.4 is 3 yrs old and there have  
been tons of changes both for sequence retrieval and parsers.

We can't predict when a new 'stable' release will be available but  
1.5.2 works well for most purposes.

chris

On May 3, 2007, at 3:09 PM, Anand Venkatraman wrote:

> Hi
>
> I am using Bioperl 1.4 and I am trying to obtain protein sequences  
> for specific Uniprot records.
> ...
>  Is there something I need to fix.
>
> Thanks in advance for the help.
>
>  Anand
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thiago.venancio at gmail.com  Thu May  3 21:12:35 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 3 May 2007 18:12:35 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
	<54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>
Message-ID: <44255ea80705031412n7abef247je70d2681bb3cc7ed@mail.gmail.com>

Hi all,

Just for the record: I am getting good results extracting CDSs from
protein x DNA alignments with the following procedure:

- BLASTX to identify the hits for each DNA sequence (if you want to process
sequences for further multiple sequence alignment, it is important to record
the frames);

- FASTX/Y to refine the alignment between the protein and the DNA. FASTX/Y
is quite good because it copes well with frameshifts and allows better
identification of premature stop codons. In addition, the alignment
(and the CDS prediction) is better.

This is worth noting because it avoids analysing "phantom" mRNAs, i.e.
sequences that contain stop codons; merely looking at the BLAST output can give
misleading results sometimes.
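
The "phantom mRNA" check can be sketched in a few lines of plain Perl: walk the codons of the chosen frame and report any stop codon that occurs before the last complete codon. internal_stops is a hypothetical helper, and the frame here is a 0-based offset, which may differ from how your BLASTX or FASTX/Y parser reports frames.

```perl
use strict;
use warnings;

# Return the 0-based nucleotide positions of stop codons that occur
# before the final complete codon of the given frame.
sub internal_stops {
    my ($dna, $frame) = @_;
    $dna = uc $dna;
    my $n_codons         = int((length($dna) - $frame) / 3);
    my $last_codon_start = $frame + 3 * ($n_codons - 1);
    my @stops;
    for (my $pos = $frame; $pos < $last_codon_start; $pos += 3) {
        push @stops, $pos if substr($dna, $pos, 3) =~ /^(?:TAA|TAG|TGA)$/;
    }
    return @stops;
}

# Toy sequence, made up for this example: ATG TAA GCT TGA
my @stops = internal_stops('ATGTAAGCTTGA', 0);
print @stops ? "premature stop(s) at: @stops\n" : "no premature stop\n";
```

A sequence that reports positions here would be a candidate for closer inspection of the FASTX/Y alignment rather than for direct translation.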

Best.

Thiago


On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Hi -
> There are some tools that do this for you -- I've listed a few from a
> google search or from what I remember reading.  If you (and others!) are
> willing to contribute a little info on what you find works for you to the
> wiki, that would be great as well.   A little
> HOWTO would be cool - here or on openwetware.org.
>
> Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
> EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2
>
> Ewan Birney's estwise as part of wise package also can help if you have a
> likely protein from BLAST you want to align to the est - estwise can handle
> frameshifts, but can be too slow for some people.  Exonerate's protein2dna
> model may also work here, but I haven't tried it.
>
> -jason
> On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:
>
> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed
> to do so, like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From lstein at cshl.edu  Thu May  3 21:35:57 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 May 2007 17:35:57 -0400
Subject: [Bioperl-l] CSHL is hiring
Message-ID: <6dce9a0b0705031435r3bc2d2ddlfca5ac02844b4ef0@mail.gmail.com>

Hi Folks,

Sorry for the spam. My group at CSHL is looking for a scientific programmer
with good software development credentials and some experience in
bioinformatics. Experience in object-oriented Perl programming is a strict
requirement.

This is to work on user interface development for several projects
including:

   - BioMart (data warehouse) project (www.biomart.org)
   - GBrowse genome browser (www.gmod.org/GBrowse)
   - Reactome pathways database (www.reactome.org)

I can offer salaries in the 60-80K range, depending on level of experience.
Please reply to lstein at cshl.edu.

Best,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From MEC at stowers-institute.org  Tue May  8 16:59:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 8 May 2007 11:59:10 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start
	and stop coordinates??
Message-ID: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>

Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
coordinates,

as in:
  ($start,$stop) = ($stop,$start)
    if defined($start) && defined($stop) && $start > $stop;

I thought it was not legal for a feature to be so composed.

Anyone know?

Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue May  8 17:12:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 12:12:45 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>

I believe all seqfeature location coordinates are designed to have
start <= stop for consistency; in cases where the strand matters (CDS,
gene, etc.) the strand is set to 1 or -1.  When start > stop,
the two are reversed and the strand is flipped; at least that's the
way locations are set up in BioPerl.
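
As a rough illustration of that convention (a sketch only, not the actual
Bio::Location or gff3_string code; the helper name normalize_location and
the default strand of 1 are assumptions made for the example):

```perl
use strict;
use warnings;

# Return (start, end, strand) with start <= end.
# If the coordinates arrive reversed, swap them and flip the strand.
sub normalize_location {
    my ($start, $end, $strand) = @_;
    $strand = 1 unless defined $strand;    # assumption: default to forward
    if (defined $start && defined $end && $start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -$strand;
    }
    return ($start, $end, $strand);
}

my @loc = normalize_location(500, 100, 1);
print "@loc\n";    # prints "100 500 -1"
```

Note that the line quoted from gff3_string shows only the swap; whether the
strand is also adjusted there is not visible in the snippet.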

chris

On May 8, 2007, at 11:59 AM, Cook, Malcolm wrote:

> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Tue May  8 18:37:20 2007
From: juheymann at yahoo.com (Bohr)
Date: Tue, 8 May 2007 11:37:20 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
Message-ID: <10381379.post@talk.nabble.com>


Hi,

I installed bioperl under OSX Tiger via Fink. I tested the installation
using the test tutorial via: perl -w bptutorial.pl 5

The script failed, indicating that the file to retrieve was missing. To
narrow the problem down, I wrote a script that uses 'get_sequence' to
retrieve an entry from 'genbank' or 'embl'. Both succeeded. If I replace the
database with 'swiss' or 'swissprot' and use the same ID as in the tutorial,
I reproduce the failure seen with bptutorial.pl. Other IDs fail the same
way.

Any pointers on the origin of this finding would be greatly appreciated.
-- 
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10381379
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Tue May  8 21:53:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 May 2007 16:53:04 -0500
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#");
In-Reply-To: <10381379.post@talk.nabble.com>
References: <10381379.post@talk.nabble.com>
Message-ID: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>

The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
etc) accessed via bioperl.

As a note, the bptutorial.pl has been moved to the bioperl wiki:

http://www.bioperl.org/wiki/Bptutorial

chris

On May 8, 2007, at 1:37 PM, Bohr wrote:

>
> Hi,
>
> I installed bioperl under OSX Tiger via Fink. I tested the  
> installation
> using the test tutorial via: perl -w bptutorial.pl 5
>
> The script failed indicating that the file to retrieve was missing. To
> identify the problem, I used a script using 'get_sequence' that will
> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
> replace it
> with 'swiss' or 'swissprot' and substitute the ID with the  
> identical ID as
> in the tutorial, I am recreating the problem found with  
> bptutorial.pl. Other
> ID's do the same.
>
> Any pointers on the origin of this finding would be greatly  
> appreciated.
> -- 
> View this message in context: http://www.nabble.com/problem-with- 
> Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29-- 
> tf3711391.html#a10381379
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From juheymann at yahoo.com  Wed May  9 22:17:27 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:17:27 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10403903.post@talk.nabble.com>


Thank you for the feedback and the suggestion.

I installed 1.5.2 via Build.PL and the results were the same, i.e. embl and
genbank worked fine, but swissprot failed.

Here is the output:

MSG: acc (CALX_YEAST) does not exist
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not provide a valid Bio::PrimarySeqI object
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SeqIO::fasta::write_seq
/sw/lib/perl5/5.8.6/Bio/SeqIO/fasta.pm:181

Before speculating too much, here is my question: how do I verify that the
update to 1.5.2 took effect? (I ran ./Build test and the tests passed.) And
what else could have gone wrong here?

What might be a clever way to troubleshoot this?
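
One generic way to check which copy of a module Perl is actually loading is
to print its path from %INC together with its version. A minimal sketch that
assumes nothing about BioPerl itself: module_info is a made-up helper, and
File::Spec is used only because it ships with Perl. Passing
'Bio::Root::Version' instead (where, as far as I know, BioPerl keeps its
release number) should show whether the Fink copy under /sw or the new
install is being picked up.

```perl
use strict;
use warnings;

# Report where a module was loaded from and which version it claims to be.
# Returns nothing if the module cannot be loaded at all.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return unless eval { require $file; 1 };
    return { path => $INC{$file}, version => $module->VERSION };
}

my $info = module_info('File::Spec');    # swap in 'Bio::Root::Version'
if ($info) {
    my $version = defined $info->{version} ? $info->{version} : 'unknown';
    print "loaded from $info->{path}, version $version\n";
}
else {
    print "module not found in \@INC\n";
}
```

If the reported path still points into the old installation, the new
modules are probably sitting in a directory that comes later in @INC.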


---------------------------------------------------------------------------

Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris
> 
> On May 8, 2007, at 1:37 PM, Bohr wrote:
> 
>>
>> Hi,
>>
>> I installed bioperl under OSX Tiger via Fink. I tested the  
>> installation
>> using the test tutorial via: perl -w bptutorial.pl 5
>>
>> The script failed indicating that the file to retrieve was missing. To
>> identify the problem, I used a script using 'get_sequence' that will
>> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
>> replace it
>> with 'swiss' or 'swissprot' and substitute the ID with the  
>> identical ID as
>> in the tutorial, I am recreating the problem found with  
>> bptutorial.pl. Other
>> ID's do the same.
>>
>> Any pointers on the origin of this finding would be greatly  
>> appreciated.
>> -- 
>> View this message in context: http://www.nabble.com/problem-with- 
>> Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29-- 
>> tf3711391.html#a10381379
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10403903
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From ursula_cox at btinternet.com  Wed May  9 22:12:26 2007
From: ursula_cox at btinternet.com (Ursula at BT)
Date: Wed, 9 May 2007 23:12:26 +0100
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
Message-ID: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>

Dear BioPerl List,

 
I'm new to BioPerl (and Perl for that matter). I have an array of enzyme
names, and a larger collection of enzymes (guaranteed to be a superset by
the way it's constructed). I need to make a new collection containing just
the enzymes corresponding to the names I have in the array.

 
I was hoping that something like:

 
my $all_rebase =
Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');

my $all_rebase_collection = $all_rebase->read();

 
my @enzymes =
('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I','
AccB7I','AccI');

 
my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

foreach $enzyme (all_rebase_collection)

            {

            $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes;

            }

 
would work, but I get a syntax error near "$new_collection(".

 
Any clues much appreciated,

 
Ursula Cox


From juheymann at yahoo.com  Wed May  9 22:38:42 2007
From: juheymann at yahoo.com (Bohr)
Date: Wed, 9 May 2007 15:38:42 -0700 (PDT)
Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); 
In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
References: <10381379.post@talk.nabble.com>
	<2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu>
Message-ID: <10404211.post@talk.nabble.com>


Thank you for pointing that out! I installed 1.5.2 via Build.pl. The scripts
work as expected now.


Chris Fields wrote:
> 
> The Fink BioPerl distribution is 1.5.1.  You'll need to update to v  
> 1.5.2 due to changes on the various remote servers (NCBI, UniProt,  
> etc) accessed via bioperl.
> 
> As a note, the bptutorial.pl has been moved to the bioperl wiki:
> 
> http://www.bioperl.org/wiki/Bptutorial
> 
> chris
> 
> On May 8, 2007, at 1:37 PM, Bohr wrote:
> 
>>
>> Hi,
>>
>> I installed bioperl under OSX Tiger via Fink. I tested the  
>> installation
>> using the test tutorial via: perl -w bptutorial.pl 5
>>
>> The script failed indicating that the file to retrieve was missing. To
>> identify the problem, I used a script using 'get_sequence' that will
>> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I  
>> replace it
>> with 'swiss' or 'swissprot' and substitute the ID with the  
>> identical ID as
>> in the tutorial, I am recreating the problem found with  
>> bptutorial.pl. Other
>> ID's do the same.
>>
>> Any pointers on the origin of this finding would be greatly  
>> appreciated.
>> -- 
>> View this message in context: http://www.nabble.com/problem-with- 
>> Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29-- 
>> tf3711391.html#a10381379
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10404211
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Wed May  9 23:37:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 May 2007 18:37:33 -0500
Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection
In-Reply-To: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
References: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore>
Message-ID: <E4472E55-AADB-4697-8C4D-2EC231923F0B@uiuc.edu>


On May 9, 2007, at 5:12 PM, Ursula at BT wrote:

> Dear BioPerl List,
>
>
>
> I'm new to BioPerl (and Perl for that matter). I have an array of  
> enzyme
> names, and a larger collection of enzymes (guaranteed to be a  
> superset by
> the way it's constructed). I need to make a new collection  
> containing just
> the enzymes corresponding to the names I have in the array.

First, prior to using BioPerl you should really brush up on perl  
itself (Learning Perl, or James Tisdall's Perl for Bioinformatics  
books, the former preferred).  Though there are several scripts  
available to get you started with Bioperl, much of the code is  
written with the expectation that you can write and debug a basic  
perl script (and there is some expectation that you are somewhat  
familiar with OO Perl).

Saying that, let's see what's wrong...

> I was hoping that something like:
>
>
>
> my $all_rebase =
> Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet');
>
> my $all_rebase_collection = $all_rebase->read();

The 'bionet' format is not supported; only 'withrefm', 'itype2',  
'bairoch' are (the latter only experimentally).  See 'perldoc  
Bio::Restriction::IO'.

> my @enzymes =
> ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','Acc 
> B1I','
> AccB7I','AccI');
>
>
>
> my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1);

Missing a new() constructor here.

> foreach $enzyme (all_rebase_collection)

Not sure what this is.  No '$' sigil for $all_rebase_collection will  
make the compiler look for (and fail to find) the sub  
all_rebase_collection().

>
>             {
>
>             $new_collection($enzyme) if grep $_ eq $enzyme->name,  
> @enzymes;
>
>             }
>
>
>
> would work, but I get a syntax error near "$new_collection(".

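Yep.  The error is in $new_collection($enzyme): a scalar followed by
parentheses is not a method call, so Perl stops right there.  You need to
call a method on the object with the -> arrow.  The grep itself is legal
as written, though the block form with braces {} is easier to read.  See
'perldoc -f grep'.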
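
Putting these points together, the filtering step might look roughly like
the sketch below. It runs without BioPerl: Toy::Enzyme is a hypothetical
stand-in for the real enzyme objects (anything with a name() method), and
the enzyme names are just sample data.

```perl
use strict;
use warnings;

# Stand-in for Bio::Restriction::Enzyme: just an object with a name().
package Toy::Enzyme;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }

package main;

# Stand-in for the objects held in the full collection.
my @all_enzymes = map { Toy::Enzyme->new($_) } qw(AasI AatI BamHI AccI EcoRI);

# Names to keep, turned into a hash for quick lookup.
my %wanted = map { $_ => 1 } qw(AasI AatI AccI);

# Block form of grep: keep every enzyme whose name is in %wanted.
my @subset = grep { $wanted{ $_->name } } @all_enzymes;

print join(',', map { $_->name } @subset), "\n";    # prints "AasI,AatI,AccI"
```

With the real classes, @all_enzymes would presumably come from
$all_rebase_collection->each_enzyme, and the survivors would be added with
$new_collection->enzymes(@subset) after creating the collection with
Bio::Restriction::EnzymeCollection->new(-empty => 1). Please check
'perldoc Bio::Restriction::EnzymeCollection' to confirm those method names
for your version.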

> Any clues much appreciated,
>
>
>
> Ursula Cox

No prob, but again you might want to brush up on perl.

chris


From darin.london at duke.edu  Thu May 10 16:17:38 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Thu, 10 May 2007 12:17:38 -0400
Subject: [Bioperl-l] BOSC 2007 Second Call For Papers
Message-ID: <200705101617.l4AGHceI002463@tenero.duhs.duke.edu>


The BOSC Organizing Committee is proud to announce BOSC 2007, taking place
in Vienna, Austria on July 19th and 20th.  The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information and the call for
submissions.   Thank you, and we hope to see you in Vienna.

http://open-bio.org/wiki/BOSC_2007


The BOSC Organizing Committee


Please pass this email on to anyone that would be interested.


From lstein at cshl.edu  Thu May 10 17:13:09 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 10 May 2007 13:13:09 -0400
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0705101013w1923c173l5ec5d9288c67c9a2@mail.gmail.com>

It's a workaround for some broken data sources. It should "never happen."

Lincoln

On 5/8/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Why does Bio::DB::GFF::Feature::gff3_string swap start and stop
> coordinates,
>
> as in:
>   ($start,$stop) = ($stop,$start) if defined($start) && defined($stop)
> && $start > $stop;
>
> I thought it is not legal for a feature to be so composed.
>
> Anyone know?
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Bank.Beszteri at awi.de  Thu May 10 16:13:00 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Thu, 10 May 2007 18:13:00 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
Message-ID: <4643448C.4000807@awi.de>

Dear Bioperl folks,

I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees, 
but in some things it did not behave as I expected it to, so I had to 
look inside a bit.
In particular, I had problems with mixed up bootstrap values after 
re-rooting. After looking into the Bio::Tree::Tree data structures, it 
seems that

a) bootstrap values are stored as attributes of nodes of the tree [to my 
understanding, they should rather be attributes of branches but 
Bio::Tree::Tree apparently tries to simplify away branches]; each node 
stores the bootstrap value belonging to the branch that connects it to 
its ancestor node (I'm reading in trees from Newick strings, and 
bootstrap values arrive in the id fields of internal branches)

b) when re-rooting a tree, bootstrap values stay with the same node 
where they were before. Because the node that used to be the ancestor of 
a particular node in the original tree might have become its descendant 
after re-rooting, the bootstrap values are being mixed up.
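
To make point b) concrete, here is a toy sketch in plain Perl (hashes only;
this is not how Bio::Tree::Tree stores things, and the names reroot, %parent
and %support are invented for the example). Each node records its parent and
the support value of the branch leading to that parent. Re-rooting reverses
the parent links on the path from the new root to the old one, so the
support values on that path have to move to the other end of their branch:

```perl
use strict;
use warnings;

# child => parent, and child => support of the branch to its parent
my %parent  = (A => 'n1', B => 'n1', n1 => 'n2', C => 'n2', n2 => 'root', D => 'root');
my %support = (n1 => 90, n2 => 75);

sub reroot {
    my ($parent, $support, $new_root) = @_;
    my $child = $new_root;
    my $up    = delete $parent->{$child};     # old parent of the new root
    my $carry = delete $support->{$child};    # support of that branch
    while (defined $up) {
        # Remember what $up pointed to before overwriting it.
        my $next_up    = delete $parent->{$up};
        my $next_carry = delete $support->{$up};
        # Flip the link and hand the support value to the new child end.
        $parent->{$up}  = $child;
        $support->{$up} = $carry if defined $carry;
        ($child, $up, $carry) = ($up, $next_up, $next_carry);
    }
    return;
}

reroot(\%parent, \%support, 'n1');
printf "%s: parent=%s support=%s\n", $_, $parent{$_},
    (defined $support{$_} ? $support{$_} : '-')
    for sort keys %parent;
```

After re-rooting at n1, the value 90 (branch n1 to n2) sits on n2 and the
value 75 (branch n2 to root) sits on the old root. If the values simply
stayed on their original nodes, they would end up attached to the wrong
branches.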

Can you confirm my conclusion? Whether yes or no, have you got an easy 
workaround or alternative solution to re-rooting trees (without having 
to touch the reroot method) or any other hints that could be useful for 
me to deal with this issue?

Cheers,

Bank


--
Dr. Bánk Beszteri
Alfred Wegener Institute for Polar and Marine Research


From dmessina at wustl.edu  Thu May 10 20:16:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 10 May 2007 15:16:48 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
Message-ID: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>

Hi everyone,

Shin Leong here at the Wash U GSC has written SearchIO-compliant  
cross_match parsing and result modules. Specifically,  
Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.

To my knowledge this functionality doesn't exist in BioPerl. Any  
comments or objections before I commit these to CVS?

Thanks,
Dave


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From aperezp at uma.es  Thu May 10 17:58:32 2007
From: aperezp at uma.es (=?ISO-8859-1?Q?=22Antonio_J=2E_P=E9rez=22?=)
Date: Thu, 10 May 2007 19:58:32 +0200
Subject: [Bioperl-l] Get Swiss Entry
Message-ID: <46435D48.4020309@uma.es>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/ca4e893e/attachment-0004.html>

From jason at bioperl.org  Thu May 10 20:53:28 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 10 May 2007 13:53:28 -0700
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <FDBE1855-6252-4902-B32B-E984EC6B22E9@bioperl.org>

Awesome!
On May 10, 2007, at 1:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From cjfields at uiuc.edu  Fri May 11 04:55:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 10 May 2007 23:55:05 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
Message-ID: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>

Sounds good to me!  Any tests to be added?

chris

On May 10, 2007, at 3:16 PM, David Messina wrote:

> Hi everyone,
>
> Shin Leong here at the Wash U GSC has written SearchIO-compliant
> cross_match parsing and result modules. Specifically,
> Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult.
>
> To my knowledge this functionality doesn't exist in BioPerl. Any
> comments or objections before I commit these to CVS?
>
> Thanks,
> Dave
>
>
> --
> Dave Messina
> Senior Analyst, Assembly Group
> Genome Sequencing Center
> Washington University
> St. Louis, MO
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Fri May 11 05:42:53 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 11 May 2007 00:42:53 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
Message-ID: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>

> Sounds good to me!  Any tests to be added?

No tests right now as far as I can tell. I'm swamped personally, but  
perhaps I can persuade Mark Johnson over here to crank out a few.


From cjfields at uiuc.edu  Fri May 11 15:25:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 10:25:34 -0500
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
	<57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>
Message-ID: <B654B314-FE39-4DB2-9B2F-5C812CF3E257@uiuc.edu>

Thanks Mark!  I don't think you'll need to add a ton of tests; just  
enough to demo anything that you feel is necessary or specific to the  
parser.  These could go into SearchIO.t or their own test suite.

chris

On May 11, 2007, at 10:14 AM, Mark Johnson wrote:

>>> Sounds good to me!  Any tests to be added?
>>
>> No tests right now as far as I can tell. I'm swamped personally, but
>> perhaps I can persuade Mark Johnson over here to crank out a few.
>
> I'll see what I can do.  I just had to open my mouth about getting  
> this
> contributed back after I noticed it, so I suppose this is appropriate
> retribution.  8)
>
>


From mjohnson at watson.wustl.edu  Fri May 11 15:14:56 2007
From: mjohnson at watson.wustl.edu (Mark Johnson)
Date: Fri, 11 May 2007 10:14:56 -0500 (CDT)
Subject: [Bioperl-l] Cross_match parser and Search::Result object
In-Reply-To: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu>
	<1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu>
	<9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu>
Message-ID: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu>

>> Sounds good to me!  Any tests to be added?
>
> No tests right now as far as I can tell. I'm swamped personally, but
> perhaps I can persuade Mark Johnson over here to crank out a few.

I'll see what I can do.  I just had to open my mouth about getting this
contributed back after I noticed it, so I suppose this is appropriate
retribution.  8)


From golharam at umdnj.edu  Fri May 11 20:20:41 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 16:20:41 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
Message-ID: <000501c79409$d8c03480$f6028a0a@PICO>

I'm running a large series of clustalw alignments.  After a large number of
alignments, my perl script dies, indicating that too many links were open.  I
checked my /tmp directory (while the script was running) and noticed that the
temp directories created for ClustalW are not removed until after the script
exits.
How can I force the cleanup of these directories after I am done with each
alignment?

My code is essentially this:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
$aa_aln = $aln_factory->align(\@aa_seqs);
open(STDOUT, ">&OLDOUT");
$dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);


Ryan


From jason at bioperl.org  Fri May 11 20:53:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 11 May 2007 13:53:19 -0700
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>

Did you try adding this after the calls that produce your CDS alignment?

$aln_factory->cleanup();


-jason
On May 11, 2007, at 1:20 PM, Ryan Golhar wrote:

> I'm running a large series of clustalw alignments.  After a large  
> number of
> alignments, my perl script would die indicating too many links were  
> open.  I
> checked my /tmp directory (while the script is running) and noticed  
> that the
> temp directory created for ClustalW are not removed until after the  
> script
> exists.
> How can I force the cleanup of these directories after I am done  
> with the
> alignment?
>
> My code is essentially this;
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> $aa_aln = $aln_factory->align(\@aa_seqs);
> open(STDOUT, ">&OLDOUT");
> $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);
>
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri May 11 20:57:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 May 2007 15:57:23 -0500
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
	after itself
In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO>
References: <000501c79409$d8c03480$f6028a0a@PICO>
Message-ID: <41E91E58-48A5-4E29-B6BA-E9417BF17513@uiuc.edu>

cleanup() is supposed to clean up temp directory stuff; it's  
inherited from Bio::Tools::Run::WrapperBase.
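
The pattern behind this advice, sketched with core modules only (File::Temp
and File::Path stand in for the wrapper's own temp-dir handling, which I am
assuming, not asserting, behaves similarly): a directory registered for
removal at exit lingers through a long loop unless it is removed explicitly
once each job is done.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(remove_tree);

my @leftover;
for my $job (1 .. 5) {
    # Stand-in for the wrapper creating its scratch directory.
    # CLEANUP => 1 only schedules removal for when the program exits.
    my $dir = tempdir('alnXXXXXX', TMPDIR => 1, CLEANUP => 1);

    # ... the alignment for this job would run here ...

    # Remove the directory now instead of waiting for program exit.
    remove_tree($dir);
    push @leftover, $dir if -d $dir;
}
print scalar(@leftover), " directories left behind\n";
```

In the real script the per-iteration step would be the
$aln_factory->cleanup() call suggested in this thread, placed inside the
loop after each alignment has been used.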

chris

On May 11, 2007, at 3:20 PM, Ryan Golhar wrote:

> I'm running a large series of clustalw alignments.  After a large  
> number of
> alignments, my perl script would die indicating too many links were  
> open.  I
> checked my /tmp directory (while the script is running) and noticed  
> that the
> temp directory created for ClustalW are not removed until after the  
> script
> exists.
> How can I force the cleanup of these directories after I am done  
> with the
> alignment?
>
> My code is essentially this;
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> $aa_aln = $aln_factory->align(\@aa_seqs);
> open(STDOUT, ">&OLDOUT");
> $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);
>
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From golharam at umdnj.edu  Fri May 11 22:11:47 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 11 May 2007 18:11:47 -0400
Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
 after itself
In-Reply-To: <F09252F7-8C3E-41D8-883C-7EF91A50233D@bioperl.org>
Message-ID: <001301c79419$5e794e90$f6028a0a@PICO>

No, I didn't, but I will now.  Thanks.  Interestingly enough ClustalW
removes the files from within the temp directory, but not the temp directory
itself.
 
 
-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
Stajich
Sent: Friday, May 11, 2007 4:53 PM
To: golharam at umdnj.edu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up
after itself


Did you try adding this after your calls getting the CDS aln.

$aln_factory->cleanup(); 


-jason

On May 11, 2007, at 1:20 PM, Ryan Golhar wrote:


I'm running a large series of clustalw alignments.  After a large number of
alignments, my perl script would die indicating too many links were open.  I
checked my /tmp directory (while the script is running) and noticed that the
temp directory created for ClustalW are not removed until after the script
exists.
How can I force the cleanup of these directories after I am done with the
alignment?

My code is essentially this;

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
$aa_aln = $aln_factory->align(\@aa_seqs);
open(STDOUT, ">&OLDOUT");
$dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs);


Ryan


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From goshng at gmail.com  Sat May 12 15:21:59 2007
From: goshng at gmail.com (Sang Chul Choi)
Date: Sat, 12 May 2007 11:21:59 -0400
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
Message-ID: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>

Hi,

One Bio::Seq's sequence is "ACGT" and I want this object to have
"ACGA" by changing the fourth letter from T to A. I thought I could do
this by reading the sequence string through the seq() method, changing
the string with Perl's built-in functions, and generating another Bio::Seq
object with the new string. This seems a little bit silly.

Is there a simpler way to do this? Or is there a Bio::Seq method that
changes one letter at a particular position, or, additionally, changes
the letters in some range?

Thank you,

Sang Chul


From jason at bioperl.org  Sat May 12 16:50:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 09:50:10 -0700
Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object
	without making another object?
In-Reply-To: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
References: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com>
Message-ID: <22C99635-C22D-4F51-AADD-5CCF595222DF@bioperl.org>

You can get/set the seq data via the seq() method.

use Bio::Seq;
my $seq = Bio::Seq->new(-seq => 'ACGT');

my $str = $seq->seq;
print $str, "\n";

substr($str,3,1,'A');
$seq->seq($str);
print $seq->seq, "\n";
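
For the "range" part of the question, the same substr() trick extends
naturally. A small sketch on plain strings (replace_range is a hypothetical
helper, and the 1-based inclusive coordinates are my choice to match how
sequence positions are usually counted):

```perl
use strict;
use warnings;

# Replace residues $start..$end (1-based, inclusive) of $str with $new
# and return the edited copy.
sub replace_range {
    my ($str, $start, $end, $new) = @_;
    die "bad range\n"
        if $start < 1 || $end < $start || $end > length $str;
    substr($str, $start - 1, $end - $start + 1, $new);
    return $str;
}

my $point = replace_range('ACGT',     4, 4, 'A');      # single letter
my $block = replace_range('ACGTACGT', 2, 4, 'TTT');    # a range
print "$point $block\n";    # prints "ACGA ATTTACGT"
```

With a sequence object this would be used as
$seq->seq(replace_range($seq->seq, 4, 4, 'A')), so the same object is
updated in place.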

On May 12, 2007, at 8:21 AM, Sang Chul Choi wrote:

> Hi,
>
> One Bio::Seq's sequence is "ACGT" and I want this object to have
> "ACGA" by changing the fouth letter from T to A. I thought I could do
> this by reading sequence string through the method of seq(), changing
> the string by perl's general function, and generating another Bio::Seq
> object with the new string. This seems to be silly, a little bit.
>
> Is there any simple way to do this? Or, is there any method of
> Bio::Seq to do this: to change one letter at a particular position, or
> additionally to change letters with some range?
>
> Thank you,
>
> Sang Chul
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Sat May 12 22:12:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 12 May 2007 15:12:56 -0700
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <4643448C.4000807@awi.de>
References: <4643448C.4000807@awi.de>
Message-ID: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>


On May 10, 2007, at 9:13 AM, Bank Beszteri wrote:

> Dear Bioperl folks,
>
> I'm trying to use Bio::Tree::Tree for manipulating phylogenetic trees,
> but in some things it did not behave as I expected it to, so I had to
> look inside a bit.
> In particular, I had problems with mixed up bootstrap values after
> re-rooting. After looking into the Bio::Tree::Tree data structures, it
> seems that
>
> a) bootstrap values are stored as attributes of nodes of the tree [to my
> understanding, they should rather be attributes of branches but
> Bio::Tree::Tree apparently tries to simplify away branches]; each node
> stores the bootstrap value belonging to the branch that connects it to
> its ancestor node (I'm reading in trees from Newick strings, and
> bootstrap values arrive in the id fields of internal branches)

Please feel free to suggest an alternative implementation if you
don't agree with the object model. It has worked quite well in our
hands, so I'd be all ears for someone wanting to get in and do some
more work on it.

We have answered the question as to why bootstrap values are internal
ids many times on this list and I believe on the wiki -- the parser
can't tell the difference between a node id and a bootstrap value
because nexus uses the same slot for both. If you know you have
bootstrap values in the internal nodes, it is trivial to process your
tree and copy the values over.


for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) {
  $node->bootstrap($node->id);
  $node->id('');
}

I just added this as a method to TreeFunctionI so that it can be  
easily called now to help satisfy everyone who hopes that the toolkit  
can guess whether the internal nodes are bootstraps or identifiers.


>
> b) when re-rooting a tree, bootstrap values stay with the same node
> where they were before. Because the node that used to be the ancestor of
> a particular node in the original tree might have become its descendant
> after re-rooting, the bootstrap values are being mixed up.
>
> Can you confirm my conclusion? Whether yes or no, have you got an easy
> workaround or alternative solution to re-rooting trees (without having
> to touch the reroot method) or any other hints that could be useful for
> me to deal with this issue?
>

I think you are right, but I am not clear what the value should be for
the internal node attached to the root now.

Note that it is always helpful to provide example code illustrating your
problem.  Here is an example which I think illustrates your problem.

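use strict;
use warnings;
use Bio::TreeIO;

my $in = Bio::TreeIO->new(-format => 'newick',
                          -fh     => \*DATA);
my $out = Bio::TreeIO->new(-format => 'newick');
while( my $t = $in->next_tree ){
     # avoid naming this $a: that is one of the variables sort() uses
     my ($node_a) = $t->find_node(-id => "A");
     $out->write_tree($t);       # tree as read
     $t->reroot($node_a);
     $out->write_tree($t);       # same tree after re-rooting on A
}
__DATA__
(((A:5,B:5)90:2,C:4)25:3,D:10);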


> Cheers,
>
> Bank
>
>
>
> --
> Dr. Bánk Beszteri
> Alfred Wegener Institute for Polar and Marine Research

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From darin.london at duke.edu  Mon May 14 14:44:56 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Mon, 14 May 2007 10:44:56 -0400
Subject: [Bioperl-l] BOSC 2007 Abstract Submission Deadline Extended
Message-ID: <200705141444.l4EEium2026969@tenero.duhs.duke.edu>


Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st.  The announcement day will remain the same so that it remains before the Early Discount Date.

http://open-bio.org/wiki/BOSC_2007


The BOSC organizing Committee


Please pass this email on to anyone that would be interested.


From thiago.venancio at gmail.com  Mon May 14 18:54:44 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 14 May 2007 15:54:44 -0300
Subject: [Bioperl-l] get regions
Message-ID: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>

Hi all,

Using Bio::Seq, is there any easy way to get the coordinates where a
regular expression matches or should I build a sliding window?

For example, looking for a given promoter region in a FASTA file. If
the region is found, I would like to recover exactly the coordinates
where it matches.

Thanks in advance.

Thiago
-- 
"Doubt is not a pleasant condition, but certainty is absurd."
            Voltaire

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From jason at bioperl.org  Mon May 14 19:06:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 12:06:11 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>

I assume you are doing the matches on the string with =~, so I don't
think Bio::Seq really helps you here.
See the $` variable in Perl: it holds the text before the match, so its
length gives the position where a regexp matched.

-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Mon May 14 19:15:09 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 14 May 2007 12:15:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>

I do this in perl with the pos() function.  This requires the /g
modifier on the match operator, like

if ($gene =~ m/$pattern/gi)
{
	$start = pos($gene) - length($pattern) + 1;
}

pos() returns the offset just past the end of the last match.  I subtract
the length of my pattern (which is just a string with a few placeholder
(.) wildcards, so I know how long the match will always be) and add 1 to
get a 1-based start.
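
For patterns whose match length is not known in advance, Perl's built-in @- and @+ arrays give the offsets directly. This is a minimal sketch, not something from the thread: match_coords() is a hypothetical helper name and the sequence and pattern are made-up test data.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) for every non-overlapping,
# case-insensitive match of $pattern in $seq.
# match_coords() is a hypothetical helper name.
sub match_coords {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ /$pattern/gi) {
        # $-[0] is the 0-based offset where the match starts,
        # $+[0] the 0-based offset just past its end (equal to pos($seq)).
        push @hits, [ $-[0] + 1, $+[0] ];
    }
    return @hits;
}

printf "%d..%d\n", @$_ for match_coords('ggTATAAAcctataTAgg', 'TATA[AT]A');
```

Using the match in a while loop with /g reports every hit rather than only the first one.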



From Bank.Beszteri at awi.de  Mon May 14 13:20:07 2007
From: Bank.Beszteri at awi.de (Bank Beszteri)
Date: Mon, 14 May 2007 15:20:07 +0200
Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem
In-Reply-To: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
References: <4643448C.4000807@awi.de>
	<1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org>
Message-ID: <46486207.60304@awi.de>

Dear Jason,

thanks for your answer! Sorry about having been ambiguous - it is clear 
that bootstrap values are parsed as ids from newick files, I had no 
problem with that, it was only the first step of the explanation of my 
problem, which was the rerooting issue.

Thanks for your example code as well, it is indeed really useful to 
illustrate the problem. I modified the original tree a bit to make my 
point clearer:

In your example, there are two internal node ids in a four-taxon tree.
This is not a realistic situation for bootstrap values, because
bootstrap values are attached to bipartitions of terminal nodes, i.e.,
edges / branches of a tree (in what proportion of the bootstrap
replicates was a particular bipartition recovered; an alternative
representation of bootstraps, as produced e.g. by PAUP, is indeed a
"taxon bipartition table"). This means that in a four-taxon tree, we can
have at most one bootstrap value - corresponding to the single
non-trivial bipartition (all other bipartitions are trivial, i.e., they
separate a terminal node from the rest).

So here is an example 4-taxon tree with a bootstrap value:

(A:52,(B:46,C:50)68:11,D:70);

After rerooting at node B (using your example code) it looks like

((B:46,C:50,(A:52,D:70):11)68);

Now there are two problems:
    1) this seems to be a small problem with TreeIO rather than with 
rerooting: there is an extra pair of parentheses around the whole tree;

but more importantly: 
    2) the bootstrap value appears at the root node, which is not
sensible according to the convention that "each node stores the
bootstrap value belonging to the branch linking it to its ancestor". You
would like the bootstrap value to appear at the node connecting A & D in
this situation, which would look like

(B:46,C:50,(A:52,D:70)68:11);

because in  this new situation, this position would correspond to the 
same bipartition as in the original tree [which is (A,D)(B,C)].

In the meantime, I got a mail showing me the solution (thanks, Daniel!),
which is in fact pretty simple: all that has to be done is to go through
the nodes on the path from the old to the new root after rerooting, and
for each node, take the bootstrap value from its ancestor (and remove
it from the ancestor). This leaves the root node without a bootstrap
value, which is exactly what you want (because it has no branch
connecting it to an ancestor, there is no sensible bootstrap value
to attach to a root node).

So this exercise tells me that bootstraps and "real" node ids should be 
handled in different manners when rerooting: real ids should of course 
stay with the nodes, whereas bootstrap values on the path between the 
new and old root should move over to the other end of the corresponding 
branch.
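
The rule can be sketched on a toy tree. This is an illustration only and does not use the Bio::Tree API: the hash layout (name, parent, support) and the name reroot_with_support() are assumptions made for the example, and the tree is the four-taxon one above with the internal node called X.

```perl
use strict;
use warnings;

# Toy tree: each node is a hash with a name, a parent pointer and the
# support value of the branch leading to its parent (hypothetical layout).
my %node;
$node{$_} = { name => $_, parent => undef, support => undef }
    for qw(root X A B C D);
$node{$_}{parent} = $node{root} for qw(A X D);
$node{$_}{parent} = $node{X}    for qw(B C);
$node{X}{support} = 68;

# Re-root by reversing the parent pointers between the new and the old
# root, then move each support value to the other end of its branch.
sub reroot_with_support {
    my ($new_root) = @_;

    # Path from the new root up to the old root, in the old orientation.
    my @path = ($new_root);
    push @path, $path[-1]{parent} while defined $path[-1]{parent};

    # Old support values along the path, child end first.
    my @support = map { $_->{support} } @path;

    $new_root->{parent}  = undef;
    $new_root->{support} = undef;    # a root has no branch above it
    for my $i (1 .. $#path) {
        $path[$i]{parent}  = $path[$i - 1];
        $path[$i]{support} = $support[$i - 1];
    }
    return $new_root;
}

reroot_with_support($node{X});
for my $n (@node{qw(X root)}) {
    printf "%s: parent=%s support=%s\n", $n->{name},
        defined $n->{parent}  ? $n->{parent}{name} : 'none',
        defined $n->{support} ? $n->{support}      : 'none';
}
```

After the call the old root hangs below X and carries the value 68, which corresponds to (B:46,C:50,(A:52,D:70)68:11); nodes off the path are untouched, as real node ids would be.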

Best wishes,

Bank



From basu at pharm.sunysb.edu  Mon May 14 19:10:33 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Mon, 14 May 2007 15:10:33 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
Message-ID: <4648B429.2030907@pharm.sunysb.edu>

Thiago Venancio wrote:
> Hi all,
> 
> Using Bio::Seq, is there any easy way to get the coordinates where a
> regular expression matches or should I build a sliding window?
The perl core function "pos" should help you in this case. Do a 'perldoc
-f pos' for details.

-sidd


> 
> For example, looking for a given promoter region in a FASTA file. If
> the region is found, I would like to recover exactly the coordinates
> where it matches.
> 
> Thanks in advance.
> 
> Thiago


From cjfields at uiuc.edu  Mon May 14 20:48:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 May 2007 15:48:36 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
Message-ID: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>

I use pos() with m{}g; the quoted globals tend to slow things down  
for me.

Ah, see Kevin's answered that...

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon May 14 21:50:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 May 2007 14:50:09 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu>
Message-ID: <A5BECADC-6516-41FF-A5DB-EE865AD63842@bioperl.org>

Yep, you are right: pos() is much better and faster for getting the position.

-j

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From sac at bioperl.org  Tue May 15 01:46:55 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 14 May 2007 18:46:55 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
Message-ID: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>

On 5/14/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> I do this in perl with the pos() function.  This requires the use of the
> match operator (m) like
>
> if ($gene =~ m/$pattern/gi)
> {
>         $start = pos($gene) - length($pattern) + 1;
> }
>
> pos() returns the location of the pointer where the regex left off after
> finding a match.

Cool. I hadn't known that was possible.

> I remove the length of my pattern (which is just a
> string with a few placeholder (.) wildcards, so I know how long the
> match will always be).

To generalize your code so that it will work for any pattern, such as
one that can match strings of variable length like "A{5,10}", just
subtract the length of the actual string that was matched:

if ($gene =~ m/$pattern/gi)
{
    $start = pos($gene) - length($&) + 1;
}

Steve



From shameer at ncbs.res.in  Tue May 15 03:03:57 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Tue, 15 May 2007 08:33:57 +0530 (IST)
Subject: [Bioperl-l] How to produce Bio::Graphics images using PROSITE
	output ?
In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
References: <10259461.post@talk.nabble.com>
	<a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>
	<41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in>
	<1178028249.2644.13.camel@localhost.localdomain>
	<42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in>
	<6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com>
Message-ID: <49697.192.168.1.1.1179198237.squirrel@mail.ncbs.res.in>

Dear All,

Thanks a lot for all your inputs [Help : Imagemaps using Bio::Graphics ].
I am still working on the other part of this project. Now, I am sure that
I can implement it using Bio::Graphics. I will come back to imagemaps
within a week or two.

Meanwhile, I need to parse PROSITE output to present it as a
Bio::Graphics image. Has anyone tried using Bio::Graphics to create images
from PROSITE output? I looked in the HOWTO but couldn't find anything
related to PROSITE.

My output looks like this :
    >Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
          75 - 78  NGSM
    >Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
         41 - 43  SpK
    >Sequence : PS00008 MYRISTYL N-myristoylation site.
           6 - 11  GTitNQ
    >Sequence : PS00009 AMIDATION Amidation site.
          78 - 81  mGKR

I need to implement an image like the blast-parser image.
Thanks for any inputs/pointers.
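
A possible first step is to turn that text into start/end records. The sketch below assumes the output always has the shape shown above (a ">... : PSnnnnn NAME description" header followed by "start - end residues" lines); parse_prosite() and the hash keys are hypothetical names.

```perl
use strict;
use warnings;

# Parse PROSITE scan text of the shape shown above into a list of hits.
# parse_prosite() and the hash keys are hypothetical names.
sub parse_prosite {
    my ($text) = @_;
    my (@hits, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*>\s*\S+\s*:\s*(PS\d+)\s+(\S+)\s*(.*)$/) {
            # Header line: accession, short name, free-text description.
            $current = { acc => $1, name => $2, desc => $3 };
        }
        elsif ($current && $line =~ /^\s*(\d+)\s*-\s*(\d+)\s+(\S+)/) {
            # Hit line: start - end, then the matched residues.
            push @hits, { %$current, start => $1, end => $2, match => $3 };
        }
    }
    return @hits;
}

my $report = <<'END';
>Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site.
      75 - 78  NGSM
>Sequence : PS00008 MYRISTYL N-myristoylation site.
       6 - 11  GTitNQ
END

printf "%s %s %d-%d %s\n", @{$_}{qw(acc name start end match)}
    for parse_prosite($report);
```

Each record could then be handed to Bio::SeqFeature::Generic->new(-start => ..., -end => ..., -display_name => ...) and added to a panel track in the same way the blast example adds its hits.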

> The width of the image is determined by the -width attribute and is given
> in pixels. You cannot control the height of the image as it is computed
> dynamically based on the number of features and bumping options.
>
> Lincoln
>
> On 5/1/07, Shameer Khadar <shameer at ncbs.res.in> wrote:
>>
>> Dear Scot,
>>
>> > There is a fair amount of documentation in the perldoc for
>> > Bio::Graphics::Panel under the section called 'Creating Imagemaps';
>> > have you read that?
>>
>> I agreed, but I couldnt the exact information I needed :( (may be I
>> missed something important).
>>
>> >  Also, for changing the scale, that should happen
>> > automatically--have you tried yet?
>>
>> I tried by changing the Lincoln's program eg: blast3.pl
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
>> to
>> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300);
>>
>> But it had given me a smaller scale of length upto 300. I was looking
>> for an option where I need same width and height of given image and a
>> dynamic start and end values depending on length of my sequence. Since
>> I couldnt accomplish, I thought of getting some help from you guys. I
>> think I need to play a little bit with the value for reformat the scale
>> to accomodate my hits as well.
>>
>> Thanks a lot for your inputs,
>> --
>> Shameer Khadar


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From bix at sendu.me.uk  Tue May 15 08:23:52 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 15 May 2007 09:23:52 +0100
Subject: [Bioperl-l] New Blast parser
Message-ID: <46496E18.1000809@sendu.me.uk>

Back in August of last year I introduced Bio::PullParserI, a module that 
aids in the creation of fast SearchIO and Search modules. I've finally 
gotten around to implementing a Blast parser using the interface, which 
I've called Bio::SearchIO::blast_pull.

To use it you say:

my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");

or in the near future (when I've committed StandAloneBlast changes):

my $sab = Bio::Tools::Run::StandAloneBlast->new(-_READMETHOD => 
"blast_pull");


Currently the parser is incomplete: I've only tested it with NCBI BLASTN
and BLASTP. However, results are promising. In one particular real-world
use case involving running and parsing multiple Blast jobs via
StandAloneBlast (amongst other things), changing only the _READMETHOD
from 'blast' to 'blast_pull' in the code dropped run time from 20223s to
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40%
less).

Please try it out and report any bugs you discover.


Cheers,
Sendu.


From aaron.j.mackey at gsk.com  Tue May 15 14:30:13 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 15 May 2007 10:30:13 -0400
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <OFFA3F7652.5382601A-ON852572DC.004F5B2A-852572DC.004FAF72@gsk.com>

Or, use a zero-width, positive look ahead assertion, and don't incur the 
penalty of either $` or $&:

  if ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
  }
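
One side effect of the lookahead form is that it also reports hits that overlap each other, because the match itself consumes no characters. A minimal sketch, assuming a capture inside the lookahead is acceptable for recovering the end coordinate; overlapping_coords() is a hypothetical helper name and the sequence is made-up test data.

```perl
use strict;
use warnings;

# Collect 1-based [start, end] pairs for every match of $pattern in $seq,
# including matches that overlap one another.
# overlapping_coords() is a hypothetical helper name.
sub overlapping_coords {
    my ($seq, $pattern) = @_;
    my @hits;
    # The lookahead consumes no characters, so pos($seq) stays at the start
    # of the hit; the capture inside it records what would have matched.
    while ($seq =~ /(?=($pattern))/gi) {
        my $start = pos($seq) + 1;
        push @hits, [ $start, $start + length($1) - 1 ];
    }
    return @hits;
}

printf "%d..%d\n", @$_ for overlapping_coords('GAAAAT', 'AA');
```

A plain m/AA/g on the same string would report only the hits starting at 2 and 4, since each match resumes after the previous one ends.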

-Aaron



From diogoat at gmail.com  Tue May 15 22:44:59 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 19:44:59 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
Message-ID: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>

Dear All,

I need to download a lot of Leishmania major sequences in GenBank
format.
I can't download them from the NCBI web page, because the downloaded files
are corrupted when I use a browser to fetch these sequences.
So I looked for a script to download the files and found
something like this:


#########################################################
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>teste6.gb');
$out->write_seq($seqio);
#########################################################

And the system returns these errors:
[diogo1 at genome perl]$ perl teste6.pl

-------------------- WARNING ---------------------
MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
Attempting to dump, but may fail!
---------------------------------------------------
Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.

Any idea?

Thanks

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From diogoat at gmail.com  Tue May 15 23:27:05 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 May 2007 20:27:05 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>
Message-ID: <638512560705151627t2e25f17cg7f820f3097a67748@mail.gmail.com>

Thanks for your help, Barry!

It works very well and I'm using the script as you suggested.
The error was in the printing step, is that right?
I need to use a while loop to print all the sequences...

Thanks a Lot

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

2007/5/15, Barry Moore <barry.moore at genetics.utah.edu>:
>
> Diogo-
>
> write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO
> object.  Try this
>
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                  (-query   =>'Leishmania major
> [Organism]',
>                                   -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                            -file => '>>teste6.gb');
> while (my $seq = $seqio->next_seq) {
>          $out->write_seq($seq);
> }
>
> Barry
>
> On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:
>
> > Dear All,
> >
> > I need to download a lot of sequence of Leishmania major in genbank
> > format...
> > But i can't download on the page of NCBI, because the downloaded
> > file are
> > corrupted... when i use a browser to download this sequences
> > And them i looking for some script to download that`s file and fink
> > something like that:
> >
> >
> > #########################################################
> > use strict;
> > use warnings;
> >
> > use Bio::Seq;
> > use Bio::SeqIO;
> > use Bio::DB::GenBank;
> >
> > my $query = Bio::DB::Query::GenBank->new
> >                                 (-query   =>'Leishmania major
> > [Organism]',
> >                                 -db      => 'nucleotide');
> > my $gb = new Bio::DB::GenBank;
> > my $seqio = $gb->get_Stream_by_query($query);
> >
> > my $out = Bio::SeqIO->new(-format => 'genbank',
> >                           -file => '>>teste6.gb');
> > $out->write_seq($seqio);
> > #########################################################
> >
> > And the system return me this erros
> > [diogo1 at genome perl]$ perl teste6.pl
> >
> > -------------------- WARNING ---------------------
> > MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant
> > module.
> > Attempting to dump, but may fail!
> > ---------------------------------------------------
> > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
> >
> > Any Ideia?
> >
> > Thank`s
> >
> > Diogo Tschoeke
> > Laboratory of Molecular Biology of Trypanosomatides
> > Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil
> > http://biowebdb.org
> >
>
>


From barry.moore at genetics.utah.edu  Tue May 15 23:17:39 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 15 May 2007 17:17:39 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format
In-Reply-To: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <A303709D-043B-4FCB-B1F2-2603A8FF48A8@genetics.utah.edu>

Diogo-

write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO  
object.  Try this

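use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Build the Entrez query for all Leishmania major nucleotide records.
my $query = Bio::DB::Query::GenBank->new
                                 (-query   => 'Leishmania major [Organism]',
                                  -db      => 'nucleotide');
my $gb = Bio::DB::GenBank->new;
# get_Stream_by_query returns a Bio::SeqIO stream, not a sequence.
my $seqio = $gb->get_Stream_by_query($query);

my $out = Bio::SeqIO->new(-format => 'genbank',
                           -file => '>>teste6.gb');
# Pull each Bio::Seq object off the stream and write it out.
while (my $seq = $seqio->next_seq) {
         $out->write_seq($seq);
}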

Barry

On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote:

> Dear All,
>
> I need to download a lot of sequence of Leishmania major in genbank
> format...
> But i can't download on the page of NCBI, because the downloaded  
> file are
> corrupted... when i use a browser to download this sequences
> And them i looking for some script to download that`s file and fink
> something like that:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major  
> [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system return me this erros
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant  
> module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any Ideia?
>
> Thank`s
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil
> http:biowebdb.org <http://www.ncbs.res.in/>
>


From cjfields at uiuc.edu  Wed May 16 02:44:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 May 2007 21:44:43 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
Message-ID: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>


On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
...

> To generalize your code so that it will work for any pattern, such as
> one that can match strings of variable length like "A{5,10}", just
> subtract the length of the actual string that was matched:
>
> if ($gene =~ m/$pattern/gi)
> {
>     $start = pos($gene) - length($&) + 1;
>  }
>
> Steve

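Right, but $& (as well as $` and $') carry a significant penalty,
as Aaron alludes to: once any of them appears anywhere in a program,
even indirectly via a library module, Perl has to save a copy of the
matched string for every successful match, which can cause a
significant performance hit.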

chris


From sac at bioperl.org  Wed May 16 08:16:38 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 01:16:38 -0700
Subject: [Bioperl-l] get regions
In-Reply-To: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
Message-ID: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>

On 5/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On May 14, 2007, at 8:46 PM, Steve Chervitz wrote:
> ...
>
> > To generalize your code so that it will work for any pattern, such as
> > one that can match strings of variable length like "A{5,10}", just
> > subtract the length of the actual string that was matched:
> >
> > if ($gene =~ m/$pattern/gi)
> > {
> >     $start = pos($gene) - length($&) + 1;
> >  }
> >
> > Steve
>
> Right, but $& (as well as $` and $') inflict a significant penalty
> for their use, as Aaron alludes to.  Their use, even indirectly via a
> library module, can cause a significant performance hit.
>
> chris

Yes. I had forgotten how poisonous $&, $` and $' were to regex
performance. Please forgive me. We might consider regularly auditing
the bioperl module tree for use of these in committed code.

But regarding the use of the look ahead assertion, there's a problem
if you want to find *all* occurrences of the pattern in a target
string and the pattern can have variable length hits: it may report
overlapping hits because it only collects the starting points of the
match, and does not determine how long each match would be. For
example:

$gene = 'TTTAAAAAAAAGG';
$pattern="A{5,10}";
while ($gene =~ m/(?=$pattern)/gi) {
    $start = pos($gene) + 1;
    print ++$hit, " hit starts at $start\n";
}

Generates:
1 hit starts at 4
2 hit starts at 5
3 hit starts at 6
4 hit starts at 7

You could get around this by imposing a constraint to avoid trivial
overlaps. OK if you know the length of the pattern, but not so good
for more complex patterns. If there were a way to get the look ahead to
match the longest string possible for a variable length pattern, then
this approach could work, but I'm not sure if that is possible.

Here's a solution I think does the job of reporting the extent of each
match without a performance hit and works for patterns of any
complexity, taking advantage of the special arrays containing hit
indexes, @- and @+:

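$gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
$pattern = "A{5,10}";   # same pattern as in the example above
$hit = 0;               # restart the hit counter for this example
while ($gene =~ m/$pattern/gi){
    $hit++;
    printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
}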

Generates:
1 hit at:  4 - 11
2 hit at: 16 - 21

You can also use this approach to report the locations of any
capture groups, if the pattern contains any parentheses, via $-[1],
$+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
patterns, but patterns not containing parens won't be penalized.
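
As a minimal sketch of the capture-group case: the helper name
match_coords and the two-group pattern below are made up for
illustration and are not taken from any BioPerl module. It collects
1-based, inclusive start/end coordinates for the whole match and for
each group, using only @- and @+:

```perl
use strict;
use warnings;

# Return one array ref per match. Each holds [start, end] pairs (1-based,
# inclusive): element 0 is the whole match, element N is capture group N.
# match_coords is a hypothetical helper name, not a BioPerl function.
sub match_coords {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ /$pattern/gi) {
        my @spans;
        # $#+ is the index of the last group in the pattern just matched.
        for my $i (0 .. $#+) {
            # A group that took no part in the match has an undefined offset.
            push @spans, defined $-[$i] ? [ $-[$i] + 1, $+[$i] ] : undef;
        }
        push @hits, \@spans;
    }
    return @hits;
}

my @hits = match_coords('TTTAAAAAAAAGGGGAAAAAAGGGGG', '(A{5,10})(G+)');
for my $hit (@hits) {
    my ($whole, $a_run, $g_run) = @$hit;
    print "match $whole->[0]-$whole->[1]: A run $a_run->[0]-$a_run->[1], ",
          "G run $g_run->[0]-$g_run->[1]\n";
}
```

The A-run coordinates should agree with the 4 - 11 and 16 - 21 reported
above, since group 1 is the same A{5,10} pattern.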

Steve


From georg.otto at tuebingen.mpg.de  Wed May 16 09:19:06 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 16 May 2007 11:19:06 +0200
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
Message-ID: <m17ir9m8hh.fsf@tuebingen.mpg.de>


Dear all,

I have a problem that also has to do with downloading data from GenBank,
therefore I put it in this thread.

I am trying to get all entries for the organism Danio rerio using something
like this:


use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Danio rerio[ORGN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
					       -query => $query);
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);


# Open the output file once, rather than once per sequence.
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file => '>>output.fas');
while (my $seq_obj = $stream_obj->next_seq) {
  $out->write_seq($seq_obj);
}


However, the download process aborts after a few thousand entries. I
do not think that this is due to the request itself or problems with
specific entries, since the number of transferred sequences varies
before the stop. It might rather have to do with GenBank terminating
the connection.

Does anybody have a suggestion for a better strategy to achieve what I want
(e.g. a different kind of query, a method to resume the download at
the point where it terminated etc.)?

Best,

Georg


"Diogo Tschoeke" <diogoat at gmail.com> writes:

> Dear All,
>
> I need to download a lot of sequence of Leishmania major in genbank
> format...
> But i can't download on the page of NCBI, because the downloaded file are
> corrupted... when i use a browser to download this sequences
> And them i looking for some script to download that`s file and fink
> something like that:
>
>
> #########################################################
> use strict;
> use warnings;
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Leishmania major [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>teste6.gb');
> $out->write_seq($seqio);
> #########################################################
>
> And the system return me this erros
> [diogo1 at genome perl]$ perl teste6.pl
>
> -------------------- WARNING ---------------------
> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
> Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>
> Any Ideia?
>
> Thank`s
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil
> http:biowebdb.org <http://www.ncbs.res.in/>


From cjfields at uiuc.edu  Wed May 16 13:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 08:05:59 -0500
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <m17ir9m8hh.fsf@tuebingen.mpg.de>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
Message-ID: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>

It's likely from a timeout issue on the remote server.  One thing  
which will speed things up is to retrieve the remote sequences in  
fasta format to begin with (described in the Bio::DB::GenBank POD):

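my $gb_obj = Bio::DB::GenBank->new(-retrievaltype => 'tempfile' ,
                                   -format => 'fasta');
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# open the output stream once, before the loop
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file => '>>output.fas');
while (my $seq_obj = $stream_obj->next_seq) {
   $out->write_seq($seq_obj);
}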

I also suggest using the direct ftp downloads if at all possible  
(i.e. you are downloading WGS or contig sequences).  It's much faster.

chris

On May 16, 2007, at 4:19 AM, Georg Otto wrote:

>
> Dear all,
>
> I have a problem that has to do with downloading data from GenBank as
> well, therefor I put it in this thread.
>
> I try to get all entries from organism Danio rerio using the something
> like this:
>
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query = "Danio rerio[ORGN]";
> my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
> 					       -query => $query);
> my $gb_obj = Bio::DB::GenBank->new;
> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
>
> while (my $seq_obj = $stream_obj->next_seq) {
>   my $out = Bio::SeqIO->new(-format => 'fasta',
> 			    -file => '>>output.fas');
>   $out->write_seq($seq_obj);
> }
>
>
> However, the download process aborts after a few thousand entries. I
> do not think that this is due to the request itself or problems with
> specific entries, since the number of transferred sequences varies
> before the stop. It might rather have to do with GenBank terminating
> the connection.
>
> Has anybody a suggestion of a better strategy to achieve what I want
> (e.g. a different kind of query, a method to reassume the download at
> the point where it terminated etc.)?
>
> Best,
>
> Georg
>
>
> "Diogo Tschoeke" <diogoat at gmail.com> writes:
>
>> Dear All,
>>
>> I need to download a lot of sequence of Leishmania major in genbank
>> format...
>> But i can't download on the page of NCBI, because the downloaded  
>> file are
>> corrupted... when i use a browser to download this sequences
>> And them i looking for some script to download that`s file and fink
>> something like that:
>>
>>
>> #########################################################
>> use strict;
>> use warnings;
>>
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> my $query = Bio::DB::Query::GenBank->new
>>                                 (-query   =>'Leishmania major  
>> [Organism]',
>>                                 -db      => 'nucleotide');
>> my $gb = new Bio::DB::GenBank;
>> my $seqio = $gb->get_Stream_by_query($query);
>>
>> my $out = Bio::SeqIO->new(-format => 'genbank',
>>                           -file => '>>teste6.gb');
>> $out->write_seq($seqio);
>> #########################################################
>>
>> And the system return me this erros
>> [diogo1 at genome perl]$ perl teste6.pl
>>
>> -------------------- WARNING ---------------------
>> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant  
>> module.
>> Attempting to dump, but may fail!
>> ---------------------------------------------------
>> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
>> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>>
>> Any Ideia?
>>
>> Thank`s
>>
>> Diogo Tschoeke
>> Laboratory of Molecular Biology of Trypanosomatides
>> Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil
>> http:biowebdb.org <http://www.ncbs.res.in/>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ferraria at gmail.com  Wed May 16 14:38:47 2007
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 16 May 2007 16:38:47 +0200
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
Message-ID: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>

Hi all,

I want to do something relatively simple, and I would like to know how far
the BioPerl tools can help me, because I am having trouble getting there.
Here is the pipeline:

"EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
"GeneStructure"

(*) :
Starting from the EntrezGene ID, I want to retrieve the structure of the
gene, which means getting the whole genomic sequence and the start and end
positions of each exon, intron, UTR, and so on.

I thought of 2 ways to accomplish that:

  -  Use 'efetch', get the raw XML or ASN.1, and then parse it to obtain the
     desired positions. This method should work, but it would take some
     time to get right.

  -  Use the Bio::DB::EntrezGene module with the "get_Seq_by_id" method. I
     obtain a Bio::Seq object, but I am not able to find any features stored
     in it. So it seems that get_Seq_by_id does not retrieve all the
     information contained in an EntrezGene entry (?).

Can somebody help me make the right choice or show me the right way?

I also saw that some packages designed to deal with gene structure exist,
but I can't work out how to use them properly, or even how to create one
of those objects!
Are those packages currently usable?


Thanks in advance.
Best regards,
tony


From cjfields at uiuc.edu  Wed May 16 16:02:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 16 May 2007 11:02:28 -0500
Subject: [Bioperl-l] get regions
In-Reply-To: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com>
	<13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org>
	<1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu>
	<8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com>
	<6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu>
	<8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com>
Message-ID: <9C6F4829-4E06-4751-8B10-B2726B5288B9@uiuc.edu>


On May 16, 2007, at 3:16 AM, Steve Chervitz wrote:
...

>>
>> Right, but $& (as well as $` and $') inflict a significant penalty
>> for their use, as Aaron alludes to.  Their use, even indirectly via a
>> library module, can cause a significant performance hit.
>>
>> chris
>
> Yes. I had forgotten how poisonous $&, $` and $' were to regex
> performance. Please forgive me. We might consider regularly auditing
> the bioperl module tree for use of these in committed code.

Already done!  We have run a few audits for gotchas like that:

http://www.bioperl.org/wiki/Auditing

http://www.bioperl.org/wiki/Bioperl_Best_Practices

If there is anything we should be looking for please feel free to add  
as needed.  There shouldn't be any use of the 'naughty' variables in  
CVS, but it might be worth a second look...

> But regarding the use of the look ahead assertion, there's a problem
> if you want to find *all* occurrences of the pattern in a target
> string and the pattern can have variable length hits: it may report
> overlapping hits because it only collects the starting points of the
> match, and does not determine how long each match would be. For
> example:
>
> $gene = 'TTTAAAAAAAAGG';
> $pattern="A{5,10}";
> while ($gene =~ m/(?=$pattern)/gi) {
>     $start = pos($gene) + 1;
>     print ++$hit, " hit starts at $start\n";
> }
>
> Generates:
> 1 hit starts at 4
> 2 hit starts at 5
> 3 hit starts at 6
> 4 hit starts at 7
>
> You could get around this by imposing a constraint to avoid trivial
> overlaps. OK if you know the length of the pattern, but not so good
> for more complex patterns. If there was I way to get the look ahead to
> match the longest string possible for a variable length pattern, then
> this approach could work, but I'm not sure if that is possible.
>
> Here's a solution I think does the job of reporting the extent of each
> match without a performance hit and works for patterns of any
> complexity, taking advantage of the special arrays containing hit
> indexes, @- and @+:
>
> $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG';
> while ($gene =~ m/$pattern/gi){
>     $hit++;
>     printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0];
> }
>
> Generates:
> 1 hit at:  4 - 11
> 2 hit at: 16 - 21
>
> You can also use this approach to report the locations of any internal
> back references, if the pattern contains any parentheses, via $-[1],
> $+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such
> patterns, but patterns not containing parens won't be penalized.
>
> Steve

Friedl's Regex book outlines a few ways to get around the
'naughty' variables $`, $&, and $' using substr() together with $-[0]
and $+[0], which makes sense since @+ and @- are arrays of positions
instead of actual text.

$`  substr(target, 0, $-[0])
$&  substr(target, $-[0], $+[0] - $-[0])
$'  substr(target, $+[0])
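
To make the table concrete, here is a small sketch. The sequence and
pattern are just the ones used earlier in this thread, and the variable
names are my own:

```perl
use strict;
use warnings;

my $target = 'TTTAAAAAAAAGG';
my ($prematch, $match, $postmatch);

if ($target =~ /A{5,10}/) {
    # $-[0] is the offset where the whole match starts, $+[0] where it ends.
    $prematch  = substr($target, 0, $-[0]);              # stands in for $`
    $match     = substr($target, $-[0], $+[0] - $-[0]);  # stands in for $&
    $postmatch = substr($target, $+[0]);                 # stands in for $'
    print "pre=$prematch match=$match post=$postmatch\n";
}
# prints "pre=TTT match=AAAAAAAA post=GG"
```

None of the three 'naughty' variables is actually used here (they only
appear in comments), so the program as a whole does not incur their
penalty.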

Wonderful book!

chris


From benoit at ebi.ac.uk  Wed May 16 16:35:39 2007
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 16 May 2007 17:35:39 +0100
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
Message-ID: <464B32DB.6080607@ebi.ac.uk>

Hi Tony,

I don't know how simple it is in bioperl, but it is quite simple using 
the ensembl perl API.

Have a look here :

API installation:
http://www.ensembl.org/info/software/api_installation.html
API tutorial :
http://www.ensembl.org/info/software/core/core_tutorial.html
API Perl module Documentation :
http://www.ensembl.org/info/software/Pdoc/ensembl/index.html

so you can do something similar to the example below :

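# Get the 'COG6' gene from human.
# $gene_adaptor is assumed to be a gene adaptor for the human core
# database, obtained as shown in the API tutorial above.

my $gene = $gene_adaptor->fetch_by_display_label('COG6');

print "GENE ", $gene->stable_id(), "\n";
# print the gene coordinates here, e.g. $gene->start(), $gene->end()

foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
    print "TRANSCRIPT ", $transcript->stable_id(), "\n";
    # print the transcript coordinates here

    foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
        # print the exon coordinates here,
        # e.g. $exon->start(), $exon->end(), $exon->strand()
    }
}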

Hope this helps

Benoit


Anthony Ferrari wrote:
 > Hi all,
 >
 > I want to do something relatively simple and I want to know how far 
Bioperl
 > tools could help me because I'm having troubles to get to the point.
 > Here is the pipeline :
 >
 > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
 > "GeneStructure"
 >
 > (*) :
 >>From the EntrezGene ID, I want to retrieve the structure of the gene 
which
 > means having the whole genomic sequence and having the start and end
 > positions of each exons, introns, UTR'....
 >
 > I thought of 2 ways to accomplish that :
 >
 >   -  use 'efetch', get raw xml or asn1 and then parse it to obtain the
 > desired positions.
 >      this method should work but would take a little time to be ok.
 >
 >   -  use Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I
 > obtain a Bio::Seq object but I am not able to find any features stored in
 > it. So it doesn't seem that the get_Seq_by_id function get all 
information
 > contained in a EntrezGene entry (?) .
 >
 > Can somebody help me to make the right choice or show me the right way?
 >
 > I also saw that some packages detinated to deal with  gene structure 
exist
 > but I don't manage to know how to use it properly and even how to 
create one
 > of those objects !
 > Are those packages currently usable ?
 >
 >
 > Thanks in advance.
 > Best regards,
 > tony


From johnsonm at gmail.com  Wed May 16 19:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 16 May 2007 14:11:18 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
Message-ID: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>

On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I believe all seqfeature location coordinates are designed to have
> start < stop for consistency; in cases where the strand matters (CDS,
> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> the two are reversed and the strand is flipped; at least that's the
> way locations are set up in BioPerl.
>
> chris

    Oh yeah?  I always tend to ensure that (start < stop), regardless
of strand, when working with sequence features...the other day, I
caught Glimmer2 emitting a prediction on the plus strand with start >
stop.  I was going to work up a patch for the parser, but I wonder,
should I just force everything to start < stop?  Or only predictions
on the plus strand?  Should all the parsers for all the ab initio
predictors ensure they emit features with coordinates like this?
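
For reference, the convention Chris describes (swap the coordinates and
flip the strand when start > stop) can be sketched in plain Perl as
below. The helper name normalize_location is hypothetical, and this is
not the code BioPerl itself uses. It also does not settle whether the
Glimmer2 parser should flip the strand or only swap the coordinates;
that is still the open question.

```perl
use strict;
use warnings;

# Return (start, end, strand) with start <= end. If the incoming pair is
# reversed, swap it and flip the strand, following the convention quoted
# above. normalize_location is a made-up name for this sketch.
sub normalize_location {
    my ($start, $end, $strand) = @_;
    if ($start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -$strand;
    }
    return ($start, $end, $strand);
}

my @loc = normalize_location(900, 100, 1);
print join(' ', @loc), "\n";   # prints "100 900 -1"
```

A feature that already has start <= end passes through unchanged, so a
parser could call this on every prediction.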


From diogoat at gmail.com  Wed May 16 20:02:44 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Wed, 16 May 2007 17:02:44 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
Message-ID: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>

Dear all,

The script which I wrote with your help is working very well (I have pasted
the script at the end of this e-mail).
But I have another problem now: every time I run the script, the
output file has a different size.
Any idea what the problem is? My connection? A problem at NCBI? The script,
maybe?

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org

#############################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Trypanosoma cruzi [Organism]',
                                -db      => 'nucleotide');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);
my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file => '>>Trypanosoma_cruzi1.gb');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
#########################################################


From barry.moore at genetics.utah.edu  Wed May 16 21:13:27 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 16 May 2007 15:13:27 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
Message-ID: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>

Diogo,

I'd guess that this is a result of NCBI terminating the connection, as
Chris suggested previously.  There are a number of approaches you
could use:  Download only fasta if that's all you need.  Download
only IDs, and then use SeqHound, Batch Entrez or BioPerl to download
those sequences.  Or you could download the GenBank files from the ftp
site, as Chris also suggested, and then run a BioPerl script on each
of those files.  I can see that you are looking at Trypanosomes, so
doing this (on Linux or Mac OS X):

wget 'ftp://ftp.ncbi.nih.gov/genbank/gbinv*.seq.gz'

will get you the 10 files in the invertebrate division of GenBank,
and you could run a BioPerl script on those 10 files.

Barry

On May 16, 2007, at 2:02 PM, Diogo Tschoeke wrote:

> Dear all,
>
> The script which I wrote with your help is working very well (I have
> pasted the script at the end of this e-mail).
> But I have another problem now: every time I run the script, the
> output file has a different size...
> Any idea what the problem is? My connection? A problem at NCBI? The
> script, maybe?
>
> Diogo Tschoeke
> Laboratory of Molecular Biology of Trypanosomatides
> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
> http://biowebdb.org
>
> #############################################################
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'Trypanosoma cruzi [Organism]',
>                                 -db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
> my $out = Bio::SeqIO->new(-format => 'genbank',
>                           -file => '>>Trypanosoma_cruzi1.gb');
> while (my $seq = $seqio->next_seq){
>          $out->write_seq($seq);
>                         }
> #########################################################
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sac at bioperl.org  Wed May 16 22:29:16 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 16 May 2007 15:29:16 -0700
Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure
In-Reply-To: <464B32DB.6080607@ebi.ac.uk>
References: <b2ec54b90705160738r475306b6u1e4ea90a7721efe1@mail.gmail.com>
	<464B32DB.6080607@ebi.ac.uk>
Message-ID: <8f200b4c0705161529h26e7c44fk54082a1156201861@mail.gmail.com>

Another option is to use DAS ( http://biodas.org ), which was designed
precisely to solve this sort of problem.

A DAS genome query is a URL that specifies the genome assembly version
on which the returned coordinates should be based. For example, get
all features and their coordinates associated with the human actin
gene on hg17:

http://das.biopackages.net/das/genome/human/17/feature?name=ACTA1
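
A minimal sketch of how such a query URL could be assembled and fetched
from Perl, using only the core HTTP::Tiny module. The helper names
das_feature_url and fetch_das_features are made up for this
illustration, and the server address is simply the one quoted above; it
is an assumption that it still answers requests.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build a DAS feature query URL from its parts (hypothetical helper).
#   $base   : server root, e.g. 'http://das.biopackages.net/das/genome'
#   $source : organism/assembly path, e.g. 'human/17'
#   $name   : gene name or accession to look up
sub das_feature_url {
    my ($base, $source, $name) = @_;
    $base =~ s{/+$}{};    # drop trailing slashes
    # Percent-encode everything outside the URL "unreserved" set.
    (my $escaped = $name) =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return "$base/$source/feature?name=$escaped";
}

# Fetch the raw response body; returns undef on any HTTP failure.
sub fetch_das_features {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 30)->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Only contact the server when a gene name is given on the command line.
if (@ARGV) {
    my $url = das_feature_url('http://das.biopackages.net/das/genome',
                              'human/17', $ARGV[0]);
    my $xml = fetch_das_features($url);
    print defined $xml ? $xml : "Request failed: $url\n";
}
```

Keeping the URL construction separate from the fetch makes it easy to
point the same code at another DAS server or assembly version.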

Ensembl, UCSC, and other sites also provide DAS servers for genomic
features, but these serve up a different XML response format (DAS/1.x)
from what biopackages.net is serving (DAS/2). Here are some links to
these servers, both DAS/1 and DAS/2:

http://www.biodas.org/wiki/DAS/1#Servers
http://www.biodas.org/wiki/DAS/2#Servers

By default, a DAS/2 server will return data in DAS2XML format, but you
can specify alternative formats if a server supports them. This is one
advantage of the DAS/2 retrieval spec, which is stable and is
described here:

http://biodas.org/documents/das2/das2_get.html

You may not be able to use an Entrez gene ID directly in the query.
It depends on whether these IDs are available on the given server.
Accessions and gene names should be OK. You can always map your Entrez
IDs to accessions or gene names using this file:
ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz
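
As a rough sketch of that mapping step, the following reads a
decompressed gene2refseq stream and collects RNA accessions per GeneID.
The column layout in the comment (tab-delimited, '#' header line, '-'
for missing values, GeneID in column 2 and the RNA accession in column
4) is an assumption about the file, so check it against the header line
of your copy. The function name is made up for the example.

```perl
use strict;
use warnings;

# Assumed layout of gene2refseq (tab-delimited):
#   0 tax_id, 1 GeneID, 2 status, 3 RNA_nucleotide_accession.version, ...
# Returns a hash ref: GeneID => array ref of distinct RNA accessions.
sub gene_to_rna_accessions {
    my ($fh, $tax_id) = @_;
    my %seen;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                 # skip the header line
        chomp $line;
        my ($tax, $gene_id, undef, $rna_acc) = split /\t/, $line;
        next unless defined $rna_acc && $rna_acc ne '-';
        next if defined $tax_id && $tax ne $tax_id;
        $seen{$gene_id}{$rna_acc} = 1;         # nested hash drops duplicates
    }
    my %acc_for;
    for my $gene_id (keys %seen) {
        $acc_for{$gene_id} = [ sort keys %{ $seen{$gene_id} } ];
    }
    return \%acc_for;
}

# Usage (assumed): gunzip -c gene2refseq.gz | perl map_ids.pl 9606
if (@ARGV) {
    my $map = gene_to_rna_accessions(\*STDIN, $ARGV[0]);
    for my $gene_id (sort { $a <=> $b } keys %$map) {
        print join("\t", $gene_id, @{ $map->{$gene_id} }), "\n";
    }
}
```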

Steve

On 5/16/07, Benoit Ballester <benoit at ebi.ac.uk> wrote:
> Hi Tony,
>
> I don't know how simple it is in bioperl, but it is quite simple using
> the ensembl perl API.
>
> Have a look here :
>
> API installation:
> http://www.ensembl.org/info/software/api_installation.html
> API tutorial :
> http://www.ensembl.org/info/software/core/core_tutorial.html
> API Perl module Documentation :
> http://www.ensembl.org/info/software/Pdoc/ensembl/index.html
>
> so you can do something similar to the example below :
>
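> # Connect to a core database and get a gene adaptor
> # (assumes the public Ensembl server and anonymous access)
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org',
>                                  -user => 'anonymous');
> my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');
>
> # Get the 'COG6' gene from human
> my $gene = $gene_adaptor->fetch_by_display_label('COG6');
>
> # Gene coordinates
> print "GENE ", $gene->stable_id(), " ",
>       $gene->start(), "-", $gene->end(), "\n";
>
> foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
>     # Transcript coordinates
>     print "TRANSCRIPT ", $transcript->stable_id(), " ",
>           $transcript->start(), "-", $transcript->end(), "\n";
>
>     foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>         # Exon coordinates
>         print "EXON ", $exon->stable_id(), " ",
>               $exon->start(), "-", $exon->end(), "\n";
>     }
> }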
>
> Hope this helps
>
> Benoit
>
>
> Anthony Ferrari wrote:
>  > Hi all,
>  >
>  > I want to do something relatively simple and I want to know how far
>  > Bioperl tools could help me, because I'm having trouble getting to
>  > the point.
>  > Here is the pipeline :
>  >
>  > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) ----->
>  > "GeneStructure"
>  >
>  > (*) :
>  > From the EntrezGene ID, I want to retrieve the structure of the gene,
>  > which means having the whole genomic sequence and the start and end
>  > positions of each exon, intron, UTR, ...
>  >
>  > I thought of 2 ways to accomplish that :
>  >
>  >   -  use 'efetch', get raw XML or ASN.1 and then parse it to obtain the
>  >      desired positions.
>  >      This method should work but would take some time to get right.
>  >
>  >   -  use the Bio::DB::EntrezGene module with the "get_Seq_by_id"
>  >      function. I obtain a Bio::Seq object but I am not able to find
>  >      any features stored in it. So it doesn't seem that the
>  >      get_Seq_by_id function gets all the information contained in an
>  >      EntrezGene entry (?) .
>  >
>  > Can somebody help me to make the right choice or show me the right way?
>  >
>  > I also saw that some packages designed to deal with gene structure
>  > exist, but I can't work out how to use them properly or even how to
>  > create one of those objects!
>  > Are those packages currently usable ?
>  >
>  >
>  > Thanks in advance.
>  > Best regards,
>  > tony


From heikki at sanbi.ac.za  Thu May 17 06:46:44 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 17 May 2007 08:46:44 +0200
Subject: [Bioperl-l] Writing OBO fiies
Message-ID: <200705170846.44641.heikki@sanbi.ac.za>


I've started putting together Bio::OntologyIO::obo::write_ontology().
The current parser ignores a number of fields in common obo files.
If anyone knows any issues regarding adding more information into obo ontology 
object, shout now.

I need to start parsing at least "xref_analog" and "subset" to get a 
reasonable roundtrip of obo files representing cell ontology and sequence 
ontology.

I am not aiming at extending the existing ontology interfaces but simply 
patching obo parsing, but I am open to suggestions.

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bernd.web at gmail.com  Thu May 17 10:48:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 17 May 2007 12:48:07 +0200
Subject: [Bioperl-l] (Simple)Align
Message-ID: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>

Hi,

I am playing with alignment and would like to insert strings at
certain columns (so in all sequences in the alignment). I know about
the slice and remove_columns.
Is there already an insert_columns type of functionality?
Otherwise I'll just iterate over the sequences similar to
remove_columns (and give it a try to implement add_columns like
remove_columns).


Regards
Bernd


From Kevin.M.Brown at asu.edu  Thu May 17 15:17:04 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 17 May 2007 08:17:04 -0700
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B403284273@EX02.asurite.ad.asu.edu>

> I am playing with alignment and would like to insert strings 
> at certain columns (so in all sequences in the alignment). I 
> know about the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to 
> remove_columns (and give it a try to implement add_columns 
> like remove_columns).

Try the Deobfuscator to see all the methods available to a
Bio::SimpleAlign object.
http://bioperl.org/cgi-bin/deob_interface.cgi
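
For the "iterate over the sequences" route mentioned in the question, a
minimal sketch might look like the following. Both function names are
invented for this example and are not part of Bio::SimpleAlign. The
wrapper only inserts gap characters, on the assumption that this keeps
each sequence's start/end coordinates valid; inserting residues would
need the end coordinate recomputed.

```perl
use strict;
use warnings;

# Insert $count copies of $char before 0-based column $col in every row.
# Works on plain strings, so it can be tried without BioPerl installed.
sub insert_columns {
    my ($rows, $col, $count, $char) = @_;
    $char = '-' unless defined $char;
    my @out;
    for my $row (@$rows) {
        die "column $col is past the end of the row\n" if $col > length $row;
        my $copy = $row;
        substr($copy, $col, 0, $char x $count);   # zero-length replace = insert
        push @out, $copy;
    }
    return \@out;
}

# Apply the same insertion to every sequence of a Bio::SimpleAlign and
# return a new alignment (BioPerl is loaded only when this is called).
sub add_gap_columns {
    my ($aln, $col, $count) = @_;
    require Bio::SimpleAlign;
    require Bio::LocatableSeq;
    my $new = Bio::SimpleAlign->new;
    for my $seq ($aln->each_seq) {
        my ($str) = @{ insert_columns([ $seq->seq ], $col, $count, $aln->gap_char) };
        $new->add_seq(Bio::LocatableSeq->new(
            -id    => $seq->id,
            -seq   => $str,
            -start => $seq->start,   # unchanged: only gaps were added
            -end   => $seq->end,
        ));
    }
    return $new;
}
```

Building a new alignment rather than editing in place mirrors how
slice and remove_columns hand back a fresh object.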


From diogoat at gmail.com  Thu May 17 18:14:14 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Thu, 17 May 2007 15:14:14 -0300
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
Message-ID: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>

Hi Barry, thanks for all your help.

I chose to download the invertebrate division from NCBI to my machine,
but I don't have a script to get the sequences out of the local file and
I don't know how to write one.
I tried changing the script, replacing
-db => 'nucleotide' with -db => 'local-gbdi.gb',
as shown below:

my $query = Bio::DB::Query::GenBank->new
                                (-query   =>'Leishmania major',
                                -db     => 'local-gbdi.gb');
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_query($query);

but it didn't work, because Bio::DB::Query::GenBank is a Perl module which
connects to NCBI to run my query, and my database is now local.

I need the genomes of Trypanosoma cruzi, Trypanosoma brucei, Leishmania
major, Entamoeba and Plasmodium falciparum in GenBank format.
Any suggestions? Does somebody have such a script?
Help!
And thanks for the help!

Diogo Tschoeke
Laboratory of Molecular Biology of Trypanosomatides
Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
http://biowebdb.org


From barry.moore at genetics.utah.edu  Thu May 17 18:19:46 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 17 May 2007 12:19:46 -0600
Subject: [Bioperl-l] Downloading a sequence in genbank format - related
	problem
In-Reply-To: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com>
	<m17ir9m8hh.fsf@tuebingen.mpg.de>
	<B51242C4-06A9-4B84-947F-C15C00096D22@uiuc.edu>
	<638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com>
	<2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu>
	<638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com>
Message-ID: <F5104D8D-030D-4F01-884C-623B5F2D63CC@genetics.utah.edu>

Diogo-

Look at the bioperl documentation - there you will find a HowTo on  
SeqIO.  This will help you learn how to write scripts to load genbank  
flat files and you can then iterate over those files and check the  
organism to see if it's one that you want.  You should be able to  
find everything that you need in the documentation.
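
A minimal sketch of that approach, assuming the division files have
already been decompressed and that matching on the start of the
organism name is good enough. The sub names, the @WANTED list and the
command-line convention are illustrative choices, not something taken
from the HowTo.

```perl
use strict;
use warnings;

# Organisms to keep; a bare genus such as 'Entamoeba' matches any species.
my @WANTED = ('Trypanosoma cruzi', 'Trypanosoma brucei', 'Leishmania major',
              'Entamoeba', 'Plasmodium falciparum');

# True if the organism name starts with one of the wanted names.
sub is_wanted {
    my ($organism) = @_;
    return 0 unless defined $organism;
    return scalar(grep { $organism =~ /^\Q$_\E\b/i } @WANTED);
}

# Copy matching records from GenBank flat files into one output file.
# Returns the number of records written.
sub filter_genbank_files {
    my ($out_file, @in_files) = @_;
    require Bio::SeqIO;    # loaded here so is_wanted() works without BioPerl
    my $out  = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');
    my $kept = 0;
    for my $file (@in_files) {
        my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
        while (my $seq = $in->next_seq) {
            my $species = $seq->species or next;   # skip records without one
            next unless is_wanted($species->binomial);
            $out->write_seq($seq);
            $kept++;
        }
    }
    return $kept;
}

# Usage (assumed): perl filter_gb.pl selected.gb gbinv1.seq gbinv2.seq ...
if (@ARGV) {
    my ($out_file, @in_files) = @ARGV;
    my $n = filter_genbank_files($out_file, @in_files);
    print "Wrote $n records to $out_file\n";
}
```

Because the output is opened with '>' rather than '>>', each run
starts from an empty file.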

B



From torsten.seemann at infotech.monash.edu.au  Fri May 18 08:13:38 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 18 May 2007 18:13:38 +1000
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <46496E18.1000809@sendu.me.uk>
References: <46496E18.1000809@sendu.me.uk>
Message-ID: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>

Sendu,

> Back in August of last year I introduced Bio::PullParserI, a module that
> aids in the creation of fast SearchIO and Search modules. I've finally
> gotten around to implementing a Blast parser using the interface, which
> I've called Bio::SearchIO::blast_pull.
> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file");
> Please try it out and feed-back any bugs you discover.

This is very cool!
Here's hoping NCBI don't change the default output format too much.

You should be able to add "rpsblast -p T" support as this is identical
to "blastall -p blastp" except for first line:
BLASTP 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]

The only problem is the (rarely used) "rpsblast -p F" mode which
looks/behaves like a "blastall -p tblastn", ie. has hit summaries with
"Frame"

 Score = 29.6 bits (65), Expect = 0.26
 Identities = 10/26 (38%), Positives = 12/26 (46%)
 Frame = -1

BUT has the same header line, so you can't know -p F was used until
you see a "Frame = ??" in a hit (what were they thinking???).

TBLASTN 2.2.16 [Mar-25-2007]
RPS-BLAST 2.2.16 [Mar-25-2007]    # should be RPS-TBLASTN perhaps...
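
To make the detection problem concrete, here is a small sketch of the
two-step check described above: read the program name from the header
line, then look for a "Frame =" line in the hits. It is not how
Bio::SearchIO::blast_pull works; the function name and return labels
are invented, and a "-p F" report with no hits at all would still be
reported as "-p T".

```perl
use strict;
use warnings;

# Classify a plain-text BLAST report read from a filehandle.
# Returns the program name from the header line; for RPS-BLAST it returns
# 'rpsblast -p F' if any hit has a "Frame = N" line, else 'rpsblast -p T'.
# Returns undef if no header line is found.
sub classify_blast_report {
    my ($fh) = @_;
    my $program;
    my $has_frame = 0;
    while (my $line = <$fh>) {
        if (!defined $program) {
            # Header looks like "RPS-BLAST 2.2.16 [Mar-25-2007]"
            next unless $line =~ /^(\S+)\s+\d+\.\d+\.\d+/;
            $program = $1;
            next;
        }
        if ($line =~ /^\s*Frame\s*=\s*[+-]?\d/) {
            $has_frame = 1;
            last;              # one Frame line is enough to decide
        }
    }
    return unless defined $program;
    return $program unless $program eq 'RPS-BLAST';
    return $has_frame ? 'rpsblast -p F' : 'rpsblast -p T';
}

# Usage (assumed): perl classify.pl report.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $kind = classify_blast_report($fh);
    print defined $kind ? "$kind\n" : "No BLAST header found\n";
}
```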

Thanks for the good work. Shame I converted most of our systems to blastxml :-(

--Torsten


From cjfields at uiuc.edu  Fri May 18 13:39:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 08:39:05 -0500
Subject: [Bioperl-l] New Blast parser
In-Reply-To: <a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
References: <46496E18.1000809@sendu.me.uk>
	<a79f6a4b0705180113q1a62706ct2ce4822ba263a649@mail.gmail.com>
Message-ID: <2219EED8-F721-4586-B029-EF6CD9C32246@uiuc.edu>

I'll be looking at cleaning up SearchIO::blastxml soon myself.  It
needs to be more memory-friendly with large XML files, and PSI-BLAST
iterations need to be addressed (nope, I haven't forgotten about that!).

There is an XML::LibXML pull parser interface (XML::LibXML::Reader) we
could look into...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri May 18 14:00:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 May 2007 09:00:38 -0500
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <239FDEF1-38D4-47B8-AC71-514B61BDF9E0@uiuc.edu>

Sounds great to me!  Sohel Merchant might have some ideas...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sun May 20 00:54:11 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 20:54:11 -0400
Subject: [Bioperl-l] Writing OBO fiies
In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za>
References: <200705170846.44641.heikki@sanbi.ac.za>
Message-ID: <221DB1CF-2F4E-47D4-80A8-D8D8BD777423@gmx.net>

Sounds great to me! -hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun May 20 01:36:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 21:36:49 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
Message-ID: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>

FYI. Is it worth thinking about implementing a remote access  
interface to the CIPRES tree inference tools, similar to what we have  
for RemoteBlast?

	-hilmar

Begin forwarded message:

From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
Date: May 16, 2007 6:48:49 AM EDT
Subject: FW: release of cipres portal for tree inference

The CIPRES Central Resource team is pleased to announce the first public
release of the CIPRES portal for Tree Inference.

The portal is based on capabilities exposed by the Cipres software
libraries, which were constructed as a Joint Effort between Mark Holder
at Florida State University and the SDSC SW engineering team led by
Terri Liebowitz.

It currently presents Parsimony (PAUP) and Likelihood (GARLI and RAxML)
tools with or without boosting from RecIDCM3 created by Usman Roshan and
co-workers. Nexus and Phylip files are currently supported.

The site is available to all, and is underwritten by the CIPRES cluster
at SDSC.

The portal is fully supported by the SDSC team, with contributions and
new features introduced by the team in collaboration with Mark Holder
and Rutger Vos. At present weekly releases are made with improvements
and new features.

You can visit the portal at the Cipres Web Site.

http://www.phylo.org/sub_sections/portal.htm

Please forward this information to anyone you feel may find the
portal useful.

On behalf of the whole CIPRES team,

Mark

Mark A. Miller, PhD
Principal Investigator, Biology
San Diego Supercomputer Center
University of California, San Diego
La Jolla, CA, 92093-0505
Tel: 858-822-0866
Fax: 858-822-3610

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun May 20 02:10:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 19 May 2007 21:10:53 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
Message-ID: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>

I think it would be worthwhile.  Would we place it in bioperl-run?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sun May 20 02:19:47 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 19 May 2007 22:19:47 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
Message-ID: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>

I guess so. That's where RemoteBlast is too, if I'm not mistaken?

What sucks about the UI from a programming perspective is that it  
goes through multiple screens. There may be a lot of screen-scraping.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sun May 20 05:06:53 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 19 May 2007 22:06:53 -0700
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
Message-ID: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>

Technically RemoteBlast is in bioperl-live, for historical and
ease-of-install reasons: so many people want to use BLAST out of the
box that we kept it in bioperl-live rather than force them to install
bioperl-run.

I think it would be great to have the interface - can we do it all
via HTTP, or will it require some installation of client software
and/or CORBA?

-jason
On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:

> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>
> What sucks about the UI from a programming perspective is that it
> goes through multiple screens. There may be a lot of screen-scraping.
>
> 	-hilmar
>
> On May 19, 2007, at 10:10 PM, Chris Fields wrote:
>
>> I think it would be worthwhile.  Would we place it in bioperl-run?
>>
>> chris
>>
>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>>
>>> FYI. Is it worth thinking about implementing a remote access
>>> interface to the CIPRES tree inference tools, similar to what we  
>>> have
>>> for RemoteBlast?
>>>
>>> 	-hilmar
>>>
>>> Begin forwarded message:
>>>
>>> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
>>> Date: May 16, 2007 6:48:49 AM EDT
>>> Subject: FW: release of cipres portal for tree inference
>>>
>>> The CIPRES Central Resource team is pleased to announce the first
>>> public
>>> release of the CIPRES portal for Tree Inference.
>>>
>>> The portal is based on capabilities exposed by the Cipres software
>>> libraries, which were constructed as a Joint Effort between Mark
>>> Holder
>>> at Florida State University and the SDSC SW engineering team led by
>>> Terri Liebowitz.
>>>
>>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and
>>> RAxML)
>>> tools with or without boosting from RecIDCM3 created by Usman
>>> Roshan and
>>> co-workers. Nexus and Phylip files are currently supported.
>>>
>>> The site is available to all, and is underwritten by the CIPRES
>>> cluster
>>> at SDSC.
>>>
>>> The portal is fully supported by the SDSC team, with contributions
>>> and
>>> new features introduced by the team in collaboration with Mark  
>>> Holder
>>> and Rutger Vos. At present weekly releases are made with  
>>> improvements
>>> and new features.
>>>
>>> You can visit the portal at the Cipres Web Site.
>>>
>>> http://www.phylo.org/sub_sections/portal.htm
>>>
>>> Please forward this information to anyone you feel may find the
>>> portal useful.
>>>
>>> On behalf of the whole CIPRES team,
>>>
>>> Mark
>>>
>>> Mark A. Miller, PhD
>>> Principal Investigator, Biology
>>> San Diego Supercomputer Center
>>> University of California, San Diego
>>> La Jolla, CA, 92093-0505
>>> Tel: 858-822-0866
>>> Fax: 858-822-3610
>>>
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From bernd.web at gmail.com  Sun May 20 14:56:07 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 20 May 2007 16:56:07 +0200
Subject: [Bioperl-l] (Simple)Align
In-Reply-To: <C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com>
	<C2058FEA-4B28-4B6B-89C9-CA3288ADE496@bioperl.org>
Message-ID: <716af09c0705200756h46bf2134x3d6841d2a98744c0@mail.gmail.com>

Hi

I have made a simple add_columns function in SimpleAlign along the
lines of remove_columns. I only need to insert characters that are the
same for all sequences:

=head2 add_columns

 Title     : add_columns
 Usage     : $aln2 = $aln->add_columns([0, 10, '.'], [12, 15])
 Function  : Creates an alignment with columns added by specifying
             the columns by number and supplying the character
             (optional) to insert in all sequences. Default character
             is gap_char.
 Returns   : Bio::SimpleAlign object
 Args      : Array ref where the referenced array contains a pair of
             integers that specify a column range and optionally the
             character to insert. The first column is 0.

=cut

The functionality could be extended:
- possibility to supply a string to insert (for all sequences)
- possibility to define the string to insert on a per-sequence basis
(although this may be more transparent to do outside SimpleAlign).

After some final checks I could supply it (e.g. via bugzilla).
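
For illustration only, here is a minimal sketch of the idea on plain
strings rather than on a Bio::SimpleAlign object. The helper name
add_columns_to_strings, the '-' default and the right-to-left order in
which ranges are applied are assumptions made for this example, not the
code described above:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SimpleAlign: insert filler columns
# into every row of an alignment held as plain strings.
# Each range is [start, end, char]; columns are 0-based, char defaults to '-'.
sub add_columns_to_strings {
    my ($rows, @ranges) = @_;
    my @out = @$rows;
    # Apply the rightmost range first so that an insertion never shifts
    # the position of a range that is still waiting to be applied.
    for my $range (sort { $b->[0] <=> $a->[0] } @ranges) {
        my ($start, $end, $char) = @$range;
        $char = '-' unless defined $char;
        my $filler = $char x ($end - $start + 1);
        for my $row (@out) {
            die "start $start is past the end of the row\n"
                if $start > length($row);
            substr($row, $start, 0, $filler);   # zero-length replace = insert
        }
    }
    return \@out;
}

my $padded = add_columns_to_strings(['ACGTAC', 'AC-TAC'], [0, 1, '.'], [4, 5]);
print "$_\n" for @$padded;
```

Because the ranges are applied from right to left, every position refers
to the original column numbering, which mirrors how one would reason
about remove_columns.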


Regards,
Bernd


On 5/17/07, Jason Stajich <jason at bioperl.org> wrote:
> not yet - when I did this to insert intron positions I just manipulated the
> sequence strings outside of SimpleAlign, but I think it would be nice to
> have an insert function.
>
> -jason
>
> On May 17, 2007, at 3:48 AM, Bernd Web wrote:
>
> Hi,
>
> I am playing with alignment and would like to insert strings at
> certain columns (so in all sequences in the alignment). I know about
> the slice and remove_columns.
> Is there already an insert_columns type of functionality?
> Otherwise I'll just iterate over the sequences similar to
> remove_columns (and give it a try to implement add_columns like
> remove_columns).
>
>
> Regards
> Bernd
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


From hlapp at gmx.net  Sun May 20 15:59:03 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 20 May 2007 11:59:03 -0400
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
Message-ID: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>

Just HTTP, no CORBA or other stuff needed client-side.

Ultimately it would of course be nice if they offered a more SOA  
compliant interface too, to obviate the screen-scraping need.  
However, if I understand the UI correctly the screen scraping is - if  
at all - only needed for walking through the steps, and for  
extracting the location of the result. The result itself is in NEXUS  
format, as a separate file.

	-hilmar

On May 20, 2007, at 1:06 AM, Jason Stajich wrote:

> technically remoteblast is in bioperl-live, but for historical/ease  
> of user-install purposes (i.e. so many people want to use blast out  
> of the box, we kept it in bioperl-live to not force them to install  
> bioperl-run).
>
> I think it would be great to have the interface - can we do it all  
> via HTTP or will it require some installation of client software  
> and/or CORBA?
>
> -jason
> On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote:
>
>> I guess so. That's where RemoteBlast is too, if I'm not mistaken?
>>
>> What sucks about the UI from a programming perspective is that it
>> goes through multiple screens. There may be a lot of screen-scraping.
>>
>> 	-hilmar
>>
>> On May 19, 2007, at 10:10 PM, Chris Fields wrote:
>>
>>> I think it would be worthwhile.  Would we place it in bioperl-run?
>>>
>>> chris
>>>
>>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote:
>>>
>>>> FYI. Is it worth thinking about implementing a remote access
>>>> interface to the CIPRES tree inference tools, similar to what we  
>>>> have
>>>> for RemoteBlast?
>>>>
>>>> 	-hilmar
>>>>
>>>> Begin forwarded message:
>>>>
>>>> From: "Vision, Todd (Biology)" <tjv at bio.unc.edu>
>>>> Date: May 16, 2007 6:48:49 AM EDT
>>>> Subject: FW: release of cipres portal for tree inference
>>>>
>>>> The CIPRES Central Resource team is pleased to announce the first
>>>> public
>>>> release of the CIPRES portal for Tree Inference.
>>>>
>>>> The portal is based on capabilities exposed by the Cipres software
>>>> libraries, which were constructed as a Joint Effort between Mark
>>>> Holder
>>>> at Florida State University and the SDSC SW engineering team led by
>>>> Terri Liebowitz.
>>>>
>>>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and
>>>> RAxML)
>>>> tools with or without boosting from RecIDCM3 created by Usman
>>>> Roshan and
>>>> co-workers. Nexus and Phylip files are currently supported.
>>>>
>>>> The site is available to all, and is underwritten by the CIPRES
>>>> cluster
>>>> at SDSC.
>>>>
>>>> The portal is fully supported by the SDSC team, with contributions
>>>> and
>>>> new features introduced by the team in collaboration with Mark  
>>>> Holder
>>>> and Rutger Vos. At present weekly releases are made with  
>>>> improvements
>>>> and new features.
>>>>
>>>> You can visit the portal at the Cipres Web Site.
>>>>
>>>> http://www.phylo.org/sub_sections/portal.htm
>>>>
>>>> Please forward this information to anyone you feel may find the
>>>> portal useful.
>>>>
>>>> On behalf of the whole CIPRES team,
>>>>
>>>> Mark
>>>>
>>>> Mark A. Miller, PhD
>>>> Principal Investigator, Biology
>>>> San Diego Supercomputer Center
>>>> University of California, San Diego
>>>> La Jolla, CA, 92093-0505
>>>> Tel: 858-822-0866
>>>> Fax: 858-822-3610
>>>>
>>>> -- 
>>>> ===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Mon May 21 15:19:56 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 10:19:56 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
Message-ID: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>

Sounds like time to bust out WWW::Mechanize.  I didn't step through
the whole process, but the first screen/step looks okay.  Plain HTML
form with plain buttons.  Looks like the JavaScript is only getting
involved for client-side sanity checking.  Should be easy to automate
(Don't look at me, I've bitten off a bit too much as it is).

On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> Just HTTP, no CORBA or other stuff needed client-side.
>
> Ultimately it would of course be nice if they offered a more SOA
> compliant interface too, to obviate the screen-scraping need.
> However, if I understand the UI correctly the screen scraping is - if
> at all - only needed for walking through the steps, and for
> extracting the location of the result. The result itself is in NEXUS
> format, as a separate file.
>
>         -hilmar


From cjfields at uiuc.edu  Mon May 21 20:11:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:11:36 -0500
Subject: [Bioperl-l] FW: release of cipres portal for tree inference
In-Reply-To: <ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu>
	<E59D2DBA-6E54-485A-948A-DECFE6C47DB8@gmx.net>
	<9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu>
	<A24FD2B8-66D7-41E7-8FD0-AB3338AB568C@gmx.net>
	<5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org>
	<AA0E9C4E-C812-4401-ABF6-3ADC815A0555@gmx.net>
	<ebf5eb170705210819x3993af3bu5fc73e5712932b22@mail.gmail.com>
Message-ID: <61E0D74B-77F7-499B-A0B7-B1E5106964E6@uiuc.edu>

It would be nice to have a generalized interface (SOAP, CGI,
anything), as Hilmar states.  I agree WWW::Mechanize is probably the
way to go for now.  I don't know who wants to take it up...

chris

On May 21, 2007, at 10:19 AM, Mark Johnson wrote:

> Sounds like time to bust out WWW::Mechanize.  I didn't step through
> the whole process, but the first screen/step looks okay.  Plain HTML
> form with plain buttons.  Looks like the Javascript is only getting
> involved for client-side sanity checking.  Should be easy to automate
> (Don't look at me, I've bitten off a bit too much as it is).
>
> On 5/20/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> Just HTTP, no CORBA or other stuff needed client-side.
>>
>> Ultimately it would of course be nice if they offered a more SOA
>> compliant interface too, to obviate the screen-scraping need.
>> However, if I understand the UI correctly the screen scraping is - if
>> at all - only needed for walking through the steps, and for
>> extracting the location of the result. The result itself is in NEXUS
>> format, as a separate file.
>>
>>         -hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon May 21 20:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:35:41 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
Message-ID: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>

On May 16, 2007, at 2:11 PM, Mark Johnson wrote:

> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> I believe all seqfeature location coordinates are designed to have
>> start < stop for consistency; in cases where the strand matters (CDS,
>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>> the two are reversed and the strand is flipped; at least that's the
>> way locations are set up in BioPerl.
>>
>> chris
>
>     Oh yeah?  I always tend to ensure that (start < stop), regardless
> of strand, when working with sequence features...the other day, I
> caught Glimmer2 emitting a prediction on the plus strand with start >
> stop.  I was going to work up a patch for the parser, but I wonder,
> should I just force everything to start < stop?  Or only predictions
> on the plus strand?  Should all the parsers for all the ab initio
> predictors ensure they emit features with coordinates like this?

Odd that it would predict a start > stop on the plus strand, though  
it may be corrected in Glimmer3.  Does the same prediction show up in  
Glimmer3?

chris


From johnsonm at gmail.com  Mon May 21 20:48:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 15:48:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>

Check the test data for Glimmer2 and Glimmer3.  They both predict one
large gene, I'd guess covering most of the sequence, in frame +1.
That's probably a bogus prediction, but that's not up to the parser to
decide.  I hadn't noticed it until recently.

I sent a patch via bugzilla to swap the coordinates if start > end and
strand > 0.

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
> > On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> I believe all seqfeature location coordinates are designed to have
> >> start < stop for consistency; in cases where the strand matters (CDS,
> >> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
> >> the two are reversed and the strand is flipped; at least that's the
> >> way locations are set up in BioPerl.
> >>
> >> chris
> >
> >     Oh yeah?  I always tend to ensure that (start < stop), regardless
> > of strand, when working with sequence features...the other day, I
> > caught Glimmer2 emitting a prediction on the plus strand with start >
> > stop.  I was going to work up a patch for the parser, but I wonder,
> > should I just force everything to start < stop?  Or only predictions
> > on the plus strand?  Should all the parsers for all the ab initio
> > predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris
>


From cjfields at uiuc.edu  Mon May 21 20:56:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 15:56:50 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
Message-ID: <6186D928-A47E-4EED-B06A-50E25A4893CC@uiuc.edu>

On May 21, 2007, at 3:35 PM, Chris Fields wrote:

> On May 16, 2007, at 2:11 PM, Mark Johnson wrote:
>
>> On 5/8/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> I believe all seqfeature location coordinates are designed to have
>>> start < stop for consistency; in cases where the strand matters  
>>> (CDS,
>>> gene, etc.) then the strand is set to 1 or -1.  When start > stop,
>>> the two are reversed and the strand is flipped; at least that's the
>>> way locations are set up in BioPerl.
>>>
>>> chris
>>
>>     Oh yeah?  I always tend to ensure that (start < stop), regardless
>> of strand, when working with sequence features...the other day, I
>> caught Glimmer2 emitting a prediction on the plus strand with start >
>> stop.  I was going to work up a patch for the parser, but I wonder,
>> should I just force everything to start < stop?  Or only predictions
>> on the plus strand?  Should all the parsers for all the ab initio
>> predictors ensure they emit features with coordinates like this?
>
> Odd that it would predict a start > stop on the plus strand, though
> it may be corrected in Glimmer3.  Does the same prediction show up in
> Glimmer3?
>
> chris

... and I see that it does (per your bug report).  The next thing to  
ask is how often these odd Glimmer hits occur and whether others have  
seen the same thing.  Maybe there's an explanation (bug, etc) but I  
can't immediately think of anything that makes sense unless it's  
running the reverse of the + strand as a control for some reason.

chris


From cjfields at uiuc.edu  Mon May 21 21:17:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 16:17:37 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
Message-ID: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>


On May 21, 2007, at 3:48 PM, Mark Johnson wrote:

> Check the test data for Glimmer2 and Glimmer3.  They both predict one
> large gene, I'd guess covering most of the sequence, in frame +1.
> That's probably a bogus prediction, but that's not up to the parser to
> decide.  I hadn't noticed it until recently.
>
> I sent a patch via bugzilla to swap the coordinates if start > end and
> strand > 0.

I think I know what it is.  If you mean these predictions:

Glimmer2:

    27    29263        6  [+1 L= 684 r=-1.187]

Glimmer3:

orf00001    29263        9  +1     9.60

Glimmer2/3 are predicting a gene on a circular chromosome that
starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
the stop codon).  Note that in the Glimmer2 detailed output the end is
29946 and the length of the sequence is 29940, so Glimmer2 artificially
extends the end of the sequence with part of the start.

This is handled as a split location in bioperl and in most GenBank
files; the above would be a location string like
'join(29263..29940,1..9)'.  If you switched the start and stop, the
location would be '9..29263', which wouldn't be correct (and would be
a huge gene).
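
As a rough sketch of that arithmetic, assuming a plus-strand prediction
and a known sequence length (both helper names are hypothetical and
nothing here uses the BioPerl location classes):

```perl
use strict;
use warnings;

# Hypothetical helper: GenBank-style location string for a plus-strand
# prediction on a circular sequence of known length.
sub circular_location_string {
    my ($start, $end, $seq_length) = @_;
    return "$start..$end" if $start <= $end;       # ordinary feature
    return "join($start..$seq_length,1..$end)";    # crosses the origin
}

# Hypothetical helper: number of bases covered by the same prediction.
sub circular_feature_length {
    my ($start, $end, $seq_length) = @_;
    return $end - $start + 1 if $start <= $end;
    return ($seq_length - $start + 1) + $end;
}

print circular_location_string(29263, 9, 29940), "\n";
print circular_feature_length(29263, 6, 29940), "\n";
```

With the Glimmer2 end of 6 the covered length comes out as 684, which
agrees with the L= 684 in the Glimmer2 line; with the Glimmer3 end of 9
it is three bases longer, consistent with the stop codon being included.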

chris


From johnsonm at gmail.com  Mon May 21 21:21:52 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 16:21:52 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
Message-ID: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>

    That makes sense.  Is that behavior documented anywhere?  I'll
feel like less of an idiot if it's not.  8)  Either way, if you're
sure that's what's going on, I'll fix up the parser to handle that as a
split location.

> I think I know what it is.  If you mean these predictions:
>
> Glimmer2:
>
>     27    29263        6  [+1 L= 684 r=-1.187]
>
> Glimmer3:
>
> orf00001    29263        9  +1     9.60
>
> Glimmer2/3 are predicting a gene for a circular chromosome that
> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> and the length of the sequence is 29940, so Glimmer2 artificially
> extends the end of the sequence with part of the start.
>
> This is handled as a split location in bioperl and in most GenBank
> files; the above would be a location string like 'join
> (29263..29940,1..9)'.  If you switched the start and stop the
> location would be '9..29263' which wouldn't be correct (and would be
> a huge gene).
>
> chris
>


From cjfields at uiuc.edu  Mon May 21 23:13:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 18:13:24 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
Message-ID: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>

Glimmer2/3 both assume the genome is circular by default (I assume
because Glimmer2/3 are used for bacterial genomes).  According to
the Glimmer3 release notes, the detail file has this information in the
header; from the Glimmer3 data used for tests:

Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA  
Glimmer3.icm Glimmer3

Sequence file = ../BCTDNA
ICM model file = Glimmer3.icm
Excluded regions file = none
List of orfs file = none
Truncated orfs = false
Circular genome = true
...

There are options available for glimmer3 (-L, -X) that specify a  
linear sequence or allow ORFs to extend past the end of the sequence  
analyzed (the latter assumes a linear sequence).
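
A parser could read that flag before deciding how to treat odd-looking
coordinates. The following is only a sketch: the helper name
parse_detail_header is made up, and it assumes nothing about the detail
file beyond the "key = value" header lines quoted above:

```perl
use strict;
use warnings;

# Hypothetical helper: collect "key = value" header lines into a hash.
# Lines without an '=' (such as the Command: line) are skipped.
sub parse_detail_header {
    my (@lines) = @_;
    my %header;
    for my $line (@lines) {
        next unless $line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/;
        $header{$1} = $2;
    }
    return \%header;
}

my $header = parse_detail_header(
    'Sequence file = ../BCTDNA',
    'ICM model file = Glimmer3.icm',
    'Truncated orfs = false',
    'Circular genome = true',
);
my $is_circular = ($header->{'Circular genome'} // '') eq 'true';
print $is_circular
    ? "circular: ORFs may wrap around the origin\n"
    : "linear: start > end on the plus strand is unexpected\n";
```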

chris

On May 21, 2007, at 4:21 PM, Mark Johnson wrote:

>     That makes sense.  Is that behavior documented anywhere?  I'll
> feel like less of an idiot if it's not.  8)  Either way, if you're
> sure that's whats going on, I'll fix up the parser to handle that as a
> split location.
>
>> I think I know what it is.  If you mean these predictions:
>>
>> Glimmer2:
>>
>>     27    29263        6  [+1 L= 684 r=-1.187]
>>
>> Glimmer3:
>>
>> orf00001    29263        9  +1     9.60
>>
>> Glimmer2/3 are predicting a gene for a circular chromosome that
>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
>> the stop codon).  Note in Glimmer2 detailed output the end is 29946
>> and the length of the sequence is 29940, so Glimmer2 artificially
>> extends the end of the sequence with part of the start.
>>
>> This is handled as a split location in bioperl and in most GenBank
>> files; the above would be a location string like 'join
>> (29263..29940,1..9)'.  If you switched the start and stop the
>> location would be '9..29263' which wouldn't be correct (and would be
>> a huge gene).
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnsonm at gmail.com  Mon May 21 23:57:03 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 21 May 2007 18:57:03 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>

    Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
this for a fix?  For plus strand predictions with start > end, use a
split location.  For minus strand predictions with start < end, use a
split location.  Without knowing the length of the sequence, that's
the best that can be done, I think.
    Unless there are objections, I'll go code that up.  Close that bug
out as 'requester is an idiot'.  8)
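
The rule might look something like this as a predicate. It is a sketch,
not the patch itself: the name wraps_origin is made up, and it assumes
that Glimmer reports ordinary minus-strand genes with start > end, which
is what the rule above implies:

```perl
use strict;
use warnings;

# Hypothetical predicate: true when the coordinate order is the reverse
# of what the strand normally gives, i.e. the gene crosses the origin.
sub wraps_origin {
    my ($start, $end, $strand) = @_;
    return $strand > 0 ? $start > $end : $start < $end;
}

for my $case ([29263, 9, 1], [100, 400, 1], [400, 100, -1], [9, 29263, -1]) {
    my ($start, $end, $strand) = @$case;
    printf "%6d %6d %2d  %s\n", $start, $end, $strand,
        wraps_origin($start, $end, $strand) ? 'split location' : 'simple location';
}
```

Note that the predicate only says that a split is needed; the outer
boundary of the first sub-location still depends on the sequence length.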

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:
>
> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3
>
> Sequence file = ../BCTDNA
> ICM model file = Glimmer3.icm
> Excluded regions file = none
> List of orfs file = none
> Truncated orfs = false
> Circular genome = true
> ...
>
> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).
>
> chris
>
> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>
> >     That makes sense.  Is that behavior documented anywhere?  I'll
> > feel like less of an idiot if it's not.  8)  Either way, if you're
> > sure that's whats going on, I'll fix up the parser to handle that as a
> > split location.
> >
> >> I think I know what it is.  If you mean these predictions:
> >>
> >> Glimmer2:
> >>
> >>     27    29263        6  [+1 L= 684 r=-1.187]
> >>
> >> Glimmer3:
> >>
> >> orf00001    29263        9  +1     9.60
> >>
> >> Glimmer2/3 are predicting a gene for a circular chromosome that
> >> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off
> >> the stop codon).  Note in Glimmer2 detailed output the end is 29946
> >> and the length of the sequence is 29940, so Glimmer2 artificially
> >> extends the end of the sequence with part of the start.
> >>
> >> This is handled as a split location in bioperl and in most GenBank
> >> files; the above would be a location string like 'join
> >> (29263..29940,1..9)'.  If you switched the start and stop the
> >> location would be '9..29263' which wouldn't be correct (and would be
> >> a huge gene).
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From torsten.seemann at infotech.monash.edu.au  Tue May 22 00:29:47 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 22 May 2007 10:29:47 +1000
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
Message-ID: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>

> glimmer2/3 both assume the genome is circular by default (I'm
> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
> the Glimmer3 release notes the detail file has the information in the
> header; from the Glimmer3 data used for tests:

You beat me to the reply Chris - yes, Glimmer2/3 assume circular
chromosome by default. I had forgotten about this in earlier
discussions of the new Glimmer parsers as I normally run it in
--linear / -L mode (even if I know it is circular) because it is
easier to handle, and our sequencer/assembler team usually gets the
origin of replication right.

> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
> Glimmer3.icm Glimmer3

I did a double-take here - that's the path to my Glimmer3
installation! It took me a couple of minutes to realise that you got
it from the bioperl test data I created. D'oh! :-)

> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).

If the default circular mode should produce Bio::Location::Split
objects, I guess that if -X is used it should produce
Bio::Location::Fuzzy objects too...

--Torsten


From cjfields at uiuc.edu  Tue May 22 00:59:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 19:59:20 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
Message-ID: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>

You can add the necessary patch to the bug report when it's ready; no  
need to close it out.

The most complete file format to parse seems to be the details file;  
it contains the sequence length:

 >BCTDNA
Sequence length = 29940

which can be used for the split location.  As Torsten points out, use  
of -X could also potentially produce fuzzy locations.

Since the parser currently only parses predict files, you could
optionally supply the parser with the seq length, and have it emit a
warning when the length is missing but a seqfeature requiring it is
produced, such as the sporadic ones which wrap around.  If one were
using the bioperl-run module this could be automated a bit by passing
the seq length in to the parser object through the constructor
argument list.
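
To make the suggestion concrete, here is a sketch in plain Perl. The
function name parse_predict_line and the hash layout are invented for
the example; the only thing taken from the thread is the predict line
itself, and only the plus-strand case is handled:

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one Glimmer3 predict line and, when the
# sequence length was supplied, split a plus-strand wrap-around in two.
sub parse_predict_line {
    my ($line, $seq_length) = @_;
    my ($id, $start, $end, $frame, $score) = split ' ', $line;
    my %feature = (
        id     => $id,
        start  => $start,
        end    => $end,
        strand => ($frame =~ /^-/ ? -1 : 1),
        score  => $score,
    );
    if ($feature{strand} > 0 && $start > $end) {
        if (defined $seq_length) {
            # Two sub-locations: up to the end of the sequence, then from base 1.
            $feature{parts} = [ [$start, $seq_length], [1, $end] ];
        }
        else {
            warn "$id wraps the origin; supply the sequence length to split it\n";
        }
    }
    return \%feature;
}

my $f = parse_predict_line('orf00001    29263        9  +1     9.60', 29940);
print join(',', map { "$_->[0]..$_->[1]" } @{ $f->{parts} }), "\n";
```

Without the second argument the same call returns the raw start and end
and only warns, which keeps the parser usable on predict files alone.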

chris

On May 21, 2007, at 6:57 PM, Mark Johnson wrote:

>     Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
> this for a fix?  For plus strand predictions with start > end, use a
> split location.  For minus strand predictions with start < end, use a
> split location.  Without knowing the length of the sequence, that's
> the best that can be done, I think.
>     Unless there are objections, I'll go code that up.  Close that bug
> out as 'requester is an idiot'.  8)
>
> On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>>
>> Sequence file = ../BCTDNA
>> ICM model file = Glimmer3.icm
>> Excluded regions file = none
>> List of orfs file = none
>> Truncated orfs = false
>> Circular genome = true
>> ...
>>
>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>>
>> chris
>>
>> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>>
>>>     That makes sense.  Is that behavior documented anywhere?  I'll
>>> feel like less of an idiot if it's not.  8)  Either way, if you're
>>> sure that's what's going on, I'll fix up the parser to handle that
>>> as a split location.
>>>
>>>> I think I know what it is.  If you mean these predictions:
>>>>
>>>> Glimmer2:
>>>>
>>>>     27    29263        6  [+1 L= 684 r=-1.187]
>>>>
>>>> Glimmer3:
>>>>
>>>> orf00001    29263        9  +1     9.60
>>>>
>>>> Glimmer2/3 are predicting a gene on a circular chromosome that
>>>> starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
>>>> the stop codon).  Note that in the Glimmer2 detailed output the end
>>>> is 29946 and the length of the sequence is 29940, so Glimmer2
>>>> artificially extends the end of the sequence with part of the start.
>>>>
>>>> This is handled as a split location in bioperl and in most GenBank
>>>> files; the above would be a location string like
>>>> 'join(29263..29940,1..9)'.  If you switched the start and stop, the
>>>> location would be '9..29263', which wouldn't be correct (and would
>>>> be a huge gene).
>>>>
>>>> chris
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue May 22 01:00:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 May 2007 20:00:58 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<a79f6a4b0705211729j3ff17d60v610fab7f5e135303@mail.gmail.com>
Message-ID: <E22A8442-E00D-4732-9D80-EE61C75732B7@uiuc.edu>


On May 21, 2007, at 7:29 PM, Torsten Seemann wrote:

>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming since Glimmer2/3 are used for bacterial genomes).  Acc. to
>> the Glimmer3 release notes the detail file has the information in the
>> header; from the Glimmer3 data used for tests:
>
> You beat me to the reply Chris - yes, Glimmer2/3 assume circular
> chromosome by default. I had forgotten about this in earlier
> discussions of the new Glimmer parsers as I normally run it in
> --linear / -L mode (even if I know it is circular) because it is
> easier to handle, and our sequencer/assembler team usually gets the
> origin of replication right.
>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>
> I did a double-take here - that's the path to my Glimmer3
> installation! It took me a couple of minutes to realise that you got
> it from the bioperl test data I created. D'oh! :-)

Yep, I forgot about that!

>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>
> If the -L mode should produce Bio::Location::Split objects, I guess if
> -X is used it should produce Bio::Location::Fuzzy objects too...
>
> --Torsten

True, I didn't think about that one.  Definitely something to consider
adding in.

chris


From johnsonm at gmail.com  Tue May 22 18:04:31 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 22 May 2007 13:04:31 -0500
Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap
	start and stop coordinates??
In-Reply-To: <A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
References: <CED81D34E37D5043A1211565277A51E507E23161@exchkc02.stowers-institute.org>
	<79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu>
	<ebf5eb170705161211m6fb570b5r86ee055299993172@mail.gmail.com>
	<B012903E-7C0F-4E34-9BFE-E551855B6C62@uiuc.edu>
	<ebf5eb170705211348w57c37f18oeb128656c446cff@mail.gmail.com>
	<62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu>
	<ebf5eb170705211421w244933fcu4db8ba748653c090@mail.gmail.com>
	<9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu>
	<ebf5eb170705211657j233dc7efs88a8a22e0597c235@mail.gmail.com>
	<A81C4C78-798B-4D5A-B542-A526ABA563E4@uiuc.edu>
Message-ID: <ebf5eb170705221104s486ff488u1d8c0b87dd193861@mail.gmail.com>

Yes, Glimmer3 outputs the length of the input sequence.  I don't
believe Glimmer2 does.

> The most complete file format to parse seems to be the details file;
> it contains the sequence length:
>
>  >BCTDNA
> Sequence length = 29940

> Since the parser currently only parses predict files, you could
> optionally supply the parser with the seq length and emit a warning
> if seqfeatures requiring it are produced, such as the sporadic ones
> which wrap around.  If one were using the bioperl-run module this
> could be automated a bit by passing the seq length in to the parser
> object by adding the seq length to the constructor argument list.

I think we can spot wrap-around genes easily enough without knowing
the length of the input sequence.  Having it just means we can perform
a sanity check or two, such as making sure 'wraparound' genes are
within N bases of the end of the input sequence.  Any suggestions on a
good default value for N?
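
Roughly what I have in mind, as a plain-Perl sketch.  The helper names are made up for illustration, and the 1000 in the example calls is an arbitrary placeholder for N, not a proposed default.

```perl
use strict;
use warnings;

# Hypothetical helpers, not BioPerl API.

# A prediction wraps around the origin when the coordinates run "backwards"
# for its strand: start > end on the plus strand, start < end on the minus
# strand.  No sequence length is needed for this test.
sub is_wraparound {
    my ($start, $end, $strand) = @_;
    return $strand > 0 ? $start > $end : $start < $end;
}

# Optional sanity check once the sequence length is known: the piece of the
# gene before the origin should begin within $max_dist bases of the end of
# the input sequence.
sub wrap_is_plausible {
    my ($start, $end, $strand, $seq_len, $max_dist) = @_;
    my $tail = $strand > 0 ? $start : $end;   # coordinate nearest the sequence end
    return ($seq_len - $tail) <= $max_dist;
}

# orf00001 from the Glimmer3 test data: 29263 -> 9 on the plus strand.
if (is_wraparound(29263, 9, 1)) {
    print "wraps around\n";
    print "looks plausible\n" if wrap_is_plausible(29263, 9, 1, 29940, 1000);
}
```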

Parsing both output files for glimmer3 will be a little tricky.  The
constructor for Bio::Tools::Glimmer calls $class->SUPER::new(@args),
which hits the constructor for Bio::Tools::AnalysisResult, which does
the same thing.  It all ends up in Bio::Root::IO::_initialize_io,
which grabs the -file arg and opens it.  So, either let Bio::Root::IO
handle -file and have Bio::Tools::Glimmer handle, say, a -detail file
itself, or have Bio::Tools::Glimmer implement its own _initialize_io()
and hope that will fly.
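
If the detail file ends up being handled separately, pulling out the length itself should be simple.  A minimal sketch, assuming the header always contains a line of the form "Sequence length = N" as in the test data; detail_seq_length() is a made-up name, and the filehandle here is a plain Perl one, not anything from Bio::Root::IO.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a Glimmer3 detail file for the
# "Sequence length = N" line and return N (nothing if it is absent).
sub detail_seq_length {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^Sequence length\s*=\s*(\d+)/;
    }
    return;
}

# Demo on an in-memory copy of the header lines quoted in this thread.
my $detail = <<'END';
Circular genome = true

>BCTDNA
Sequence length = 29940
END

open my $fh, '<', \$detail or die "cannot open in-memory file: $!";
print detail_seq_length($fh), "\n";   # 29940
close $fh;
```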


From ClarkeW at AGR.GC.CA  Tue May 22 21:10:08 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Tue, 22 May 2007 15:10:08 -0600
Subject: [Bioperl-l] TextResultWriter
Message-ID: <C278B850.1002%ClarkeW@AGR.GC.CA>

Hi, 

I am interested in becoming a bioperl developer, as I have recently found a
bug in TextResultWriter. I know that I should submit bug fixes using the
protocol outlined in the How To, but I haven't been able to log in to CVS
anonymously to check the code out. However, I have confirmed through the web
interface to the CVS repositories that the bug still exists in the most
recent version of the code. The bug is between lines 433 and 443: the number
of letters and the number of entries in the database are passed to the
format string in the wrong order, so each is reported under the other's
label. My fix would be to change the existing code block:

from:

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_entries()),
        &_numwithcommas($result->database_letters()),
        $result->get_parameter('matrix') || '');

to: 

    Number of letters in database: %s
    Number of sequences in database: %s

Matrix: %s
},
        $result->database_name(),
        $result->get_statistic('posted_date') ||
        POSIX::strftime("%b %d, %Y %I:%M %p",localtime),
        &_numwithcommas($result->database_letters()),
        &_numwithcommas($result->database_entries()),
        $result->get_parameter('matrix') || '');

I believe that this is a simple enough modification that it does not require
any new test cases.
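
To show the effect in isolation, here is a small stand-alone sketch. The numwithcommas() routine below is my own stand-in for the module's private _numwithcommas(), and the two database statistics are made-up numbers.

```perl
use strict;
use warnings;

# Stand-in for the module's private _numwithcommas(); this version is an
# assumption written for the example, not a copy of the BioPerl code.
sub numwithcommas {
    my ($n) = @_;
    # Repeatedly split off the last three digits of the leading digit run.
    1 while $n =~ s/^(\d+)(\d{3})/$1,$2/;
    return $n;
}

my ($letters, $entries) = (1234567, 4321);   # made-up database statistics

my $fmt = "Number of letters in database: %s\n"
        . "Number of sequences in database: %s\n";

# sprintf fills the placeholders strictly in argument order, so the
# arguments must be letters first, then entries.
print sprintf($fmt, numwithcommas($letters), numwithcommas($entries));
```

With the arguments in the original order, the same format string would label 4,321 as the number of letters and 1,234,567 as the number of sequences.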

Cheers, Wayne


From dmessina at wustl.edu  Wed May 23 06:06:52 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 23 May 2007 01:06:52 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C278B850.1002%ClarkeW@AGR.GC.CA>
References: <C278B850.1002%ClarkeW@AGR.GC.CA>
Message-ID: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>

Hi Wayne,

I submitted the bug report on your behalf

	http://bugzilla.open-bio.org/show_bug.cgi?id=2300

and committed your patch. Thanks for reporting this, and thanks even  
more for including a patch!

Regarding your trouble checking out the repository via anonymous CVS,  
could you post the transcript of your attempt so we can get a better  
look at what's going wrong?

Dave


From ClarkeW at AGR.GC.CA  Wed May 23 14:39:17 2007
From: ClarkeW at AGR.GC.CA (ClarkeW)
Date: Wed, 23 May 2007 08:39:17 -0600
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu>
Message-ID: <C279AE35.1008%ClarkeW@AGR.GC.CA>

With regards to not being able to connect, I have discovered that the reason
I cannot connect is that our firewall is blocking my access. It appears that
I am not the first person to have this problem but that the people in charge
are firm in their position to block the anonymous access port. However, if I
obtain a developer account I will be able to access the CVS.

Cheers, Wayne


On 5/23/07 12:06 AM, "David Messina" <dmessina at wustl.edu> wrote:

> Hi Wayne,
> 
> I submitted the bug report on your behalf
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2300
> 
> and committed your patch. Thanks for reporting this, and thanks even
> more for including a patch!
> 
> Regarding your trouble checking out the repository via anonymous CVS,
> could you post the transcript of your attempt so we can get a better
> look at what's going wrong?
> 
> Dave
> 
> 


From cjfields at uiuc.edu  Wed May 23 16:16:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 23 May 2007 11:16:32 -0500
Subject: [Bioperl-l] TextResultWriter
In-Reply-To: <C279AE35.1008%ClarkeW@AGR.GC.CA>
References: <C279AE35.1008%ClarkeW@AGR.GC.CA>
Message-ID: <7077B4AB-A3B5-4EAE-9994-0EF629D2DE2B@uiuc.edu>

You can always use the browsable CVS link to download a tarball if  
that works for you.

http://www.bioperl.org/wiki/Using_CVS
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/?cvsroot=bioperl

The link to download is at the bottom of the page.

chris

On May 23, 2007, at 9:39 AM, ClarkeW wrote:

> With regards to not being able to connect, I have discovered that the
> reason I cannot connect is that our firewall is blocking my access. It
> appears that I am not the first person to have this problem but that the
> people in charge are firm in their position to block the anonymous access
> port. However, if I obtain a developer account I will be able to access
> the CVS.
>
> Cheers, Wayne
>
>
> On 5/23/07 12:06 AM, "David Messina" <dmessina at wustl.edu> wrote:
>
>> Hi Wayne,
>>
>> I submitted the bug report on your behalf
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2300
>>
>> and committed your patch. Thanks for reporting this, and thanks even
>> more for including a patch!
>>
>> Regarding your trouble checking out the repository via anonymous CVS,
>> could you post the transcript of your attempt so we can get a better
>> look at what's going wrong?
>>
>> Dave
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Xianjun.Dong at bccs.uib.no  Tue May 29 11:57:39 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 13:57:39 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
Message-ID: <465C1533.6070900@ii.uib.no>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment-0004.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kaks_methods.pl
Type: application/x-perl
Size: 2732 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment.pl>

From avilella at gmail.com  Tue May 29 13:02:44 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:02:44 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>

codeml in PAML can give different results in cases where the optimization
reaches different local maxima depending on the different starting points of
each run (seed values). So, at least for some methods and options, this
instability is inherent to the underlying algorithm.

Moreover, for some methods and options the PAML documentation even
recommends running the same data more than once to see whether the results
agree, which would be a good indication that the model is robust given
the data.
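
A quick way to compare repeated runs is to pull the dS estimate out of each summary line and look at the spread. This is only a sketch: ds_values() is a made-up helper, and the four lines are the ones from your message quoted below.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: extract the dS estimate from each pairwise summary
# line written by codeml.  The lookbehind skips the "dS=" inside "dN/dS=".
sub ds_values {
    my (@lines) = @_;
    my @ds;
    foreach my $line (@lines) {
        push @ds, $1 if $line =~ m{(?<!/)dS=\s*([\d.]+)};
    }
    return @ds;
}

my @runs = (
    't=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339',
    't= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349',
    't=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961',
    't= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852',
);

my @ds = ds_values(@runs);
printf "dS over %d runs: min %.4f, max %.4f\n", scalar @ds, min(@ds), max(@ds);
# prints: dS over 4 runs: min 10.1852, max 14.9961
```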

Maybe PAML's author can give a more specific answer for your data at:

http://www.rannala.org/gsf/viewforum.php?f=1

Cheers,

    Albert.

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  Hi, dear all, //sorry for the duplicated msg for *Jason* and *Neil*
>
> I'm bothered by two problems when I use the PAML module to calculate Ka/Ks
> for my sequences. Could you help me?
>
> 1.  Codeml can produce different Ka/Ks values if I run it twice. I checked
> this both on the command line and with the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml.
>
> The input sequences are:
> >seq1
> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
> >seq2
> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
>
> For the command-line program, I used Codeml in PAML 3.14, with the
> settings in codeml.ctl (runmode = -2, seqtype = 1). I ran the program four
> times.  The output is shown below (from the output file). The runs differ
> from each other; they should be the same or only slightly different,
> right? But they are NOT.  Weird!
>
> ----------------------------------------------------------------------------------------------------------------------------------
> t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
> t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
> t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
> t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>
> ----------------------------------------------------------------------------------------------------------------------------------
> I found the same problem when I used the Perl wrapper
> Bio::Tools::Run::Phylo::PAML::Codeml (I have attached my Perl script,
> which is similar to the one in the BioPerl HOWTO).
>
> 2. Another strange thing is that if I switch to the YN00 program in the
> PAML package, the output is stable. However, it is very different from
> Codeml's (see below).
>
> ----------------------------------------------------------------------------------------------------------------------------------
> seq. seq.     S       N        t   kappa    omega      dN +- SE          dS +- SE
>    2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265   2.1300 +- 1.2272
>
> ----------------------------------------------------------------------------------------------------------------------------------
> Why is this? Which one should I believe?
>
>
> Would anyone kindly run the Perl script (twice, to check whether the
> results differ), or run codeml on the command line?
> I don't know whether anyone has noticed this before, or whether it is
> caused by the wrong version of PAML.
>
> Regards,
>
> Xianjun
>
>
>
> Himanshu Ardawatia wrote:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
>
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::Tools::Run::Alignment::Clustalw;
>
> # for projecting alignments from protein to R/DNA space
> use Bio::Align::Utilities qw(aa_to_dna_aln);
>
> # for input of the sequence data
> use Bio::SeqIO;
> use Bio::AlignIO;
>
> my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>
> #my $seqdata = 'chuck.fa';
> my $seqdata = 'xianjun.fa';
>
> my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>                            -format => 'fasta');
> my %seqs;
> my @prots;
>
> my $output;
> # process each sequence
> while( my $seq = $seqIO->next_seq ) {
>     $seqs{$seq->display_id} = $seq;
>     # translate them into protein
>     my $protein = $seq->translate();
>     my $pseq = $protein->seq();
>     if( $pseq =~ /\*/ &&
>         $pseq !~ /\*$/ ) {
>         warn("provided a cDNA sequence with a stop codon, PAML will choke!");
>         exit(0);
>     }
>     # Tcoffee can't handle '*' even if it is trailing
>     $pseq =~ s/\*//g;
>     $protein->seq($pseq);
>     push @prots, $protein;
> }
>
> if( @prots < 2 ) {
>     warn("Need at least 2 cDNA sequences to proceed");
>     exit(0);
> }
>
> open(OUT, '>', 'align_output.txt') ||
>       die("cannot open output align_output.txt for writing: $!");
> # Align the sequences with clustalw
>
> my $aa_aln = $aln_factory->align(\@prots);
>
> # project the protein alignment back to cDNA coordinates
> my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>
> my @each = $dna_aln->each_seq();
>
> my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>                   ( -params => { 'runmode' => -2,
>                          'seqtype' => 1,
>                  'model' => 1,
>                 }
>               );
>
> # set the alignment object
> $kaks_factory->alignment($dna_aln);
>
> # run the KaKs analysis
> my ($rc,$parser) = $kaks_factory->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
>
> my @otus = $result->get_seqs();
> # this gives us a mapping from the PAML order of sequences back to
> # the input order (since names get truncated)
> my @pos = map {
>     my $c= 1;
>     foreach my $s ( @each ) {
>     last if( $s->display_id eq $_->display_id );
>     $c++;
>     }
>     $c;
> } @otus;
>
> print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
> CDNA_PERCENTID)), "\n";
> for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>     for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>     print OUT join("\t",
>                $otus[$i]->display_id,
>                $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>                $MLmatrix->[$i]->[$j]->{'dS'},
>                $MLmatrix->[$i]->[$j]->{'omega'},
>                sprintf("%.2f",$sub_aa_aln->percentage_identity),
>                sprintf("%.2f",$sub_dna_aln->percentage_identity),
>                ), "\n";
>     }
> }
>
>
> On 5/29/07, Himanshu Ardawatia <himanshu.ardawatia at bccs.uib.no> wrote:
> >
> > Hi Xianjun,
> >
> > I recognize this script. But it was a bit cumbersome to use, as many
> > things are done in the script (multiple alignment, aa to dna alignment
> > and Ka/Ks calculation), so one does not have real control over these
> > different aspects.
> > I do not remember getting different Ka/Ks in different runs, though. But I
> > remember that once I ran the script with different versions of clustalw and
> > it REALLY gave different results!! So please make sure the clustalw
> > versions are the same in all your runs. Best is to use the latest version.
> >
> > Finally I wrote my own simple script which generates a codeml.ctl file
> > for each set of sequences, runs codeml based on that, and then moves on.
> > A disadvantage of this can be that some files keep getting overwritten
> > (the ones whose names are hard-coded in the codeml program), and if one
> > needs those files as well then one needs to run the codeml cycles for each
> > set of sequences in a different directory.
> >
> > One advantage of this kind of script is that you can use whichever
> > alignment program you want, and so on. But then it also means the extra
> > steps of doing the multiple alignment and aa to dna alignment yourself.
> >
> > Does it make sense? If you still get different outputs with the same
> > version of clustalw then I can sit with you and look at things together.
> > Or else try the script method which I mentioned.
> >
> > Cheers  and Fu
> > Himanshu
> > On 5/28/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
> > >
> > > HI, Himanshu
> > >
> > > I am sure you have done some work on Ka/Ks calculation. I have a
> > > question that is bothering me: the output of
> > > Bio::Tools::Run::Phylo::PAML::Codeml is not stable (different for each
> > > run), and also differs from the output of the
> > > Bio::Tools::Run::Phylo::PAML::Yn00 module.
> > >
> > > I have attached the script. Could you have a look and try to run it?
> > > How do you calculate the Ka/Ks ratio?
> > >
> > > Thanks
> > >
> > > --
> > > ---------------------------
> > > Sterding (Xianjun) Dong
> > > PhD student, Boris Lenhard's group
> > > Bergen Center of Computational Science
> > > Bergen University, Norway
> > > Mobile: 0047-47361688
> > > Telephone: 0047-55276381
> > > Skype: xianjun.dong
> > >
> > >
> > >
> >
>
> --
> ---------------------------
> Sterding (Xianjun) Dong
> PhD student, Boris Lenhard's group
> Bergen Center of Computational Science
> Bergen University, Norway
> Mobile: 0047-47361688
> Telephone: 0047-55276381
> Skype: xianjun.dong
>
>
>
>


From Xianjun.Dong at bccs.uib.no  Tue May 29 13:30:09 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 15:30:09 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
References: <465AD6E8.3030707@ii.uib.no>	
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>	
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
Message-ID: <465C2AE1.30101@ii.uib.no>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/532a333d/attachment-0004.html>

From avilella at gmail.com  Tue May 29 13:45:28 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 29 May 2007 14:45:28 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2AE1.30101@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>
	<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>
	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no>
Message-ID: <358f4d650705290645s65f596cbp37715f12064a5ced@mail.gmail.com>

On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
>
>  Thanks for the information, Albert.
>
> But I still have two questions:
> Albert Vilella wrote:
>
> codeml in PAML can give different results in cases where the optimization
> reaches different local maxima depending on the different starting points of
> each run (seed values). So, at least for some methods and options, this
> instability is inherent to the underlying algorithm.
>
> 1. How should I set the initial value in order to get a reasonable
> estimate? Do you have any experience with that?
>

People usually vary the initial omega in the control file (codeml.ctl),
for example three runs with starting values of 0.001, 1 and 5.
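
As a sketch of what I mean, the following prints the two control-file lines for each starting value. omega_ctl_lines() is a made-up helper, and I am assuming the usual codeml.ctl keywords fix_omega and omega; the rest of the control file is left out. With the BioPerl wrapper the same values could presumably be passed through -params, like 'runmode' in your script, but I have not checked that.

```perl
use strict;
use warnings;

# Hypothetical helper: render the codeml.ctl lines that control omega.
# fix_omega = 0 asks codeml to estimate omega, so the "omega" value is
# only the starting point of the optimisation.
sub omega_ctl_lines {
    my ($initial_omega) = @_;
    return "fix_omega = 0\n" . "omega = $initial_omega\n";
}

# One run per starting value; compare the estimates afterwards.
for my $omega (0.001, 1, 5) {
    print "# run with initial omega $omega\n", omega_ctl_lines($omega), "\n";
}
```

If the three runs converge to the same estimates, the starting point is probably not the issue.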

> Moreover, for some methods and options the PAML documentation even
> recommends running the same data more than once to see whether the results
> agree, which would be a good indication that the model is robust given
> the data.
>
> 2. Is there a recommended way to test the significance if the results
> differ? For example, in my case, dS ranges from 10.1852 to 14.9961 over
> the four runs. If that means the model is not robust (how do I check
> this?), should I switch to another model?
>
>

I would prefer PAML's author to answer this question :)

> How can YN00 reach a stable result? (Is it because YN00 does not require
> an initial value for optimization?) Why does YN00 produce such a different
> result from Codeml? (For YN00, dS = 2.1300 with SE = 1.2272; for Codeml,
> dS = 10.1852-14.9961.)
>

I think yn00 is less prone to giving different local maxima than some codeml
models, but then codeml is better at giving true positives in cases where
yn00 would give false negatives...

> Maybe PAML's author can give a more specific answer for your data at:
> http://www.rannala.org/gsf/viewforum.php?f=1
>
>
> Actually, I have already posted my question in the author's forum. Let's
> wait and see.
>

Yes, I would wait for his answers, which should be way more reliable than
mine :)

> Cheers,
>
>     Albert.
>
> On 5/29/07, Dong Xianjun <Xianjun.Dong at bccs.uib.no> wrote:
> >
> > Hi, dear all, //sorry for the duplicated msg for *Jason* and *Neil*
> >
> > I'm bothered by two problems when I use the PAML module to calculate
> > Ka/Ks for my sequences. Could you help me?
> >
> > 1.  Codeml can produce different Ka/Ks values if I run it twice. I
> > checked this both on the command line and with the Perl wrapper
> > Bio::Tools::Run::Phylo::PAML::Codeml.
> >
> > The input sequences are:
> > >seq1
> > TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
> > >seq2
> > TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
> >
> > For the command-line program, I used Codeml in PAML 3.14, with the
> > settings in codeml.ctl (runmode = -2, seqtype = 1). I ran the program
> > four times.  The output is shown below (from the output file). The runs
> > differ from each other; they should be the same or only slightly
> > different, right? But they are NOT.  Weird!
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
> > t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
> > t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
> > t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > I found the same problem when I used the Perl wrapper
> > Bio::Tools::Run::Phylo::PAML::Codeml (I have attached my Perl script,
> > which is similar to the one in the BioPerl HOWTO).
> >
> > 2. Another strange thing is that if I switch to the YN00 program in the
> > PAML package, the output is stable. However, it is very different from
> > Codeml's (see below).
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > seq. seq.     S       N        t   kappa    omega      dN +- SE          dS +- SE
> >    2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265   2.1300 +- 1.2272
> >
> > ----------------------------------------------------------------------------------------------------------------------------------
> > Why is this? Which one should I believe?
> >
> >
> > Would anyone kindly run the Perl script (twice, to check whether the
> > results differ), or run codeml on the command line?
> > I don't know whether anyone has noticed this before, or whether it is
> > caused by the wrong version of PAML.
> >
> > Regards,
> >
> > Xianjun
> >
> >
> >
> > Himanshu Ardawatia wrote:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> >
> > use Bio::Tools::Run::Phylo::PAML::Codeml;
> > use Bio::Tools::Run::Alignment::Clustalw;
> >
> > # for projecting alignments from protein to R/DNA space
> > use Bio::Align::Utilities qw(aa_to_dna_aln);
> >
> > # for input of the sequence data
> > use Bio::SeqIO;
> > use Bio::AlignIO;
> >
> > my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
> >
> > #my $seqdata = 'chuck.fa';
> > my $seqdata = 'xianjun.fa';
> >
> > my $seqIO = new Bio::SeqIO(-file   => $seqdata,
> >                            -format => 'fasta');
> > my %seqs;
> > my @prots;
> >
> > my $output;
> > # process each sequence
> > while( my $seq = $seqIO->next_seq ) {
> >     $seqs{$seq->display_id} = $seq;
> >     # translate them into protein
> >     my $protein = $seq->translate();
> >     my $pseq = $protein->seq();
> >     if( $pseq =~ /\*/ &&
> >     $pseq !~ /\*$/ ) {
> >     warn("provided a cDNA sequence with a stop codon, PAML will
> > choke!");
> >     exit(0);
> >     }
> >     # Tcoffee can't handle '*' even if it is trailing
> >     $pseq =~ s/\*//g;
> >     $protein->seq($pseq);
> >     push @prots, $protein;
> > }
> >
> > if( @prots < 2 ) {
> >     warn("Need at least 2 cDNA sequences to proceed");
> >     exit(0);
> > }
> >
> > open(OUT, '>', 'align_output.txt') ||
> >       die("cannot open output align_output.txt for writing: $!");
> > # Align the sequences with clustalw
> >
> > my $aa_aln = $aln_factory->align(\@prots);
> >
> > # project the protein alignment back to cDNA coordinates
> > my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
> >
> > my @each = $dna_aln->each_seq();
> >
> > my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
> >                   ( -params => { 'runmode' => -2,
> >                          'seqtype' => 1,
> >                  'model' => 1,
> >                 }
> >               );
> >
> > # set the alignment object
> > $kaks_factory->alignment($dna_aln);
> >
> > # run the KaKs analysis
> > my ($rc,$parser) = $kaks_factory->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> >
> > my @otus = $result->get_seqs();
> > # this gives us a mapping from the PAML order of sequences back to
> > # the input order (since names get truncated)
> > my @pos = map {
> >     my $c= 1;
> >     foreach my $s ( @each ) {
> >     last if( $s->display_id eq $_->display_id );
> >     $c++;
> >     }
> >     $c;
> > } @otus;
> >
> > print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
> > CDNA_PERCENTID)), "\n";
> > for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
> >     for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
> >     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
> >     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
> >     print OUT join("\t",
> >                $otus[$i]->display_id,
> >                $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
> >                $MLmatrix->[$i]->[$j]->{'dS'},
> >                $MLmatrix->[$i]->[$j]->{'omega'},
> >                sprintf("%.2f",$sub_aa_aln->percentage_identity),
> >                sprintf("%.2f",$sub_dna_aln->percentage_identity),
> >                ), "\n";
> >     }
> > }
> >
> >
> > On 5/29/07, Himanshu Ardawatia <himanshu.ardawatia at bccs.uib.no > wrote:
> > >
> > > Hi Xianjun,
> > >
> > > I recognize this script. But it was a bit cumbersom to use this as
> > > many things are done in the script (like multiple alignment, aa to dna
> > > alignment and ka/ks calculation) so one does not have real control on these
> > > different aspect.
> > > I do not remeber getting different Ka/Ks in different runs though. But
> > > I remeber that one I ran the script with different versions of clustalw and
> > > it REALLY gave different results !! So please make sure if the clustalw
> > > versions are the same in all your runs. Best is to use the latest version.
> > >
> > > Finally I wrote my simple script which would generate a codeml.ctl
> > > file for each set of sequences and run the codeml based on that and then
> > > more on. Disadvantage of this can be that some files keep getting
> > > over-written (like the one which have their names hard-coded in codeml
> > > program) and if one needs those files as well then one needs to run the
> > > codeml cycles for each set of sequences in different directories.
> > >
> > > One advantage of this kind of script is that you can use whichever
> > > alignment program you want to use and so on....But then its also extra steps
> > > of yourself doing multiple alignment and aa to dna alignment etc....
> > >
> > > Does it make sense? If you still get different outputs with same
> > > version of clustalw then I can sit with you and look at things together. Or
> > > else try the script method which I mentioned.
> > >
> > > Cheers  and Fu
> > > Himanshu
> > > \\
> > > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote:
> > > >
> > > > HI, Himanshu
> > > >
> > > > I am sure you did some work in Ka/Ks calculation. Here I have a
> > > > question
> > > > bothering me; the output for Bio::Tools::Run::Phylo::PAML::Codeml is
> > > > not
> > > > stable(different for each runtime), and also different from the
> > > > output
> > > > with modeul of Bio::Tools::Run::Phylo::PAML::Yn00.
> > > >
> > > > Here I attached the script. Could you help to have a look and try to
> > > > run
> > > > the script? How is your way to calculate the Kaks ratio?
> > > >
> > > > Thanks
> > > >
> > > > --
> > > > ---------------------------
> > > > Sterding (Xianjun) Dong
> > > > PhD student, Boris Lenhard's group
> > > > Bergen Center of Computational Science
> > > > Bergen University, Norway
> > > > Mobile: 0047-47361688
> > > > Telephone: 0047-55276381
> > > > Skype: xianjun.dong
> > > >
> > > >
> > > >
> > >
> >
> > --
> > ---------------------------
> > Sterding (Xianjun) Dong
> > PhD student, Boris Lenhard's group
> > Bergen Center of Computational Science
> > Bergen University, Norway
> > Mobile: 0047-47361688
> > Telephone: 0047-55276381
> >
> > Skype: xianjun.dong
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> --
> ---------------------------
> Sterding (Xianjun) Dong
> PhD student, Boris Lenhard's group
> Bergen Center of Computational Science
> Bergen University, Norway
> Mobile: 0047-47361688
> Telephone: 0047-55276381
> Skype: xianjun.dong
>
>


From roy at colibase.bham.ac.uk  Tue May 29 14:05:12 2007
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Tue, 29 May 2007 15:05:12 +0100
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C1533.6070900@ii.uib.no>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>	<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>
	<465C1533.6070900@ii.uib.no>
Message-ID: <465C3318.5080201@colibase.bham.ac.uk>

Hi Xianjun,

I'm not sure if it is the cause of your problem, but your sequences seem
to be quite short. This paper:
http://mbe.oxfordjournals.org/cgi/content/full/21/12/2290

suggests that the codeml method of calculating Ka and Ks may be
unreliable for sequences shorter than 300 codons.
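
A quick way to check whether an input falls under that length is to count
complete codons before running codeml. The following is only a minimal
sketch: the helper name codon_count, the $MIN_CODONS constant and the toy
sequences are made up for illustration, and the 300-codon figure is simply
the rule of thumb mentioned above.

```perl
use strict;
use warnings;

# Rule-of-thumb threshold from the message above (an assumption, not a PAML setting).
my $MIN_CODONS = 300;

# Number of complete codons in a nucleotide string, ignoring gaps and whitespace.
sub codon_count {
    my ($dna) = @_;
    (my $clean = $dna) =~ s/[\s\-]//g;
    return int(length($clean) / 3);
}

# Hypothetical toy input; substitute the real coding sequences.
my %seqs = (
    toy1 => 'ATGGCCAAATAA',
    toy2 => 'ATG' x 350,
);

for my $id (sort keys %seqs) {
    my $n    = codon_count($seqs{$id});
    my $note = $n < $MIN_CODONS ? 'short: estimates may be unreliable' : 'ok';
    print "$id\t$n codons\t$note\n";
}
```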

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.


From gbr0wn at comcast.net  Wed May 30 15:44:13 2007
From: gbr0wn at comcast.net (gbr0wn at comcast.net)
Date: Wed, 30 May 2007 15:44:13 +0000
Subject: [Bioperl-l] getting started in windows
Message-ID: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070530/2f640e16/attachment.ksh>

From golharam at umdnj.edu  Wed May 30 15:40:28 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 30 May 2007 11:40:28 -0400
Subject: [Bioperl-l] ClustalW Score?
Message-ID: <00c201c7a2d0$d971f550$2d01a8c0@PICO>

How do I get the clustalw score from a clustalw alignment?  I'm using the
following code to align my sequences:

$aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();

$seq[0] = ...
$seq[1] = ...
$seq[2] = ...
$seq[3] = ...

$aln = $aln_factory->align(\@seq);

I can get the percentage identity from the Bio::SimpleAlign object, but
there is no score.  I looked into it further and it doesn't look like the
score is being captured anywhere.  So, how does one get the score from
ClustalW using this method?

Ryan


From barry.moore at genetics.utah.edu  Wed May 30 16:21:16 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 30 May 2007 10:21:16 -0600
Subject: [Bioperl-l] getting started in windows
In-Reply-To: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
References: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net>
Message-ID: <CA090066-0624-4C52-8306-E783278484B0@genetics.utah.edu>

Try opening up a terminal window (I think you'll find that under
accessories).  Change to the directory where your code is and run it
from the command line.

B

On May 30, 2007, at 9:44 AM, gbr0wn at comcast.net wrote:

> I am a perl novice trying to run perl 5.8.8 on windows xp system.   
> I have used 'wordpad' to paste tutorial code into an executable  
> file and when I double click the icon for the file a window opens  
> up briefly with output and/or error message but closes too fast for  
> me to read.  Any idea why this might be happening?
> Thanks, Greg Brown - gbr0wn at comcast.net
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Kevin.M.Brown at asu.edu  Wed May 30 17:16:49 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 10:16:49 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
References: <00c201c7a2d0$d971f550$2d01a8c0@PICO>
Message-ID: <1A4207F8295607498283FE9E93B775B403349DAB@EX02.asurite.ad.asu.edu>

> How do I get the clustalw score from a clustalw alignment?  
> I'm using the following code to align my sequences:
> 
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
> 
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
> 
> $aln = $aln_factory->align(\@seq);
> 
> I can get the percentage identity from the Bio::SimpleAlign 
> object, but there is no score.  I looked into it further and 
> it doesn't look like the score is being captured anywhere.  
> So, how does one get the score from ClustalW using this method?


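        # Save STDOUT, then point it at a log file so that ClustalW's
        # progress output (which includes the score) is captured.
        open(OUTCOPY, ">&STDOUT")  or die "Couldn't dup STDOUT: $!";
        open(STDOUT,  ">log.test") or die "Couldn't open log.test: $!";
        $aln = $aln_factory->align(\@seq);
        close STDOUT;
        open(STDOUT, ">&OUTCOPY")  or die "Couldn't restore STDOUT: $!";
        open(TEMP,   "log.test")   or die "Couldn't read log.test: $!";
        while (<TEMP>)
        {
                # allow optional whitespace after the colon
                if ($_ =~ /Score:\s*(\d+)/)
                {
                        $aln->score($1);
                        print "Found score of $1\n";
                }
        }
        close TEMP;
        unlink("log.test");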


From jason at bioperl.org  Wed May 30 18:54:20 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 May 2007 11:54:20 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
Message-ID: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>

You can do it without redirecting STDOUT or creating a new file; just
change the system call so that the program's output is read through a
pipe.

Here is the code that runs the program in _run in the module:
     my $commandstring = $self->executable."$instring"."$param_string";
     $self->debug( "clustal command = $commandstring");
     my $status = system($commandstring);
     unless( $status == 0 ) {
          $self->warn( "Clustalw call ($commandstring) crashed: $? \n");
          return undef;
     }

Do something like:

my $fh;
open($fh, "$commandstring |") or die "Could not run '$commandstring': $!";
my $score;
while(<$fh>) {
   $score = $1 if ($_ =~ /Score:(\d+)/);
}
close($fh);

... then at the bottom after the alignment is created do:

$aln->score($score);


There may be some more debugging b/c if you invoke the quiet => 1  
parameter there may be an automatic ">& /dev/null" appended to the  
end of the parameter string that you'll need to figure out how to  
override.
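
The pipe-reading idea can be tried outside the module with a stand-in
command. This is a rough sketch, not the module's code: the function name
score_from_command is made up, and a Perl one-liner printing invented
score lines stands in for the real clustalw command line.

```perl
use strict;
use warnings;

# Run a command and return the last number following "Score:" in its output.
# The command is passed as a list, so no shell is involved.
sub score_from_command {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run '@cmd': $!";
    my $score;
    while (my $line = <$fh>) {
        $score = $1 if $line =~ /Score:\s*(\d+)/;
    }
    close($fh) or warn "'@cmd' exited with status $?\n";
    return $score;
}

# Stand-in for the clustalw command line: a Perl one-liner that prints
# two made-up score lines.
my @fake_clustalw = ($^X, '-e', 'print "Score: 17\nScore: 42\n"');
print "score = ", score_from_command(@fake_clustalw), "\n";
```

Because the list form of open bypasses the shell, a redirection such as
the ">& /dev/null" mentioned above would not be interpreted and would have
to be left out of the command.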

Sorry I don't have more time to help; I hope this gets you started.

-jason
On May 30, 2007, at 10:18 AM, Ryan Golhar wrote:

> Did you see Kevin's response?  That's one possible solution that  
> could be
> implemented...
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of  
> Jason
> Stajich
> Sent: Wednesday, May 30, 2007 12:05 PM
> To: golharam at umdnj.edu
> Subject: Re: [Bioperl-l] ClustalW Score?
>
>
> Nope it isn't parsed since it is part of the STDOUT from the  
> program not the
> alignment.  If you want to add parsing of the STDOUT from Clustalw  
> someone
> will need to refactor how the program is run and capture and parse the
> STDOUT. The score can be added to the score field of the  
> SimpleAlign object,
> but again since there is no where for it to be stored in a clustalw
> alignment file it won't be round tripped anywhere. I think  
> stockholm will
> manage it for you though.
>
> Do you know what the score represents - can it be computed from the
> alignment itsself?
>
> -jason
>
> On May 30, 2007, at 8:40 AM, Ryan Golhar wrote:
>
>
> How do I get the clustalw score from a clustalw alignment?  I'm  
> using the
> following code to align my sequences:
>
> $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new();
>
> $seq[0] = ...
> $seq[1] = ...
> $seq[2] = ...
> $seq[3] = ...
>
> $aln = $aln_factory->align(\@seq);
>
> I can get the percentage identity from the Bio::SimpleAlign object,  
> but
> there is no score.  I looked into it further and it doesn't look  
> like the
> score is being captured anywhere.  So, how does one get the score from
> ClustalW using this method?
>
> Ryan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Wed May 30 19:52:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 30 May 2007 12:52:01 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403349E4D@EX02.asurite.ad.asu.edu>

> You can do it without redirecting STDOUT or creating a new 
> file, just change the system call to:
> 
> Here is the code for running in _run in the module:
>     my $commandstring = $self->executable."$instring"."$param_string";
>      $self->debug( "clustal command = $commandstring");
>      my $status = system($commandstring);
>       unless( $status == 0 ) {
>            $self->warn( "Clustalw call ($commandstring) crashed: $?  
> \n");
>            return undef;
>       }
> 
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet 
> => 1 parameter there may be an automatic ">& /dev/null" 
> appended to the end of the parameter string that you'll need 
> to figure out how to override.
> 
> Sorry I don't have more time to help; I hope this gets you started.

I did it my way as I was doing it without modifying the Bioperl code (in
case I later updated to a new version and forgot about the changes I had
put into it).  So that code just sits in my perl script where it calls
the Bioperl module to create the Clustal alignment object.
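
For anyone who wants to keep that approach in their own script, the
redirection can be wrapped in a small helper. This is only a sketch under
assumptions: capture_stdout is a made-up name, and a child Perl process
printing an invented score line stands in for the call to
$aln_factory->align(\@seq).

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Run $code with STDOUT pointed at a temporary file and return what was
# written there (including output from child processes), followed by
# whatever $code returned.
sub capture_stdout {
    my ($code) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    close $tmp_fh;

    STDOUT->flush;
    open(my $saved, '>&', \*STDOUT) or die "Couldn't dup STDOUT: $!";
    open(STDOUT, '>', $tmp_name)    or die "Couldn't redirect STDOUT: $!";

    my @result = eval { $code->() };
    my $error  = $@;

    # Always restore STDOUT before reporting any error from $code.
    close STDOUT;
    open(STDOUT, '>&', $saved) or die "Couldn't restore STDOUT: $!";
    close $saved;
    die $error if $error;

    open(my $in, '<', $tmp_name) or die "Couldn't read $tmp_name: $!";
    my $text = do { local $/; <$in> };
    close $in;
    return ($text, @result);
}

# Stand-in for the alignment call: a child process printing a made-up score.
my ($log) = capture_stdout(sub {
    system($^X, '-e', 'print "Score: 42\n"');
});
my ($score) = $log =~ /Score:\s*(\d+)/;
print "Found score of $score\n";
```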


From Xianjun.Dong at bccs.uib.no  Tue May 29 15:02:21 2007
From: Xianjun.Dong at bccs.uib.no (Dong Xianjun)
Date: Tue, 29 May 2007 17:02:21 +0200
Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why?
In-Reply-To: <465C2F8E.2070309@ed.ac.uk>
References: <465AD6E8.3030707@ii.uib.no>		<62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com>		<62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com>		<465C1533.6070900@ii.uib.no>	<358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com>
	<465C2AE1.30101@ii.uib.no> <465C2F8E.2070309@ed.ac.uk>
Message-ID: <465C407D.608@ii.uib.no>

Hi, Darren

The sequences are from human and zebrafish. I am currently using only
two sequences; I just want to see what the substitution pattern is.
But your comment makes me wonder whether I should include other
species, such as mouse and chicken.

BTW, what do you mean by 'per codon, not per site'? Do you mean that
the dS (Ks) from Codeml is per codon, while that from Yn00 is per site?
I think there should be a reasonable way to calculate the synonymous
substitution rate even when the divergence is large. If Codeml is not
a good solution for that case, do you have a better suggestion?
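
To put a number on how much the four codeml runs quoted below disagree,
the dS values can be summarised with a few lines of Perl. This is just a
sketch: the summarize helper is made up, and the cut-off of 1.0 for
"saturated" is an assumption taken loosely from the comment that dS much
greater than 1 is hard to estimate, not a PAML rule.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# dS values from the four codeml runs quoted further down in this message.
my @ds = (14.8339, 12.2349, 14.9961, 10.1852);

# Assumed cut-off for "saturated"; an illustrative choice only.
my $SATURATION = 1.0;

# Minimum, maximum, mean and sample standard deviation of a list of numbers.
sub summarize {
    my @v    = @_;
    my $mean = sum(@v) / @v;
    my $var  = @v > 1 ? sum(map { ($_ - $mean) ** 2 } @v) / (@v - 1) : 0;
    return { min => min(@v), max => max(@v), mean => $mean, sd => sqrt($var) };
}

my $s = summarize(@ds);
printf "dS over %d runs: min %.4f  max %.4f  mean %.4f  sd %.4f\n",
    scalar(@ds), @{$s}{qw(min max mean sd)};
print "all runs are above the assumed saturation cut-off\n"
    if $s->{min} > $SATURATION;
```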

Thanks

Xianjun

Darren Obbard wrote:
> Out of interest, what are the species, and how much sequence are you 
> using?
>
> - Estimating Ds when it is >>1 is very hard anyway, since the 
> substitutions are saturated. i.e. Regardless of the method, there will 
> be some level of divergence for which Ds can no longer be estimated. A 
> Ds of ~14 (for PAML I think this is per codon, not per site) sounds 
> very high to me - higher than I would want to try to estimate Ds.
>
> Dong Xianjun wrote:
>> Thanks for information, Albert.
>>
>> But still in two questions:
>> Albert Vilella wrote:
>>> codeml in PAML can give different results in cases where the 
>>> optimization reaches different local maxima depending on the 
>>> different starting points of each run (seed values). So, at least 
>>> for some methods and options, this instability is inherent to the 
>>> underlying algorithm.
>> 1. How to set the initial value in order to get a reasonable 
>> estimation? Do you have some experience for that?
>>> Even more, for some methods and options, it is even recommended in 
>>> PAML documentation to run the same data more than once, to see if 
>>> the results are the same, which would be a good indication that the 
>>> model is robust given the data.
>> 2. Is there a recommend way to test the significance if the results 
>> are different? For example, in my case, dS could range from 10.1852 
>> to 14.9961 for the four runtime. If that means the model is not 
>> robust(how to check this?), should I change to use another model?
>>
>> How could YN00 reach stable result? (Is it because YN00 does not 
>> require initial value for optimization?) Why could YN00 produce so 
>> different result from Codeml? (for YN00, dS=2.1300 with SE=1.2272; 
>> for Codeml, dS=10.1852-14.9961)
>>> Maybe PAML's author can give a more specific answer for your data at:
>>> http://www.rannala.org/gsf/viewforum.php?f=1
>>
>> Actually I already post my question in the author's forum. Let's wait 
>> and see.
>>>
>>> Cheers,
>>>
>>>     Albert.
>>>
>>> On 5/29/07, *Dong Xianjun* <Xianjun.Dong at bccs.uib.no 
>>> <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>
>>>     HI, dear all, //sorry for duplicated msg for /Jason/ and /Neil/
>>>
>>>     I'm bothering by two problems when I use PAML module to calculate
>>>     Ka/Ks for my sequences. Could you help me?
>>>
>>>     1.  Codeml could produce different Ka/Ks value if I run it twice.
>>>     I check it both in command line and in Perl wrapper of
>>>     Bio::Tools::Run::Phylo::PAML::Codeml;
>>>
>>>     The input sequences are:
>>>     >seq1
>>>     TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG
>>>     >seq2
>>>     TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG
>>>
>>>
>>>     For command-line program, I used Codeml in PAML3.14, with
>>>     specifications in codeml.ctl (runmode = -2, seqtype = 1). I tried
>>>     to run the program four times.  The output are like below (from
>>>     the output file). We could see that they are different from each
>>>     other. they should be same or slightly different. Right? But they
>>>     are NOT.  Weird!
>>>     ------------------------------------------------------------------------------
>>>     t=11.5447  S=    42.4  N=   122.6  dN/dS= 0.0035  dN= 0.0522  dS=14.8339
>>>     t= 9.4132  S=    41.8  N=   123.2  dN/dS= 0.0041  dN= 0.0507  dS=12.2349
>>>     t=11.6305  S=    42.2  N=   122.8  dN/dS= 0.0034  dN= 0.0510  dS=14.9961
>>>     t= 7.7879  S=    41.4  N=   123.6  dN/dS= 0.0050  dN= 0.0505  dS=10.1852
>>>     ------------------------------------------------------------------------------
>>>
>>>     I found the same problem when I use the Perl Wrapper of
>>>     Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script
>>>     here, similar to the one in BioPerl HOWTO).
>>>
>>>     2. Another strange thing is, if I switch to use program YN00 in
>>>     the package of PAML, the output are stable. However, it's much
>>>     different from Codeml. (see below)
>>>     ------------------------------------------------------------------------------
>>>     seq. seq.     S       N        t   kappa    omega      dN +- SE        dS +- SE
>>>        2    1    40.4   124.6   1.7452  1.3163  0.0378 0.0804 +- 0.0265  2.1300 +- 1.2272
>>>     ------------------------------------------------------------------------------
>>>
>>>     Why like this? Which one I should believe?
>>>
>>>
>>>     Is there any guy who would kindly help me to run the perl script
>>>     (twice to check whether they are different)? or help to run the
>>>     codeml in command line?
>>>     I don't know whether there is anyone noticed this before, or
>>>     because of the wrong version of PAML.
>>>
>>>     Regards,
>>>
>>>     Xianjun
>>>
>>>
>>>
>>>     Himanshu Ardawatia wrote:
>>>>     #!/usr/bin/perl
>>>>
>>>>     use strict;
>>>>     use warnings;
>>>>
>>>>
>>>>     use Bio::Tools::Run::Phylo::PAML::Codeml;
>>>>     use Bio::Tools::Run::Alignment::Clustalw;
>>>>
>>>>     # for projecting alignments from protein to R/DNA space
>>>>     use Bio::Align::Utilities qw(aa_to_dna_aln);
>>>>
>>>>     # for input of the sequence data
>>>>     use Bio::SeqIO;
>>>>     use Bio::AlignIO;
>>>>
>>>>     my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
>>>>
>>>>     #my $seqdata = 'chuck.fa';
>>>>     my $seqdata = 'xianjun.fa ';
>>>>
>>>>     my $seqIO = new Bio::SeqIO(-file   => $seqdata,
>>>>                                -format => 'fasta');
>>>>     my %seqs;
>>>>     my @prots;
>>>>
>>>>     my $output;
>>>>     # process each sequence
>>>>     while( my $seq = $seqIO->next_seq ) {
>>>>         $seqs{$seq->display_id} = $seq;
>>>>         # translate them into protein
>>>>         my $protein = $seq->translate();
>>>>         my $pseq = $protein->seq();
>>>>         if( $pseq =~ /\*/ &&
>>>>         $pseq !~ /\*$/ ) {
>>>>         warn("provided a cDNA sequence with a stop codon, PAML will
>>>>     choke!");
>>>>         exit(0);
>>>>         }
>>>>         # Tcoffee can't handle '*' even if it is trailing
>>>>         $pseq =~ s/\*//g;
>>>>         $protein->seq($pseq);
>>>>         push @prots, $protein;
>>>>     }
>>>>
>>>>     if( @prots < 2 ) {
>>>>         warn("Need at least 2 cDNA sequences to proceed");
>>>>         exit(0);
>>>>     }
>>>>
>>>>     open(OUT, ">align_output.txt") ||
>>>>           die("cannot open output $output for writing");
>>>>     # Align the sequences with clustalw
>>>>
>>>>     my $aa_aln = $aln_factory->align(\@prots);
>>>>
>>>>     # project the protein alignment back to cDNA coordinates
>>>>     my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
>>>>
>>>>     my @each = $dna_aln->each_seq();
>>>>
>>>>     my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml
>>>>                       ( -params => { 'runmode' => -2,
>>>>                              'seqtype' => 1,
>>>>                      'model' => 1,
>>>>                     }
>>>>                   );
>>>>
>>>>     # set the alignment object
>>>>     $kaks_factory->alignment($dna_aln);
>>>>
>>>>     # run the KaKs analysis
>>>>     my ($rc,$parser) = $kaks_factory->run();
>>>>     my $result = $parser->next_result;
>>>>     my $MLmatrix = $result->get_MLmatrix();
>>>>
>>>>     my @otus = $result->get_seqs();
>>>>     # this gives us a mapping from the PAML order of sequences back to
>>>>     # the input order (since names get truncated)
>>>>     my @pos = map {
>>>>         my $c= 1;
>>>>         foreach my $s ( @each ) {
>>>>         last if( $s->display_id eq $_->display_id );
>>>>         $c++;
>>>>         }
>>>>         $c;
>>>>     } @otus;
>>>>
>>>>     print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
>>>>     CDNA_PERCENTID)), "\n";
>>>>     for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
>>>>         for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
>>>>         my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>>>>         print OUT join("\t",                    $otus[$i]->display_id,
>>>>                    
>>>> $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>>>>                    $MLmatrix->[$i]->[$j]->{'dS'},
>>>>                    $MLmatrix->[$i]->[$j]->{'omega'},
>>>>                    sprintf("%.2f",$sub_aa_aln->percentage_identity),
>>>>                    sprintf("%.2f",$sub_dna_aln->percentage_identity),
>>>>                    ), "\n";
>>>>         }
>>>>     }
>>>>
>>>>
>>>>     On 5/29/07, *Himanshu Ardawatia* <himanshu.ardawatia at bccs.uib.no
>>>>     <mailto:himanshu.ardawatia at bccs.uib.no>> wrote:
>>>>
>>>>         Hi Xianjun,
>>>>
>>>>         I recognize this script. But it was a bit cumbersom to use
>>>>         this as many things are done in the script (like multiple
>>>>         alignment, aa to dna alignment and ka/ks calculation) so one
>>>>         does not have real control on these different aspect.
>>>>         I do not remeber getting different Ka/Ks in different runs
>>>>         though. But I remeber that one I ran the script with
>>>>         different versions of clustalw and it REALLY gave different
>>>>         results !! So please make sure if the clustalw versions are
>>>>         the same in all your runs. Best is to use the latest version.
>>>>
>>>>         Finally I wrote my simple script which would generate a
>>>>         codeml.ctl file for each set of sequences and run the codeml
>>>>         based on that and then more on. Disadvantage of this can be
>>>>         that some files keep getting over-written (like the one
>>>>         which have their names hard-coded in codeml program) and if
>>>>         one needs those files as well then one needs to run the
>>>>         codeml cycles for each set of sequences in different
>>>>         directories.
>>>>
>>>>         One advantage of this kind of script is that you can use
>>>>         whichever alignment program you want to use and so on....But
>>>>         then its also extra steps of yourself doing multiple
>>>>         alignment and aa to dna alignment etc....
>>>>
>>>>         Does it make sense? If you still get different outputs with
>>>>         same version of clustalw then I can sit with you and look at
>>>>         things together. Or else try the script method which I
>>>>         mentioned.
>>>>
>>>>         Cheers  and Fu
>>>>         Himanshu
>>>>         \\
>>>>
>>>>         On 5/28/07, *Dong Xianjun* < Xianjun.Dong at bccs.uib.no
>>>>         <mailto:Xianjun.Dong at bccs.uib.no>> wrote:
>>>>
>>>>             HI, Himanshu
>>>>
>>>>             I am sure you did some work in Ka/Ks calculation. Here I
>>>>             have a question
>>>>             bothering me; the output for
>>>>             Bio::Tools::Run::Phylo::PAML::Codeml is not
>>>>             stable(different for each runtime), and also different
>>>>             from the output
>>>>             with modeul of Bio::Tools::Run::Phylo::PAML::Yn00.
>>>>
>>>>             Here I attached the script. Could you help to have a
>>>>             look and try to run
>>>>             the script? How is your way to calculate the Kaks ratio?
>>>>
>>>>             Thanks
>>>>
>>>>             --
>>>>             ---------------------------
>>>>             Sterding (Xianjun) Dong
>>>>             PhD student, Boris Lenhard's group
>>>>             Bergen Center of Computational Science
>>>>             Bergen University, Norway
>>>>             Mobile: 0047-47361688
>>>>             Telephone: 0047-55276381
>>>>             Skype: xianjun.dong
>>>>
>>>>
>>>>
>>>>
>>>
>>>     --     ---------------------------
>>>     Sterding (Xianjun) Dong
>>>     PhD student, Boris Lenhard's group
>>>     Bergen Center of Computational Science
>>>     Bergen University, Norway
>>>     Mobile: 0047-47361688
>>>     Telephone: 0047-55276381
>>>
>>>     Skype: xianjun.dong
>>>        
>>>
>>>     _______________________________________________
>>>     Bioperl-l mailing list
>>>     Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> -- 
>> ---------------------------
>> Sterding (Xianjun) Dong
>> PhD student, Boris Lenhard's group
>> Bergen Center of Computational Science
>> Bergen University, Norway
>> Mobile: 0047-47361688
>> Telephone: 0047-55276381
>> Skype: xianjun.dong
>>   
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong


From bix at sendu.me.uk  Thu May 31 08:34:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 09:34:38 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E889E.3090304@sendu.me.uk>

Jason Stajich wrote:
> Do something like:
> 
> my $fh;
> open($fh, "$commandstring |");
> my $score;
> while(<$fh>) {
>    $score = $1 if ($_ =~ /Score:(\d+)/);
> }
> close($fh);
> 
> ... then at the bottom after the alignment is created do:
> 
> $aln->score($score);
> 
> 
> There may be some more debugging b/c if you invoke the quiet => 1  
> parameter there may be an automatic ">& /dev/null" appended to the  
> end of the parameter string that you'll need to figure out how to  
> override.

Is there any particular reason for not having something along these 
lines committed to the module? Shall I go ahead and implement?


From bix at sendu.me.uk  Thu May 31 09:54:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 31 May 2007 10:54:32 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
Message-ID: <465E9B58.1020403@sendu.me.uk>

Jason Stajich wrote:
>    $score = $1 if ($_ =~ /Score:(\d+)/);

I see that there are lots of lines in the output that match the above 
regex, but there is also a single /Alignment Score (\d+)/ line printed 
at the end. Isn't that the score that should get stored in $aln->score()?
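
Something along these lines would prefer the final line and fall back to
the last pairwise one. It is only a sketch: clustalw_score is a made-up
helper name, and the sample log text is invented for illustration, so the
wording of real ClustalW output may differ.

```perl
use strict;
use warnings;

# Return the value from the final "Alignment Score N" line if present;
# otherwise fall back to the last pairwise "Score:" line; undef if neither.
sub clustalw_score {
    my ($log) = @_;
    my ($final, $last_pairwise);
    for my $line (split /\n/, $log) {
        if    ($line =~ /Alignment Score\s+(-?\d+)/) { $final         = $1 }
        elsif ($line =~ /Score:\s*(-?\d+)/)          { $last_pairwise = $1 }
    }
    return defined $final ? $final : $last_pairwise;
}

# Illustrative log text only; not copied from a real ClustalW run.
my $log = <<'END';
Sequences (1:2) Aligned. Score:  91
Sequences (1:3) Aligned. Score:  78
Sequences (2:3) Aligned. Score:  80
Alignment Score 4522
END

my $score = clustalw_score($log);
print "score = $score\n";
```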


From jason at bioperl.org  Thu May 31 18:08:19 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 31 May 2007 11:08:19 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <465E9B58.1020403@sendu.me.uk>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO>
	<DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org>
	<465E9B58.1020403@sendu.me.uk>
Message-ID: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>

you're right --- it is not really my code, I was just elaborating  
Kevin's example --- it would probably need to be more specific or  
perhaps the last Score seen is sufficient for what one is trying to  
capture?

-j
On May 31, 2007, at 2:54 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>>    $score = $1 if ($_ =~ /Score:(\d+)/);
>
> I see that there are lots of lines in the output that match the  
> above regex, but there is also a single /Alignment Score (\d+)/  
> line printed at the end. Isn't that the score that should get  
> stored in $aln->score()?
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From Kevin.M.Brown at asu.edu  Thu May 31 18:15:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 31 May 2007 11:15:38 -0700
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO><DFEEDFC9-68C4-4821-846F-69AC9559C70B@bioperl.org><465E9B58.1020403@sendu.me.uk>
	<49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org>
Message-ID: <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu>

> you're right --- it is not really my code, I was just 
> elaborating Kevin's example --- it would probably need to be 
> more specific or perhaps the last Score seen is sufficient 
> for what one is trying to capture?

I took that code from a pairwise clustal alignment script that I wrote
to deal with aligning a bunch of short sequences against a long one to
see where they line up.  When all of them were fed to Clustal at once,
the short sequences all ended up aligned to each other and not well
aligned to the longer sequence.  I only saw one score in the output from
each pairwise run, so that is what I used to find a reasonable value.