From barry.moore at genetics.utah.edu  Thu Nov  1 00:03:01 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 31 Oct 2007 22:03:01 -0600
Subject: [Bioperl-l] BLAST output parsing
In-Reply-To: <a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>
References: <13519112.post@talk.nabble.com>
	<a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>
Message-ID: <7BDC2187-1ABE-4CA1-AB86-98D5FD5433A4@genetics.utah.edu>

Swapna-

If you are using NCBI FASTA files, you can use files from NCBI's Gene  
database to map your gene IDs to names and organisms.  Look in  
particular at the files gene2accession, gene2refseq, and gene_info.   
For example, if you had RefSeq protein IDs like NP_123456, you could  
use gene2refseq to map those RefSeq accessions to gene IDs and then  
gene_info to map the gene IDs to organisms and gene names.
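
A minimal Perl sketch of that two-step lookup follows. It assumes the
tab-delimited layout I believe these files use (GeneID in column 2 and
protein accession.version in column 6 of gene2refseq; tax_id, GeneID and
Symbol in the first three columns of gene_info), so check the header
line of your own copies first. The function names are made up for
illustration.

```perl
use strict;
use warnings;

# ASSUMED 0-based column positions; verify against the '#' header line
# of your own downloads before relying on them.
my %G2R = (gene_id => 1, protein => 5);              # gene2refseq
my %GI  = (tax_id => 0, gene_id => 1, symbol => 2);  # gene_info

# Read gene2refseq from a filehandle; return { protein accession => GeneID }.
# Accessions are stored without their ".N" version suffix.
sub load_protein_to_gene {
    my ($fh) = @_;
    my %gene_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                  # skip header/comment lines
        chomp $line;
        my @f   = split /\t/, $line;
        my $acc = $f[ $G2R{protein} ];
        next if !defined $acc || $acc eq '-';   # assumed marker for "no protein"
        $acc =~ s/\.\d+$//;
        $gene_for{$acc} = $f[ $G2R{gene_id} ];
    }
    return \%gene_for;
}

# Read gene_info from a filehandle; return { GeneID => { tax_id, symbol } }.
sub load_gene_info {
    my ($fh) = @_;
    my %info_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my @f = split /\t/, $line;
        $info_for{ $f[ $GI{gene_id} ] } = {
            tax_id => $f[ $GI{tax_id} ],
            symbol => $f[ $GI{symbol} ],
        };
    }
    return \%info_for;
}

# Look up one accession (with or without version); returns a hashref
# with gene_id, tax_id and symbol, or undef if it is not in the tables.
sub annotate {
    my ($acc, $gene_for, $info_for) = @_;
    (my $bare = $acc) =~ s/\.\d+$//;
    my $gene_id = $gene_for->{$bare};
    return unless defined $gene_id;
    my $info = $info_for->{$gene_id} or return;
    return { gene_id => $gene_id, %$info };
}

# Typical use (file names are whatever you downloaded and unpacked):
#   open my $g2r, '<', 'gene2refseq' or die $!;
#   open my $gi,  '<', 'gene_info'   or die $!;
#   my $hit = annotate('NP_123456', load_protein_to_gene($g2r), load_gene_info($gi));
```

Note that tax_id is a numeric NCBI Taxonomy ID, so turning it into an
organism name needs one more lookup. Both files are large, so for
~4000 queries it may be worth filtering them down to the accessions you
actually hit.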

B

On Oct 31, 2007, at 7:27 PM, Torsten Seemann wrote:

> Swapna,
>
>> I am new to bioperl.  I did BLAST search of ~4000 genes and I need  
>> to parse
>> it.  I did use -m 9 option to get a tabular information of the  
>> blast data.
>> But it does not include the gene names or the names of the  
>> organisms of each
>> hit.  Are there any parsers that can do this job ??
>
> The -m 9 tabular output does not include gene descriptions and
> organisms. It only includes the "gene id" that was present immediately
> after the ">" sign in the FASTA file that was used to create the BLAST
> database you specified with the -d option when you ran BLAST.
>
> Hence, no parser will help you. You either have to re-do the BLAST
> with a different -m value that includes the information you desire, or
> write code to convert your gene IDs into what you want.
>
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 05:45:43 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 10:45:43 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
Message-ID: <4729A047.2060507@mikrobio.med.uni-giessen.de>

Dear all,

I have EMBOSS installed on a Windows machine (EMBOSSwin). I can run
it from the DOS command line and the path is set. However, when I
try to call an EMBOSS application from BioPerl I get an
"Application not found" error.


  my $f = Bio::Factory::EMBOSS->new();
  # get an EMBOSS application object from the factory
  my $fuzznuc = $f->program('fuzznuc');
  $fuzznuc->run(
      { -sequence => $infile,
        -pattern  => $motif,
        -outfile  => $outfile
      });

gives the following error

-------------------- WARNING ---------------------
MSG: Application [fuzznuc] is not available!
---------------------------------------------------
Can't call method "run" on an undefined value at searchPatterns.pl line 102.

Can somebody help me fix this ?

best regards
Rohit

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 10:22:14 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:22:14 -0400
Subject: [Bioperl-l] PAML/Codeml parsing
Message-ID: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>

PAML4 breaks our PAML parser right now because the order of things in  
the result file has changed.  Now sequences precede the information  
about the version or the program run.  This means that $result- 
 >get_seqs() fails because we don't parse the sequences.

We'll see what we can do, but as usual with supporting third-party  
programs, the parser is brittle when file formats change.

-jason

--
Jason Stajich
jason at bioperl.org


From jason at bioperl.org  Thu Nov  1 10:32:06 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:32:06 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>

Presumably the PATH is not getting set properly - try printing the  
$ENV{PATH} variable in a Perl script to see if it actually contains  
the directory where the EMBOSS programs are installed.  BioPerl can  
only guess so much as to where to find an application.  It is also  
possible that we aren't creating the proper path to the executable -  
I believe you can print the executable path with
print $fuzznuc->executable
unless it is throwing an error at the program() line.
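
A rough sketch of that PATH check, using only core modules. The helper
name find_program and the list of Windows extensions are my own choices
for illustration, not anything BioPerl does internally.

```perl
use strict;
use warnings;
use File::Spec;

# Return every existing file named $name (plus each extension in @$exts)
# found in the directories listed in @$dirs.
sub find_program {
    my ($name, $dirs, $exts) = @_;
    my @hits;
    for my $dir (@$dirs) {
        next unless defined $dir && length $dir;
        for my $ext (@$exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            push @hits, $candidate if -f $candidate;
        }
    }
    return @hits;
}

# File::Spec->path splits $ENV{PATH} using the rules of the current OS.
my @dirs = File::Spec->path;
# On Windows the file on disk normally carries an extension.
my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');

print "PATH entry: $_\n" for @dirs;
my @found = find_program('fuzznuc', \@dirs, \@exts);
print @found
    ? "fuzznuc found at: @found\n"
    : "fuzznuc not found in any PATH directory\n";
```

If this finds the program but the factory still reports it as
unavailable, the problem is more likely in how the factory builds its
list of applications than in PATH itself.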

It looks like the code in the Factory object is a little fragile in  
assuming that the programs HAVE to be in your $PATH.  I don't know  
whether Windows + Perl is special in the way it runs things, so I  
can't really tell if there are specific things you have to do here.  
You may have to run this through Cygwin in case PATH and such are  
just not properly available to Perl on Windows.

-jason

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Thu Nov  1 10:54:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 09:54:09 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu>

This worked for me previously when I tested with WinXP on my old  
machine using EMBOSS v5:

ftp://emboss.open-bio.org/pub/EMBOSS/windows

I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably  
better to use the latest EMBOSS version anyway so I suggest trying  
the version in the above link.  I'll test it again today and let you  
know what I find.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Thu Nov  1 11:31:40 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 11:31:40 -0400
Subject: [Bioperl-l] PAML3 vs 4
Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org>

Small tweaks were needed to parse PAML4 results.

Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly  
now on both PAML 3 and 4.
You'll need to get the latest code from CVS in order to see the  
changes to Bio/Tools/Phylo/PAML.pm

I've added tests for PAML4 in the parser and the run code.

If you have scripts that use codeml, please give it a try.  I have not  
attempted to play with BASEML or AAML results at this point, so if you  
also have code that uses those programs, please try it out and  
file bug reports if we need to fix things.
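
For anyone wanting a quick check, here is a minimal sketch of reading
pairwise Ka/Ks values from a codeml output file. It assumes the
next_result / get_seqs / get_MLmatrix interface and the dN, dS and
omega keys as I understand them from the module documentation; the
function name and the guard for a missing BioPerl install are just for
illustration.

```perl
use strict;
use warnings;

# Print dN (Ka), dS (Ks) and omega for each sequence pair in a codeml
# output file. Returns the number of pairs printed, or undef when
# Bio::Tools::Phylo::PAML cannot be loaded.
sub report_pairwise_kaks {
    my ($mlc_file) = @_;
    eval { require Bio::Tools::Phylo::PAML; 1 } or return;

    my $parser = Bio::Tools::Phylo::PAML->new(-file => $mlc_file);
    my $result = $parser->next_result
        or die "no result parsed from $mlc_file\n";

    my @seqs   = $result->get_seqs;
    my $matrix = $result->get_MLmatrix;   # ASSUMED to be indexed [i][j] with i < j
    my $pairs  = 0;
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            my $cell = $matrix->[$i][$j] or next;
            printf "%s vs %s: dN=%s dS=%s omega=%s\n",
                $seqs[$i]->display_id, $seqs[$j]->display_id,
                map { defined $cell->{$_} ? $cell->{$_} : 'NA' } qw(dN dS omega);
            $pairs++;
        }
    }
    return $pairs;
}

# Run as: perl kaks.pl path/to/codeml_output
report_pairwise_kaks($ARGV[0]) if @ARGV;
```

Running it against the same alignment under PAML 3 and PAML 4 should
show quickly whether the two versions parse to the same values.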

-jason

--
Jason Stajich
jason at bioperl.org


From Kevin.M.Brown at asu.edu  Thu Nov  1 13:25:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 1 Nov 2007 10:25:30 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu>

Sounds like a path issue.  Try to tell bioperl the full path to the
executable rather than just the executable name. 



From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 14:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>


Thanks for all the suggestions, but unfortunately I still cannot run 
EMBOSS. I am running the latest version of EMBOSSwin (2.10.0-Win-0.8), 
and the path is set correctly. I printed $ENV{PATH} and it contains 
C:\EMBOSSwin, which is the correct location.
I also tried giving the full path directly, but I'm not sure how to do 
this, so I tried this...

my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

this also did not work.

Also tried printing...
$fuzznuc->executable()

gave the following error again
-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas ?

thanks !
Rohit


here's the code...

use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

 my $f = Bio::Factory::EMBOSS->new();
 # get an EMBOSS application  object from the factory
 print Dumper ($f);
 # tried fuzznuc.exe as well
 my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
 print Dumper ($fuzznuc);
 
 #dump of fuzznuc
 #$VAR1 = bless( {
 #                '_programgroup' => {},
 #                '_programs' => {},
 #                '_groups' => {}
 #              }, 'Bio::Factory::EMBOSS' );
 
 #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
 
 my $infile = "temp.fasta";
 my $motif  = "ATGTCGATC";
 my $outfile = "test.out";

 
 $fuzznuc->run(
                  { -sequence  => $infile,
                    -pattern   => $motif,
                    -outfile   => $outfile                      
              });
    
Here's the error again....

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------



-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 14:37:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 14:37:24 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>

You could try this - can't test it though so not sure.
my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
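
Since program() evidently returns undef when the factory does not
recognise the application, it may also help to check for that before
calling anything on the result. A small sketch; emboss_program_or_die
is a made-up helper, and it only assumes the program() and executable()
methods used above.

```perl
use strict;
use warnings;

# Ask the factory for an application object and fail with a readable
# message instead of "Can't call method ... on an undefined value".
# $full_path is optional; when given it is passed to executable().
sub emboss_program_or_die {
    my ($factory, $name, $full_path) = @_;
    my $app = $factory->program($name);
    die "program('$name') returned undef; the factory does not list it\n"
        unless defined $app;
    $app->executable($full_path) if defined $full_path;
    return $app;
}

# Intended use (needs bioperl-run installed):
#   use Bio::Factory::EMBOSS;
#   my $f       = Bio::Factory::EMBOSS->new();
#   my $fuzznuc = emboss_program_or_die($f, 'fuzznuc', 'C:\EMBOSSwin\fuzznuc');
```

This does not fix anything by itself, but it separates "the factory
does not know the program" from "the program cannot be executed".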

-jason

--
Jason Stajich
jason at bioperl.org


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 14:41:41 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:41:41 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperl on windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de>

Hi Jason

I tried this as well. This also gives the same error message.

-Rohit

Jason Stajich wrote:
> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From MEC at stowers-institute.org  Thu Nov  1 14:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs
	using bioperl on windows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>


In the code at
http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 
there is a call to `wossname` (cf.
http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
).

Is wossname in your path?

Maybe it needs to be wossname.exe under Windows?
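
One way to check this from inside Perl, which is the environment that
matters here, is to try running wossname the way a script would. A
sketch only: command_runs is a made-up helper, and the -auto flag is my
assumption about how the factory invokes wossname (the linked code is
the authority on that).

```perl
use strict;
use warnings;
use File::Spec;

# Run a command with its output discarded and report whether it could
# be started and exited with status 0.
sub command_runs {
    my (@cmd) = @_;
    no warnings 'exec';   # a missing program is an expected outcome here

    # Save the real STDOUT/STDERR, then point both at the null device.
    open my $saved_out, '>&', \*STDOUT or die "cannot dup STDOUT: $!";
    open my $saved_err, '>&', \*STDERR or die "cannot dup STDERR: $!";
    open STDOUT, '>', File::Spec->devnull or die "cannot redirect STDOUT: $!";
    open STDERR, '>', File::Spec->devnull or die "cannot redirect STDERR: $!";

    my $status = system(@cmd);

    # Restore the original handles before returning.
    open STDOUT, '>&', $saved_out or die "cannot restore STDOUT: $!";
    open STDERR, '>&', $saved_err or die "cannot restore STDERR: $!";
    return $status == 0;
}

print command_runs('wossname', '-auto')
    ? "wossname ran successfully from Perl\n"
    : "wossname could not be run from Perl\n";
```

If this fails while wossname works from the DOS prompt, try the same
call with 'wossname.exe' to see whether the extension is the issue.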


Malcolm Cook
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 1:42 PM
> To: Jason Stajich
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs 
> usingbioperlonwindows
> 
> Hi Jason
> 
> I tried this as well. This also gives the same error message.
> 
> -Rohit
> 
> Jason Stajich wrote:
> > You could try this - can't test it though so not sure.
> > my $fuzznuc = $f->program('fuzznuc');
> > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> >
> > -jason


From arareko at campus.iztacala.unam.mx  Thu Nov  1 15:51:41 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Nov 2007 13:51:41 -0600
Subject: [Bioperl-l] bioperl: cannot run emboss
	programs	usingbioperlonwindows
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx>

Don't the EMBOSS binaries live under 'bin'? Perhaps try setting 
$ENV{PATH} so that it includes 'C:\EMBOSSwin\bin', or using this:

my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc');

Adding .exe might be worth trying as well.
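
A rough sketch for trying those variations in one go, assuming the
install directory C:\EMBOSSwin mentioned earlier in the thread. The
helper candidate_paths is a made-up name, and this only checks which
files exist; it does not call BioPerl:

```perl
use strict;
use warnings;
use File::Spec;

# List the places one EMBOSS program might live: directly under the
# install directory and under its bin/ subdirectory, each with and
# without a .exe suffix. candidate_paths is a hypothetical helper.
sub candidate_paths {
    my ($root, $program) = @_;
    my @candidates;
    for my $dir ($root, File::Spec->catdir($root, 'bin')) {
        for my $name ($program, "$program.exe") {
            push @candidates, File::Spec->catfile($dir, $name);
        }
    }
    return @candidates;
}

# Report which of the candidates actually exist on this machine.
for my $path (candidate_paths('C:\EMBOSSwin', 'fuzznuc')) {
    my $status = (-f $path) ? 'exists ' : 'missing';
    print "$status  $path\n";
}
```

Whichever path shows up as existing is the one to hand to
$fuzznuc->executable().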

Mauricio.

Cook, Malcolm wrote:
> in the code
> http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 
> 
> there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
> )
> 
> is wossname in your path?
> 
> Maybe it needs to be wossname.exe under windows?
> 
> 
> Malcolm Cook
>   
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Nov  1 16:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get  
fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a  
hoop or two but I managed to get it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.   
You need to remove EMBOSSWin and install the one I linked to  
previously (this is an actual EMBOSS beta release).  It's possible  
older EMBOSSWin can be configured, but I don't plan on checking it  
out myself.

Next, you need to ensure the binaries are in your PATH env. variable  
(test by running 'wossname' on the command line), then set  
EMBOSS_DATA to point at the EMBOSS data directory using a UNIX-like  
path (e.g. 'C:/mEMBOSS/data'). Regular Win32 paths didn't work for me,  
and WinXP recognizes the UNIX-like form as a valid path.  If you don't  
know how to set env. variables, go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the  
latest (greatest?) EMBOSS.
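
For what it's worth, here is a small sketch of setting EMBOSS_DATA from
inside the script with the forward-slash form. The install location
C:\mEMBOSS is an assumption based on the example path above, and
to_forward_slashes is a made-up helper name. I only tested the
system-wide setting, so treat the in-script variant as untested:

```perl
use strict;
use warnings;

# Convert a Win32-style path to the forward-slash form.
# to_forward_slashes is a hypothetical helper, not a BioPerl function.
sub to_forward_slashes {
    my ($path) = @_;
    (my $unixy = $path) =~ tr{\\}{/};
    return $unixy;
}

# Assumed install location; adjust to wherever EMBOSS actually lives.
my $emboss_root = 'C:\mEMBOSS';

# Child processes started from this script inherit the setting.
$ENV{EMBOSS_DATA} = to_forward_slashes($emboss_root) . '/data';
print "EMBOSS_DATA=$ENV{EMBOSS_DATA}\n";
```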

chris

On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:

> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
>> You could try this - can't test it though so not sure.
>> my $fuzznuc = $f->program('fuzznuc');
>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>
>> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From neetisomaiya at gmail.com  Fri Nov  2 00:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a perl question, not bioperl.
Can anyone point me to a perl program/code/function which can calculate the
number of days between any two given dates.
Any help will be deeply appreciated.
Thanks.

-- 
-Neeti
Even my blood says, B positive

From whs at ebi.ac.uk  Fri Nov  2 01:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0711020459530.17670@parrot.ebi.ac.uk>

Hi Neeti,

A non-bioperl answer to your perl question: Date::Calc should do the trick.
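
A short sketch, assuming dates are given as year, month, day. It uses
Delta_Days from Date::Calc when that CPAN module is installed and
otherwise falls back to the core Time::Local module; the wrapper name
days_between is made up for this example:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Number of days from date 1 to date 2 (negative if date 2 is earlier).
# days_between is a hypothetical wrapper name.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;
    # Prefer Date::Calc if it is available (CPAN module, not core).
    if (eval { require Date::Calc; 1 }) {
        return Date::Calc::Delta_Days($y1, $m1, $d1, $y2, $m2, $d2);
    }
    # Fallback with core modules only: compare UTC midnights.
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return int(($t2 - $t1) / 86400);
}

# From 31 Oct 2007 to 2 Nov 2007.
print days_between(2007, 10, 31, 2007, 11, 2), "\n";    # prints 2
```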

Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.
>
>

From smarkel at accelrys.com  Sat Nov  3 02:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlon	windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <OFD3D05334.F9E235EF-ON88257388.00209BED-88257388.00211BD7@accelrys.com>

I set multiple environment variables in my code.

    use File::Spec;

    # $embossPath holds the EMBOSS installation directory.
    $ENV{EMBOSS_ROOT}    = $embossPath;
    $ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd");
    $ENV{EMBOSS_DB_DIR}  = File::Spec->catdir($embossPath, "test");
    $ENV{EMBOSS_DATA}    = File::Spec->catdir($embossPath, "data");
    $ENV{PATH}           = $embossPath;    # replaces PATH outright

I found it necessary to set both PATH and EMBOSS_ROOT.
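
My code replaces PATH outright. If the script needs other programs as
well, a variant that keeps the existing entries could look like the
sketch below. The helper name prepend_to_path and the directory
C:/mEMBOSS are assumptions for illustration only:

```perl
use strict;
use warnings;
use Config;

# Put $dir at the front of PATH, keeping the existing entries.
# prepend_to_path is a hypothetical helper name.
sub prepend_to_path {
    my ($dir) = @_;
    my $sep = $Config{path_sep};    # ';' on Windows, ':' on Unix-like systems
    my $old = defined $ENV{PATH} ? $ENV{PATH} : '';
    $ENV{PATH} = length($old) ? $dir . $sep . $old : $dir;
    return $ENV{PATH};
}

# Assumed install location, standing in for the EMBOSS directory.
prepend_to_path('C:/mEMBOSS');
print "PATH=$ENV{PATH}\n";
```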

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com


bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24:

> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> 
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
> 
> >
> >
> > Thanks for all the suggestions... but I unfortunately still cannot run
> > emboss. I am running the latest version of embosswin  (2.10.0- 
> > Win-0.8),
> > and the
> > path is set correctly. I printed $ENV{$PATH} and this contains
> > C:\EMBOSSwin which is the correct location.
> > I also tried setting the path directly but I'm not sure how to do 
> > this,
> > so I tried this...
> >
> > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
> >
> > this also did not work.
> >
> > Also tried printing...
> > $fuzznuc->executable()
> >
> > gave the following error again
> > -------------------- WARNING ---------------------
> > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> > ---------------------------------------------------
> >
> > Any more ideas ?
> >
> > thanks !
> > Rohit
> >
> >
> > here's the code...
> >
> > use strict;
> > use Bio::Factory::EMBOSS;
> > use Data::Dumper;
> >
> > #
> > # print "PATH=$ENV{PATH}\n";
> > # path contains C:\EMBOSSwin which is the correct location
> > # embossversion is 2.10.0-Win-0.8
> >
> >  my $f = Bio::Factory::EMBOSS->new();
> >  # get an EMBOSS application  object from the factory
> >  print Dumper ($f);
> >  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried 
> > fuzznuc.exe
> > as well,
> >  print Dump ($fuzznuc);
> >
> >  #dump of fuzznuc
> >  #$VAR1 = bless( {
> >  #                '_programgroup' => {},
> >  #                '_programs' => {},
> >  #                '_groups' => {}
> >  #              }, 'Bio::Factory::EMBOSS' );
> >
> >  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
> >
> >  my $infile = "temp.fasta";
> >  my $motif  = "ATGTCGATC";
> >  my $outfile = "test.out";
> >
> >
> >  $fuzznuc->run(
> >                   { -sequence  => $infile,
> >                     -pattern   => $motif,
> >                     -outfile   => $outfile
> >               });
> >
> > Here's the error again....
> >
> > #-------------------- WARNING ---------------------
> > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> > #---------------------------------------------------
> >
> >
> >
> >
> > Jason Stajich wrote:
> >> Presumably the PATH is not getting set properly - you should play
> >> around printing the $ENV{PATH} variable in a perl script to see if
> >> actually contains the directory where the emboss programs are
> >> installed.  Bioperl can only guess so much as to where to find an
> >> application.  It is also possible that we aren't creating the proper
> >> path to the executable - you can print the executable path with
> >> print $fuzznuc->executable
> >> I believe unless it is throwing an error at the program() line.
> >>
> >> It looks like the code in the Factory object is a little fragile
> >> assuming that the programs HAVE to be in your $PATH.  I don't know if
> >> windows+perl is special in any way that it run things so I can't
> >> really tell if there is specific things you have to do here. You may
> >> have to run this through cygwin in case PATH and such are just not
> >> available properly to windowsPerl.
> >>
> >> -jason
> >> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
> >>
> >>> Dear all,
> >>>
> >>> I have emboss installed on a windows machine. (Embosswin). I can run
> >>> this from the dos command line and the path is present. However, 
> >>> when I
> >>> try to call
> >>> an emboss application from bioperl I get a "Application not found 
> >>> error"
> >>>
> >>>
> >>>   my $f = Bio::Factory::EMBOSS->new();
> >>>   # get an EMBOSS application  object from the factory
> >>>   my $fuzznuc = $f->program('fuzznuc');
> >>>     $fuzznuc->run(
> >>>                   { -sequence  => $infile,
> >>>                         -pattern   => $motif,
> >>>                        -outfile   => $outfile
> >>>               });
> >>>  gives the following error
> >>>
> >>> -------------------- WARNING ---------------------
> >>> MSG: Application [fuzznuc] is not available!
> >>> ---------------------------------------------------
> >>> Can't call method "run" on an undefined value at 
> >>> searchPatterns.pl line
> >>> 102.
> >>>
> >>> Can somebody help me fix this ?
> >>>
> >>> best regards
> >>> Rohit
> >>>
> >>> -- 
> >>>
> >>> Dr. Rohit Ghai
> >>> Institute of Medical Microbiology
> >>> Faculty of Medicine
> >>> Justus-Liebig University
> >>> Frankfurter Strasse 107
> >>> 35392 - Giessen
> >>> GERMANY
> >>>
> >>> Tel  : 0049 (0)641-9946413
> >>> Fax  : 0049 (0)641-9946409
> >>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org <mailto:jason at bioperl.org>
> >>
> >
> > -- 
> >
> > Dr. Rohit Ghai
> > Institute of Medical Microbiology
> > Faculty of Medicine
> > Justus-Liebig University
> > Frankfurter Strasse 107
> > 35392 - Giessen
> > GERMANY
> >
> > Tel  :   0049 (0)641-9946413
> > Fax  :   0049 (0)641-9946409
> > Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason at bioperl.org
> 


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Sat Nov  3 10:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic. I was able 
to run EMBOSS applications on Windows (Vista), but only with the following 
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did. 
Scott suggested setting all the variables within the program. This I 
also tried, but these were already available to the program, so this was 
not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using 
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't 
have any path issues. What is also curious is that $f->version returns the 
correct version of the EMBOSS installation (no path problems here), and it 
looks like it runs the command "embossversion -auto" to get this 
information. If it can get at this command, it's a bit peculiar that it 
cannot get at the other programs. Or am I missing something here?


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile  = "test.fasta";
my $motif   = "AGGAGG";
my $outfile = "test.out";


    my $f = Bio::Factory::EMBOSS->new();
    # get an EMBOSS application object from the factory
    print Dumper $f;

    print "location=", $f->location, "\n";   # returns local
    # returns the correct version 5.0 (uses embossversion -auto internally,
    # and seems to know where it is)
    print "version=", $f->version, "\n";
    print "info=", $f->program_info('fuzznuc'), "\n";   # returns nothing
    print "list=", $f->_program_list, "\n";             # returns nothing

    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\
    # or with exe suffix doesn't work
    # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
    # the problem is that it does not return a
    # Bio::Tools::Run::EMBOSSApplication object.

    # however, creating an EMBOSSApplication object directly makes it
    # possible to run the program
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile
               });
    print "Done\n";

    exit;
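
Since the thread keeps coming back to whether the EMBOSS programs are 
visible on PATH, here is a minimal sketch of how one might check that 
from Perl itself. It is not how Bio::Factory::EMBOSS locates programs; 
find_on_path is a hypothetical helper name, and the list of Windows 
suffixes is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the full path of the
# first matching executable, or nothing if none is found. Searches the
# given directories, or the directories in $ENV{PATH} if none are given.
sub find_on_path {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    # On Windows also try the usual executable suffixes (assumed list).
    my @suffixes = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');
    for my $dir (@dirs) {
        for my $suffix (@suffixes) {
            my $candidate = File::Spec->catfile($dir, $name . $suffix);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Programs mentioned in this thread.
for my $prog (qw(embossversion wossname fuzznuc)) {
    my $path = find_on_path($prog);
    print "$prog: ", (defined $path ? $path : 'NOT FOUND on PATH'), "\n";
}
```

If embossversion is reported but fuzznuc is not, the installation itself 
is the likely culprit; if both are found, the problem is more likely in 
how the factory builds its program list.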


Chris Fields wrote:
> I did a little investigating using my old PC and was able to get 
> fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a 
> hoop or two but I managed to get it working.
>
> First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.  
> You need to remove EMBOSSWin and install the one I linked to 
> previously (this is an actual EMBOSS beta release).  It's possible 
> older EMBOSSWin can be configured, but I don't plan on checking it out 
> myself.
>
> Next, you need to ensure the binaries are in your PATH env. variable 
> (test by running 'wossname' on the command line), then set EMBOSS_DATA 
> to point at the EMBOSS data directory using a UNIX-like path (i.e. 
> 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP 
> recognizes the UNIX'y form as a valid path.  If you don't know how to 
> set env. variables go here:
>
> http://vlaurie.com/computers2/Articles/environment.htm
>
> Once that is set up you should be able to run the script using the 
> latest (greatest?) EMBOSS.
>
> chris
>
> On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:
>
>> Hi Jason
>>
>> I tried this as well. This also gives the same error message.
>>
>> -Rohit
>>
>> Jason Stajich wrote:
>>> You could try this - can't test it though so not sure.
>>> my $fuzznuc = $f->program('fuzznuc');
>>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>>
>>> -jason
>>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>>>
>>>>
>>>>
>>>> Thanks for all the suggestions... but I unfortunately still cannot run
>>>> emboss. I am running the latest version of embosswin  
>>>> (2.10.0-Win-0.8),
>>>> and the
>>>> path is set correctly. I printed $ENV{PATH} and this contains
>>>> C:\EMBOSSwin which is the correct location.
>>>> I also tried setting the path directly but I'm not sure how to do 
>>>> this,
>>>> so I tried this...
>>>>
>>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>>>
>>>> this also did not work.
>>>>
>>>> Also tried printing...
>>>> $fuzznuc->executable()
>>>>
>>>> gave the following error again
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>>
>>>> Any more ideas ?
>>>>
>>>> thanks !
>>>> Rohit
>>>>
>>>>
>>>> here's the code...
>>>>
>>>> use strict;
>>>> use Bio::Factory::EMBOSS;
>>>> use Data::Dumper;
>>>>
>>>> #
>>>> # print "PATH=$ENV{PATH}\n";
>>>> # path contains C:\EMBOSSwin which is the correct location
>>>> # embossversion is 2.10.0-Win-0.8
>>>>
>>>>  my $f = Bio::Factory::EMBOSS->new();
>>>>  # get an EMBOSS application  object from the factory
>>>>  print Dumper ($f);
>>>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried
>>>>  # fuzznuc.exe as well
>>>>  print Dumper ($fuzznuc);
>>>>
>>>>  #dump of fuzznuc
>>>>  #$VAR1 = bless( {
>>>>  #                '_programgroup' => {},
>>>>  #                '_programs' => {},
>>>>  #                '_groups' => {}
>>>>  #              }, 'Bio::Factory::EMBOSS' );
>>>>
>>>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>>>
>>>>  my $infile = "temp.fasta";
>>>>  my $motif  = "ATGTCGATC";
>>>>  my $outfile = "test.out";
>>>>
>>>>
>>>>  $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                     -pattern   => $motif,
>>>>                     -outfile   => $outfile
>>>>               });
>>>>
>>>> Here's the error again....
>>>>
>>>> #-------------------- WARNING ---------------------
>>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> #---------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Jason Stajich wrote:
>>>>> Presumably the PATH is not getting set properly - you should play
>>>>> around printing the $ENV{PATH} variable in a perl script to see if
>>>>> actually contains the directory where the emboss programs are
>>>>> installed.  Bioperl can only guess so much as to where to find an
>>>>> application.  It is also possible that we aren't creating the proper
>>>>> path to the executable - you can print the executable path with
>>>>> print $fuzznuc->executable
>>>>> I believe unless it is throwing an error at the program() line.
>>>>>
>>>>> It looks like the code in the Factory object is a little fragile
>>>>> assuming that the programs HAVE to be in your $PATH.  I don't know if
>>>>> windows+perl is special in any way that it run things so I can't
>>>>> really tell if there is specific things you have to do here. You may
>>>>> have to run this through cygwin in case PATH and such are just not
>>>>> available properly to windowsPerl.
>>>>>
>>>>> -jason
>>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>>>>> this from the dos command line and the path is present. However,
>>>>>> when I
>>>>>> try to call
>>>>>> an emboss application from bioperl I get a "Application not found
>>>>>> error"
>>>>>>
>>>>>>
>>>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>>>   # get an EMBOSS application  object from the factory
>>>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>>>     $fuzznuc->run(
>>>>>>                   { -sequence  => $infile,
>>>>>>                         -pattern   => $motif,
>>>>>>                        -outfile   => $outfile
>>>>>>               });
>>>>>>  gives the following error
>>>>>>
>>>>>> -------------------- WARNING ---------------------
>>>>>> MSG: Application [fuzznuc] is not available!
>>>>>> ---------------------------------------------------
>>>>>> Can't call method "run" on an undefined value at searchPatterns.pl
>>>>>> line
>>>>>> 102.
>>>>>>
>>>>>> Can somebody help me fix this ?
>>>>>>
>>>>>> best regards
>>>>>> Rohit
>>>>>>
>>>>>> -- 
>>>>>>
>
>


From hlapp at gmx.net  Sun Nov  4 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 4 Nov 2007 12:42:13 -0500
Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript
In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net>

Hi Stefanie,

sorry for taking so long to respond - your email got buried in a pile  
while I was away on travel. The Bio::SeqFeature::Gene::* modules were  
written mostly with the motivation to have a model that can represent  
the results of gene predictors.

GenBank AFAIK doesn't annotate introns explicitly, though they should  
be implicit from cDNA (or mRNA? or gene, as you say) features on  
genomic sequence. The Bioperl SeqIO parsers won't transform those  
into a Bio::SeqFeature::Gene-based model, but instead will yield just  
plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent  
processing to build these into more hierarchical models.

I'm not sure whether someone's done this already for GenBank-type  
feature tables. There is a Unflattener that at least attempts to  
build a feature hierarchy from the flat array that's compliant with  
the Sequence Ontology (or so I recall).
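
As a rough illustration of what "implicit" means here: for a simple 
join() location the introns are just the gaps between consecutive 
exon ranges. The sketch below works on the location string alone and 
does not use any BioPerl classes; exon_ranges and intron_ranges are 
hypothetical helper names, and it assumes a linear feature with exact 
coordinates (no fuzzy positions, no feature spanning the origin).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple GenBank location such as
# "join(1..100,201..300)" into [start, end] ranges sorted by start.
# Only plain join()/complement() with exact coordinates is handled.
sub exon_ranges {
    my ($location) = @_;
    my @ranges;
    while ($location =~ /(\d+)\.\.(\d+)/g) {
        push @ranges, [$1, $2];
    }
    my @sorted = sort { $a->[0] <=> $b->[0] } @ranges;
    return @sorted;
}

# Introns are taken to be the gaps between consecutive exon ranges.
sub intron_ranges {
    my @exons = @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my ($start, $end) = ($exons[$i - 1][1] + 1, $exons[$i][0] - 1);
        push @introns, [$start, $end] if $start <= $end;
    }
    return @introns;
}

my @exons = exon_ranges('join(1..100,201..300,451..500)');
print join(',', map { "$_->[0]..$_->[1]" } intron_ranges(@exons)), "\n";
# prints 101..200,301..450
```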

I'm copying the list in case others have additional suggestions.

	-hilmar

On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote:

>
>
> Hello Hilmar,
>
> I have a question about your bioperl module  
> Bio::SeqFeature::Gene::Transcript:
>
> I can't figure out how to generate the $gene object for use in this  
> line:
> @introns = $gene->introns();
>
> The data I'm working with is a local file in genbank format, and  
> I'm interested in extracting intron sequences (and maybe flanking  
> exons) for certain genes. I have been trying to get the introns via  
> the sequence features ('CDS' or 'gene'), but this has not been  
> working. Which approach will I have to take?
> I'd be very grateful if you could point me into the right direction!
>
> Hope things are going well in Durham! And thank you in advance!
>
> Stefanie
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From downloadondemand at gmail.com  Sun Nov  4 13:39:42 2007
From: downloadondemand at gmail.com (download on demand)
Date: Sun, 4 Nov 2007 20:39:42 +0200
Subject: [Bioperl-l] Help with Bio::SeqIO
Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>

Hi to all.

I have a problem with a very simple script:


         use Bio::SeqIO;
         # get command-line arguments, or die with a usage statement
         my $usage = "x2y.pl infile infileformat outfileformat\n";
         my $infile = shift or die $usage;
         my $infileformat = shift or die $usage;
#         my $outfile = shift or die $usage;
         my $outfileformat = shift or die $usage;

         # create one SeqIO object to read in, and another to write out
         my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                      '-format' => $infileformat);
         my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                                       '-format' => $outfileformat);

         # print product, location and spliced sequence of each CDS feature
         while (my $inseq = $seq_in->next_seq) {

#            $seq_out->write_seq($inseq); # Whole sequence not needed

             for my $feat_object ($inseq->get_SeqFeatures)
                 {
                 if ($feat_object->primary_tag eq "CDS")
                     {
                     print $feat_object->get_tag_values('product'),"\n";
                     print $feat_object->location->start,"..",
                           $feat_object->location->end,"\n";
                     print $feat_object->spliced_seq->seq,"\n\n";
                     }
                 }
         }


The result seems OK to me, but in case of first CDS of NC_005213.gbk from
here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated
sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated
sequence....................................TAA


This CDS has an unusual location string:
CDS             complement(join(490883..490885,1..879))
but shouldn't spliced_seq handle these things?

Please help me!
Best regards, N.

From cjfields at uiuc.edu  Sun Nov  4 19:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for  
instance typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the  
split location is a 'join'?  In other words, should spliced_seq() be  
modified to take into account the split location type when returning  
sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly  
indicates the order of the sequences is important when joined  
together; the current behavior is more like that for 'order'.
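
To make the difference concrete, here is a toy sketch in plain Perl (no 
BioPerl involved) of what happens to a complement(join(...)) that spans 
the origin of a circular genome when the pieces are sorted by start 
before splicing. The 12 bp genome and the helper names revcomp and 
splice_ranges are made up for illustration; this is not the spliced_seq 
implementation.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Concatenate 1-based inclusive [start, end] ranges of $seq in the
# order in which they are given.
sub splice_ranges {
    my ($seq, @ranges) = @_;
    return join '', map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @ranges;
}

my $genome = 'GGGCATAAATTA';            # toy circular genome, 12 bp
my @join   = ([10, 12], [1, 6]);        # complement(join(10..12,1..6))
my @sorted = sort { $a->[0] <=> $b->[0] } @join;

my $as_listed    = revcomp(splice_ranges($genome, @join));
my $after_sorting = revcomp(splice_ranges($genome, @sorted));

print "join order kept: $as_listed\n";      # ATGCCCTAA
print "sorted by start: $after_sorting\n";  # TAAATGCCC
```

With the join order kept, the result starts with ATG and ends with TAA; 
after sorting, the stop codon ends up in front, which is the same 
pattern as in the output reported for NC_005213.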

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
>
> I have a problem with a simplest script:
>
>
>
>          use Bio::SeqIO;
>          # get command-line arguments, or die with a usage statement
>          my $usage = "x2y.pl infile infileformat outfile  
> outfileformat\n";
>          my $infile = shift or die $usage;
>          my $infileformat = shift or die $usage;
> #         my $outfile = shift or die $usage;
>          my $outfileformat = shift or die $usage;
>
>          # create one SeqIO object to read in,and another to write out
>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => $infileformat);
>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>                                        '-format' => $outfileformat);
>
>          # write each entry in the input file to the output file
>          while (my $inseq = $seq_in->next_seq) {
>
> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>
> for my $feat_object ($inseq->get_SeqFeatures)
>     {
>     if ($feat_object->primary_tag eq "CDS")
>         {
>         print $feat_object->get_tag_values('product'),"\n";
>         print
> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>         print $feat_object->spliced_seq->seq,"\n\n";
>         }
>     }
>
>
>
> The result seems OK to me, but in case of first CDS of  
> NC_005213.gbk from here
> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/>
> the output is wrong:
>
> It is:
> hypothetical protein
> 1..490885
> TAAATGCGATTGCTATTAGAA..................................Truncated
> sequence...................................
>
> Should be:
> hypothetical protein
> 879..490883
> ATGCGATTGCTATTAGAA...................................Truncated
> sequence....................................TAA
>
>
>
> This CDS have an unnatural location string:
> CDS             complement(join(490883..490885,1..879)), but  
> spliced_seq
> should handle these things?
>
> Please help me!
> Best regards, N.
>


From jean-luc.jany at univ-brest.fr  Mon Nov  5 03:26:52 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Mon, 05 Nov 2007 09:26:52 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to
	blastall
Message-ID: <472ED3CC.2050305@univ-brest.fr>

Dear Bioperl and Mac users,

I am a Mac user and would like to run a script I made using Bio::Tools::Run::StandAloneBlast. Unfortunately, I have not managed to tell Bioperl the path to blastall and the other executables.

I carefully read http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to set the path to BLAST, but I guess the procedure is slightly different on a Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?)

My blast directory is in the myname directory and contains a bin and a data subdirectory. My blastall and the other executables are in myname/blast/bin (e.g. myname/blast/bin/blastall).

Thank you in anticipation for your help.

Jean-Luc
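
One possible approach, sketched here under assumptions, is to set the 
environment from inside the script rather than in a shell startup file. 
add_blast_to_env is a hypothetical helper; the blast/bin layout is taken 
from the message above, and whether Bio::Tools::Run::StandAloneBlast 
honours BLASTDIR in a given BioPerl version should be checked against 
the HOWTO.

```perl
use strict;
use warnings;

# Hypothetical helper: set BLASTDIR and put the BLAST bin directory
# first on PATH (without duplicating it) for this process and its children.
sub add_blast_to_env {
    my ($bin_dir) = @_;
    my $sep = $^O eq 'MSWin32' ? ';' : ':';
    $ENV{BLASTDIR} = $bin_dir;    # assumption: read by the BLAST wrapper
    my @path = grep { length($_) && $_ ne $bin_dir }
               split /\Q$sep\E/, ($ENV{PATH} || '');
    $ENV{PATH} = join $sep, $bin_dir, @path;
    return $ENV{PATH};
}

# Assumed layout from the message: <home>/blast/bin/blastall
my $bin_dir = ($ENV{HOME} || '.') . '/blast/bin';
add_blast_to_env($bin_dir);
print "blastall ", ((-x "$bin_dir/blastall") ? "found" : "not found"),
      " in $bin_dir\n";
```

If the wrapper module reads BLASTDIR at load time, the call would need 
to happen before the module is loaded, for example in a BEGIN block 
placed ahead of the use line.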


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Mon Nov  5 06:36:16 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Mon, 05 Nov 2007 12:36:16 +0100
Subject: [Bioperl-l] bioperl and emboss on windows
Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic. I was able 
to run EMBOSS applications on Windows (Vista), but only with the following 
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did. 
Scott suggested setting all the variables within the program. This I 
also tried, but these were already available to the program, so this was 
not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using 
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't 
have any path issues. What is also curious is that $f->version returns the 
correct version of the EMBOSS installation (no path problems here), and it 
looks like it runs the command "embossversion -auto" to get this 
information. If it can get at this command, it's a bit peculiar that it 
cannot get at the other programs. Or am I missing something here?


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile = "test.fasta";
my $motif  = "AGGAGG";
my $outfile = "test.out";


     my $f = Bio::Factory::EMBOSS->new();
     # get an EMBOSS application  object from the factory
    print Dumper $f;  
   
    print "location=",$f->location,"\n";   #returns local
    print "version=", $f->version,"\n";    #  this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is)
    print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing
    print "list=",$f->_program_list,"\n";  #returns nothing
   
    #
    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work
    # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work
    # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object.
    #
    #
    #
    # however, creating a EMBOSSApplication object directly makes it possible to run the program
    #
    
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');   
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile                      
               });   
    print "Done\n";
   
    exit;


From neetisomaiya at gmail.com  Mon Nov  5 07:20:04 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 5 Nov 2007 17:50:04 +0530
Subject: [Bioperl-l] perl question
Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>

Again a perl question, and maybe a very trivial one.
How do I terminate a number like 3.1232010098 to only 3 decimal places in
perl?

-- 
-Neeti
Even my blood says, B positive

From biology0046 at hotmail.com  Mon Nov  5 07:16:13 2007
From: biology0046 at hotmail.com (Jiang Wenkai)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID: <BLU108-F34DC66B7BB1B9063DA2BC8B4880@phx.gbl>

Dear all:

i got a poplar genome gff file like this:
LG_I	src	exon	2598	3280	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	2598	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I	src	start_codon	3278	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	stop_codon	2598	2600	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	exon	3544	3918	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	3544	3918	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I	src	exon	4258	4740	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	4258	4740	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I	src	exon	5344	6388	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	5344	6388	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I	src	exon	8259	8528	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8259	8528	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I	src	stop_codon	8259	8261	.	-	0	name "fgenesh1_pg.C_LG_I000002"
LG_I	src	exon	8897	8987	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8897	8987	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I	src	exon	9831	9892	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	9831	9892	.	-	1	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I	src	start_codon	9890	9892	.	-	0	name "fgenesh1_pg.C_LG_I000002"

I tried to use Bio::DB::GFF, but this module only works with the feature 
types (methods) given in the gff file.
What I want to get is "intron, 5utr, 3utr", but this information is not 
contained in this gff file.

How can I get this information through bioperl? This file does not contain 
intron information.
If I consider gaps between exons as introns, and non-CDS parts of the first 
and last exon as UTRs, how can I extract them from this gff file?

Thanks~~

Wenkai
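
The approach described in the question (gaps between exons are introns, 
exon parts outside the CDS are UTRs) can be sketched in plain Perl as 
below. This is only an illustration that does not use Bio::DB::GFF; the 
helper names parse_gff, introns and utrs are hypothetical, and it assumes 
tab-separated lines with a name "..." attribute in the ninth column, as 
in the file excerpt above.

```perl
use strict;
use warnings;

# Collect exon and CDS ranges per gene name from GFF-like lines.
sub parse_gff {
    my (@lines) = @_;
    my %genes;
    for my $line (@lines) {
        my @col = split /\t/, $line;
        next unless @col >= 9 && $col[8] =~ /name "([^"]+)"/;
        my $name = $1;
        next unless $col[2] eq 'exon' || $col[2] eq 'CDS';
        $genes{$name}{strand} = $col[6];
        push @{ $genes{$name}{ $col[2] } }, [ $col[3], $col[4] ];
    }
    return \%genes;
}

# Gaps between consecutive exons (sorted by start) are taken as introns.
sub introns {
    my ($gene) = @_;
    my @exons = sort { $a->[0] <=> $b->[0] } @{ $gene->{exon} || [] };
    my @introns;
    for my $i (1 .. $#exons) {
        push @introns, [ $exons[$i - 1][1] + 1, $exons[$i][0] - 1 ];
    }
    return @introns;
}

# Exon parts outside the CDS span are taken as UTRs; on the minus
# strand the 5' UTR lies at the higher coordinates.
sub utrs {
    my ($gene) = @_;
    my @cds = sort { $a->[0] <=> $b->[0] } @{ $gene->{CDS} || [] };
    return () unless @cds;
    my ($cds_start, $cds_end) = ($cds[0][0], $cds[-1][1]);
    my (@left, @right);
    for my $exon (@{ $gene->{exon} || [] }) {
        my ($s, $e) = @$exon;
        push @left,  [ $s, ($e < $cds_start ? $e : $cds_start - 1) ] if $s < $cds_start;
        push @right, [ ($s > $cds_end ? $s : $cds_end + 1), $e ]     if $e > $cds_end;
    }
    my ($five, $three) = $gene->{strand} eq '-' ? (\@right, \@left) : (\@left, \@right);
    return (five_prime => $five, three_prime => $three);
}

# Exon/CDS coordinates of the second gene in the excerpt above.
my $name = 'fgenesh1_pg.C_LG_I000002';
my @gff;
for my $f (['exon', 8259, 8528], ['CDS', 8259, 8528],
           ['exon', 8897, 8987], ['CDS', 8897, 8987],
           ['exon', 9831, 9892], ['CDS', 9831, 9892]) {
    push @gff, join("\t", 'LG_I', 'src', @$f, '.', '-', '.', qq{name "$name"});
}

my $genes = parse_gff(@gff);
for my $gene_name (sort keys %$genes) {
    for my $intron (introns($genes->{$gene_name})) {
        print "$gene_name\tintron\t$intron->[0]\t$intron->[1]\n";
    }
}
```

For this gene the exon and CDS ranges coincide, so utrs returns empty 
lists; it only reports something when an exon extends beyond the CDS.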



From spiros at lokku.com  Mon Nov  5 07:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <bba689ec0711050436r6016ae57le78db531f9eab55b@mail.gmail.com>

Hey,

use the `sprintf` function. More information can be found at
http://perldoc.perl.org/functions/sprintf.html.

For more control over rounding, you could use the Math::Round module,
http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya <neetisomaiya at gmail.com> wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive
>

From ak at ebi.ac.uk  Mon Nov  5 07:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

  printf( "The number is %.3f\n", $number );

When making a string:

  my $string = sprintf( "%.3f", $number );


BTW, this is cutting, not rounding.


Cheers,
Andreas


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From t.nugent at cs.ucl.ac.uk  Mon Nov  5 07:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>   

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From bix at sendu.me.uk  Mon Nov  5 07:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:

http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/


$rounded = sprintf("%.3f", $number);

From Marc.Logghe at DEVGEN.com  Mon Nov  5 07:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,
Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:
my $f = 3.1232010098;
printf "%0.3f", $f;


HTH,
Marc
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
> 
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 
> decimal places in perl?
> 
> --
> -Neeti
> Even my blood says, B positive
> 
> 


From bix at sendu.me.uk  Mon Nov  5 08:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> 
> When displaying:
> 
>   printf( "The number is %.3f\n", $number );
> 
> When making a string:
> 
>   my $string = sprintf( "%.3f", $number );
> 
> 
> BTW, this is cutting, not rounding.

(s)printf rounds (i.e. doesn't simply truncate), though for critical 
applications you should use your own rounding algorithm.
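
A small sketch of the difference, using core Perl only. round_to and 
truncate_to are hypothetical helper names, and the int()-based truncation 
is just one simple way to cut digits; it is subject to the usual 
floating-point caveats.

```perl
use strict;
use warnings;

# Round to $places decimal places; sprintf does the rounding.
sub round_to {
    my ($number, $places) = @_;
    return sprintf("%.${places}f", $number);
}

# Cut off after $places decimal places instead of rounding.
sub truncate_to {
    my ($number, $places) = @_;
    my $factor = 10 ** $places;
    return sprintf("%.${places}f", int($number * $factor) / $factor);
}

print round_to(3.1232010098, 3), "\n";   # 3.123
print round_to(3.1238, 3), "\n";         # 3.124 (rounded up)
print truncate_to(3.1238, 3), "\n";      # 3.123 (cut off)
```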


From ak at ebi.ac.uk  Mon Nov  5 08:56:24 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 13:56:24 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <472F1989.90105@sendu.me.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk>
Message-ID: <20071105135624.GD4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote:
> Andreas Kahari wrote:
> > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> >> Again a perl question, and maybe a very trivial one.
> >> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> >> perl?
> > 
> > When displaying:
> > 
> >   printf( "The number is %.3f\n", $number );
> > 
> > When making a string:
> > 
> >   my $string = sprintf( "%.3f", $number );
> > 
> > 
> > BTW, this is cutting, not rounding.
> 
> (s)printf rounds (ie. doesn't simply truncate), though for critical 
> applications you should use your own rounding algorithm.

They do indeed.  Mea culpa.


Andreas

-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From jay at jays.net  Mon Nov  5 10:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value to the sorting algorithm. All tests invoke  
-nosort => 1 (except a phase test where nosort doesn't matter anyway).  
In my limited experience the sorting only serves to break real-world  
splicing.

If there is no valid use then we can remove ~20 lines from  
SeqFeatureI.pm circa line 505. If there is a valid use and someone  
would be so kind as to educate me I'd be happy to add tests which  
demonstrate them.  :)

P.S.  CSHL is neato. I plan on understanding some of this stuff some  
day.  :)

j
http://www.bioperl.org/wiki/User:Jhannah


From hlapp at duke.edu  Mon Nov  5 11:03:16 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 11:03:16 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>

I agree that there should be a meaningful default that results in  
"doing the right thing" in most cases if the user doesn't intervene.  
I'm not sure I understand all the details, but it sounds like sorting or  
not sorting should depend on the split location type unless the user  
overrides it by argument. That's what you're suggesting, right?

	-hilmar

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:

> Pass in (-nosort => 1) to spliced_seq:
>
> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>
> This ensures no sorting of sublocations occurs, if you want for  
> instance typical GenBank/EMBL 'join' behavior.
>
> To the other devs: shouldn't -nosort be the default behavior when  
> the split location is a 'join'?  In other words, should spliced_seq 
> () be modified to take into account the split location type when  
> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'  
> explicitly indicates the order of the sequences is important when  
> joined together; the current behavior is more like that for 'order'.
>
> chris
>
> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>
>> Hi to all.
>>
>> I have a problem with a simplest script:
>>
>>
>>
>>          use Bio::SeqIO;
>>          # get command-line arguments, or die with a usage statement
>>          my $usage = "x2y.pl infile infileformat outfile  
>> outfileformat\n";
>>          my $infile = shift or die $usage;
>>          my $infileformat = shift or die $usage;
>> #         my $outfile = shift or die $usage;
>>          my $outfileformat = shift or die $usage;
>>
>>          # create one SeqIO object to read in,and another to write  
>> out
>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>                                       '-format' => $infileformat);
>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>                                        '-format' => $outfileformat);
>>
>>          # write each entry in the input file to the output file
>>          while (my $inseq = $seq_in->next_seq) {
>>
>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>
>> for my $feat_object ($inseq->get_SeqFeatures)
>>     {
>>     if ($feat_object->primary_tag eq "CDS")
>>         {
>>         print $feat_object->get_tag_values('product'),"\n";
>>         print
>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>         print $feat_object->spliced_seq->seq,"\n\n";
>>         }
>>     }
>>
>>
>>
>> The result seems OK to me, but in case of first CDS of  
>> NC_005213.gbk from
>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/ 
>> Nanoarchaeum_equitans/> the
>> output is wrong:
>>
>> It is:
>> hypothetical protein
>> 1..490885
>> TAAATGCGATTGCTATTAGAA..................................Truncated
>> sequence...................................
>>
>> Should be:
>> hypothetical protein
>> 879..490883
>> ATGCGATTGCTATTAGAA...................................Truncated
>> sequence....................................TAA
>>
>>
>>
>> This CDS have an unnatural location string:
>> CDS             complement(join(490883..490885,1..879)), but  
>> spliced_seq
>> should handle these things?
>>
>> Please help me!
>> Best regards, N.
>> _______________________________________________
>>
>
>
>


From bernd.web at gmail.com  Mon Nov  5 11:53:01 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 5 Nov 2007 17:53:01 +0100
Subject: [Bioperl-l] PSI-BLAST
Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com>

Hi,

Is it possible with SearchIO to select a specific iteration (Results
from round i) part of the PSI-blast report, when parsing this with
SearchIO::blast?
It seems the parser parses the complete report. If not implemented I
could of course extract the specific part of the psi-blast report and
then give it to SearchIO (e.g. with IO::String), but maybe I am
missing a built-in option?
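
For the fallback route, a rough plain-Perl sketch of cutting one round
out of the report text might look like the following. It assumes each
round starts with a line of the form "Results from round N"; the helper
name extract_round and the toy report are hypothetical, and a real
report would probably need its header and trailer kept around the block
before SearchIO can parse it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of one "Results from round N"
# block, or undef if the report has no such round.
sub extract_round {
    my ($report, $round) = @_;
    # Split in front of every round marker so each marker stays with its block.
    my @blocks = split /^(?=Results from round \d+)/m, $report;
    for my $block (@blocks) {
        return $block if $block =~ /\AResults from round $round\b/;
    }
    return;
}

# Toy report text, not real PSI-BLAST output.
my $report = join '',
    "BLASTP header (toy text)\n",
    "Results from round 1\n", "hit A\n",
    "Results from round 2\n", "hit A\n", "hit B\n";

print extract_round($report, 2);
```

The extracted string could then be wrapped in an IO::String handle and
handed to the parser, as described above.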


Regards,
Bernd

From jay at jays.net  Mon Nov  5 11:54:13 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 11:54:13 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>

On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?

If someone knows why spliced_seq() should ever sort then I'm  
suggesting we add a test demonstrating a useful example of that.

If no one has a useful example of when you would want spliced_seq()  
to sort then I'm suggesting we remove the sorting altogether and  
nosort goes away.

I can provide/add many examples where sorting is bad. I do not know  
of a case where sorting is good.
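
To make one such case concrete, here is a minimal plain-Perl sketch. It
does not call BioPerl: the 20-nt genome, the location
complement(join(18..20,1..6)) and the helpers revcomp and splice_minus
are made up for illustration, modelled on the origin-spanning CDS
reported earlier in this thread.

```perl
use strict;
use warnings;

# Toy 20-nt circular "genome" (hypothetical data, not NC_005213).
my $genome = 'TTTCAT' . ('G' x 11) . 'TTA';

# Sub-locations in the order listed in complement(join(18..20,1..6)),
# as 1-based inclusive [start, end] pairs.
my @listed = ([18, 20], [1, 6]);

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Concatenate the pieces in the given order, then reverse-complement.
sub splice_minus {
    my ($seq, @locs) = @_;
    my $joined = join '',
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @locs;
    return revcomp($joined);
}

my $in_order = splice_minus($genome, @listed);
my $sorted   = splice_minus($genome, sort { $a->[0] <=> $b->[0] } @listed);

print "listed order: $in_order\n";   # ATGAAATAA
print "sorted order: $sorted\n";     # TAAATGAAA
```

Keeping the listed order gives a reading frame that starts with ATG and
ends with TAA; sorting by start moves the stop codon to the front, which
is the same symptom as the TAAATG... output in the original report.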

j
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Mon Nov  5 12:07:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Nov 2007 12:07:10 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>


At one point the location order was not respected/saved I believe. I  
guess we will just assume the user will build up a SplitLocation in  
order (i.e. add_SubLocation).  I'll try and remember if there were  
any other particular reasons.


-jason
On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar
>
> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>
>> Pass in (-nosort => 1) to spliced_seq:
>>
>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>
>> This ensures no sorting of sublocations occurs, if you want for
>> instance typical GenBank/EMBL 'join' behavior.
>>
>> To the other devs: shouldn't -nosort be the default behavior when
>> the split location is a 'join'?  In other words, should spliced_seq
>> () be modified to take into account the split location type when
>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>> explicitly indicates the order of the sequences is important when
>> joined together; the current behavior is more like that for 'order'.
>>
>> chris
>>
>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>
>>> Hi to all.
>>>
>>> I have a problem with a simplest script:
>>>
>>>
>>>
>>>          use Bio::SeqIO;
>>>          # get command-line arguments, or die with a usage statement
>>>          my $usage = "x2y.pl infile infileformat outfile
>>> outfileformat\n";
>>>          my $infile = shift or die $usage;
>>>          my $infileformat = shift or die $usage;
>>> #         my $outfile = shift or die $usage;
>>>          my $outfileformat = shift or die $usage;
>>>
>>>          # create one SeqIO object to read in,and another to write
>>> out
>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>                                       '-format' => $infileformat);
>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>                                        '-format' => $outfileformat);
>>>
>>>          # write each entry in the input file to the output file
>>>          while (my $inseq = $seq_in->next_seq) {
>>>
>>> #            $seq_out->write_seq($inseq); # Whole sequence not  
>>> needed
>>>
>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>     {
>>>     if ($feat_object->primary_tag eq "CDS")
>>>         {
>>>         print $feat_object->get_tag_values('product'),"\n";
>>>         print
>>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>         }
>>>     }
>>>
>>>
>>>
>>> The result seems OK to me, but in case of first CDS of
>>> NC_005213.gbk from
>>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>>> Nanoarchaeum_equitans/> the
>>> output is wrong:
>>>
>>> It is:
>>> hypothetical protein
>>> 1..490885
>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>> sequence...................................
>>>
>>> Should be:
>>> hypothetical protein
>>> 879..490883
>>> ATGCGATTGCTATTAGAA...................................Truncated
>>> sequence....................................TAA
>>>
>>>
>>>
>>> This CDS have an unnatural location string:
>>> CDS             complement(join(490883..490885,1..879)), but
>>> spliced_seq
>>> should handle these things?
>>>
>>> Please help me!
>>> Best regards, N.
>>> _______________________________________________
>>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Mon Nov  5 12:16:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:16:10 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>

Yes, we would sort based on the splittype() and default to a  
particular behavior ('join') if one isn't designated, maybe with a  
warning indicating the splittype() isn't defined.  Using an 'order'  
or other defined types could also delineate a default sort/nosort  
behavior (probably the previous as it would replicate prior behavior).
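
A rough sketch of that decision, in plain Perl and under my own
assumptions: should_sort is a hypothetical helper, not existing BioPerl
code, and it assumes 'join' means "keep the listed order" while 'order'
and any other defined type keep the old sorting behavior.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: decide whether sublocations should be sorted.
# $splittype is what splittype() would report; $nosort is the user's
# optional -nosort argument (undef when not supplied).
sub should_sort {
    my ($splittype, $nosort) = @_;

    # An explicit user argument always overrides the default.
    if (defined $nosort) {
        return $nosort ? 0 : 1;
    }

    # No split type recorded: warn and fall back to 'join'.
    if (!defined $splittype || $splittype eq '') {
        warn "split type not defined; assuming 'join'\n";
        $splittype = 'join';
    }

    # 'join' keeps the listed order; other types sort as before.
    return lc($splittype) eq 'join' ? 0 : 1;
}

for my $case (['join', undef], ['order', undef], [undef, undef], ['order', 1]) {
    my ($type, $nosort) = @$case;
    printf "%-6s nosort=%-5s => %s\n",
        $type // 'undef', $nosort // 'undef',
        should_sort($type, $nosort) ? 'sort' : 'keep listed order';
}
```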

chris

On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar


From cjfields at uiuc.edu  Mon Nov  5 12:20:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:20:35 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu>


On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote:

> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>
> If someone knows why spliced_seq() should ever sort then I'm
> suggesting we add a test demonstrating a useful example of that.
>
> If no one has a useful example of when you would want spliced_seq()
> to sort then I'm suggesting we remove the sorting altogether and
> nosort goes away.
>
> I can provide/add many examples where sorting is bad. I do not know
> of a case where sorting is good.
>
> j
> http://www.bioperl.org/wiki/User:Jhannah

The behavior would be based on the current use of 'join', 'order',  
and 'bond' (the latter in GenPept records).  I documented some cases  
here a while back:

http://www.bioperl.org/wiki/BioPerl_Locations#Split

chris

From hlapp at duke.edu  Mon Nov  5 12:32:24 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 12:32:24 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu>

Sounds good to me. -hilmar

On Nov 5, 2007, at 12:16 PM, Chris Fields wrote:

> Yes, we would sort based on the splittype() and default to a  
> particular behavior ('join') if one isn't designated, maybe with a  
> warning indicating the splittype() isn't defined.  Using an 'order'  
> or other defined types could also delineate a default sort/nosort  
> behavior (probably the previous as it would replicate prior behavior).
>
> chris
>
> On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 12:41:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:41:27 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
Message-ID: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>

It may have something to do with remote locations or setting strand()  
in sublocations.  This may have popped up in relation to a LocationI  
code audit I proposed a while back on the list which I never got  
around to.  Oh well...

I at least managed getting a wiki page started in case we decided to  
make changes, with the intention of making it a HOWTO at some point:

http://www.bioperl.org/wiki/BioPerl_Locations

If we go through with the changes to spliced_seq(), should it be  
implemented for inclusion in v1.6 or wait until v1.7?

chris

On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote:

>
> At one point the location order was not respected/saved I believe.  
> I guess we will just assume the user will build up a SplitLocation  
> in order (i.e. add_SubLocation).  I'll try and remember if there  
> were any other particular reasons.
>
>
> -jason
> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>>
>> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>>
>>> Pass in (-nosort => 1) to spliced_seq:
>>>
>>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>>
>>> This ensures no sorting of sublocations occurs, if you want for
>>> instance typical GenBank/EMBL 'join' behavior.
>>>
>>> To the other devs: shouldn't -nosort be the default behavior when
>>> the split location is a 'join'?  In other words, should spliced_seq
>>> () be modified to take into account the split location type when
>>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>>> explicitly indicates the order of the sequences is important when
>>> joined together; the current behavior is more like that for 'order'.
>>>
>>> chris
>>>
>>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>>
>>>> Hi to all.
>>>>
>>>> I have a problem with a simplest script:
>>>>
>>>>
>>>>
>>>>          use Bio::SeqIO;
>>>>          # get command-line arguments, or die with a usage  
>>>> statement
>>>>          my $usage = "x2y.pl infile infileformat outfile
>>>> outfileformat\n";
>>>>          my $infile = shift or die $usage;
>>>>          my $infileformat = shift or die $usage;
>>>> #         my $outfile = shift or die $usage;
>>>>          my $outfileformat = shift or die $usage;
>>>>
>>>>          # create one SeqIO object to read in,and another to write
>>>> out
>>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>>                                       '-format' => $infileformat);
>>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>>                                        '-format' =>  
>>>> $outfileformat);
>>>>
>>>>          # write each entry in the input file to the output file
>>>>          while (my $inseq = $seq_in->next_seq) {
>>>>
>>>> #            $seq_out->write_seq($inseq); # Whole sequence not  
>>>> needed
>>>>
>>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>>     {
>>>>     if ($feat_object->primary_tag eq "CDS")
>>>>         {
>>>>         print $feat_object->get_tag_values('product'),"\n";
>>>>         print
>>>> $feat_object->location->start,"..",$feat_object->location- 
>>>> >end,"\n";
>>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>>         }
>>>>     }
>>>>
>>>>
>>>>
>>>> The result seems OK to me, but in case of first CDS of
>>>> NC_005213.gbk from
>>>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>>>> Nanoarchaeum_equitans/> the
>>>> output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>>
>>>>
>>>> This CDS have an unnatural location string:
>>>> CDS             complement(join(490883..490885,1..879)), but
>>>> spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>>>> _______________________________________________
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Mon Nov  5 11:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID: <C354B795.10231%bosborne11@verizon.net>

Jean-luc,

From what you've written it sounds like you're using bash and not some other
shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
in your home directory, as well as a .ncbirc file. This should work.

I'm no Unix expert but I've always configured tcsh on the Mac in the same
ways I'd configure it on Linux machines. Similarly, if you're using bash
then it will read its .bashrc file, regardless of what flavor of Unix you
use (and the same thing holds true for zsh or csh or ...).

Brian O.


On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:

> Dear Bioperl and Mac users,
> 
> I am a Mac user and would like to run a script I made using
> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
> to Bioperl the pathway to Blastall and other executables.
> 
> I read carefully the following link
> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
> path to Blast, but I guess the way to proceed is slightly different in Mac and
> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
> .profile file instead of .bashrc?)
> 
> Actually, my blast file is in myname directory and comprises a /bin and  a
> /data file. I have got my blastall and other executables in
> myname/blast/bin/blastall.
> 
> Thank you in anticipation for your help.
> 
> Jean-Luc


From arareko at campus.iztacala.unam.mx  Mon Nov  5 13:35:56 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 05 Nov 2007 12:35:56 -0600
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <C354B795.10231%bosborne11@verizon.net>
References: <C354B795.10231%bosborne11@verizon.net>
Message-ID: <472F628C.2000506@campus.iztacala.unam.mx>

If the ~/.bashrc file doesn't work for you, try renaming it to 
~/.bash_profile and re-login, that might work best.

~/.bashrc works as an individual per-interactive-shell startup file, 
whereas ~/.bash_profile is a personal initialization file, executed for 
login shells.
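
To check which of the two files actually took effect, a small plain-Perl
sketch like the one below can be run from a fresh terminal. It only
looks through the directories in PATH; the helper name find_on_path is
my own, and it assumes the executable is called blastall as in the
original question.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable named $prog found in
# the PATH directories, or undef if there is none.
sub find_on_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $found = find_on_path('blastall');
print defined $found
    ? "blastall found at $found\n"
    : "blastall is not in any PATH directory\n";
```

If the second message is printed after logging in again, the PATH line
is most likely sitting in a startup file that this kind of shell does
not read.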

Hope this helps.

Regards,
Mauricio.


Brian Osborne wrote:
> Jean-luc,
> 
> From what you've written it sounds like you're using bash and not some other
> shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
> in your home directory, as well as a .ncbirc file. This should work.
> 
> I'm no Unix expert but I've always configured tcsh on the Mac in the same
> ways I'd configure it on Linux machines. Similarly, if you're using bash
> then it will read its .bashrc file, regardless of what flavor of Unix you
> use (and the same thing holds true for zsh or csh or ...).
> 
> Brian O.
> 
> 
> On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:
> 
>> Dear Bioperl and Mac users,
>>
>> I am a Mac user and would like to run a script I made using
>> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
>> to Bioperl the pathway to Blastall and other executables.
>>
>> I read carefully the following link
>> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
>> path to Blast, but I guess the way to proceed is slightly different in Mac and
>> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
>> .profile file instead of .bashrc?)
>>
>> Actually, my blast file is in myname directory and comprises a /bin and  a
>> /data file. I have got my blastall and other executables in
>> myname/blast/bin/blastall.
>>
>> Thank you in anticipation for your help.
>>
>> Jean-Luc

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at duke.edu  Mon Nov  5 16:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
Message-ID: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>


On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be  
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should  
not change behavior for those for which the current default behavior  
was already broken (and who therefore pass in -nosort), and 2) fix  
the behavior for those who erroneously assumed that the code was  
going to do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 17:12:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 16:12:23 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
	<EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu>


On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote:

>
> On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:
>
>> If we go through with the changes to spliced_seq(), should it be  
>> implemented for inclusion in v1.6 or wait until v1.7?
>
> I would say they should be implemented ASAP because they 1) should  
> not change behavior for those for which the current default  
> behavior was already broken (and who therefore pass in -nosort),  
> and 2) fix the behavior for those who erroneously assumed that the  
> code was going to do the right thing by default.
>
> I.e., it sounds mostly like a bugfix to me. Am I overlooking  
> something?
>
> 	-hilmar
> -- 

Okay; I'll try to get this in soon.

chris

From jean-luc.jany at univ-brest.fr  Tue Nov  6 04:00:07 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Tue, 06 Nov 2007 10:00:07 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
Message-ID: <47302D17.2030500@univ-brest.fr>

Thanks Brian. Yes I use bash. I am going to follow your advice as soon 
as possible (for some reasons I am unable to run bioperl) and come back 
to you to tell you if it runs.
Jean-Luc

From jason at bioperl.org  Tue Nov  6 16:18:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 16:18:35 -0500
Subject: [Bioperl-l] lightweight sequence features
Message-ID: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>

I started a branch for implementing and playing with lightweight  
feature objects. The branch is called 'lightweight_feature_branch'.

Right now it is about 70% faster just in object creation based on  
parsing features using Bio::Tools::GFF and swapping the types of  
features that are created.  It uses arrays instead of hashes under  
the hood.

So the objects don't have locations under the hood.  My hope is if  
this works okay we could use it for creating objects where we KNOW  
the underlying features have simple locations, such as when parsing  
GFF data.
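
As a rough illustration of the idea (this is not the code on the
branch), a feature stored as a blessed array with fixed slots and no
separate location object could look like the sketch below. The class
name ArrayFeature, the slot constants and the span method are all
hypothetical.

```perl
use strict;
use warnings;

package ArrayFeature;    # hypothetical class, for illustration only

# Fixed slot numbers replace hash keys.
use constant { F_START => 0, F_END => 1, F_STRAND => 2 };

# Coordinates live directly in the array; no location object is built.
sub new {
    my ($class, $start, $end, $strand) = @_;
    return bless [$start, $end, $strand], $class;
}

sub start  { $_[0][F_START] }
sub end    { $_[0][F_END] }
sub strand { $_[0][F_STRAND] }
sub span   { $_[0][F_END] - $_[0][F_START] + 1 }

package main;

my $f = ArrayFeature->new(100, 250, -1);
printf "%d..%d strand %d length %d\n",
    $f->start, $f->end, $f->strand, $f->span;
```

A layout like this only works when every feature has a single simple
start and end, which is why it is aimed at cases such as GFF parsing.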

-jason
--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Tue Nov  6 16:57:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Nov 2007 15:57:17 -0600
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>

Bravo!  I once benchmarked Location instance creation and found  
it contributed quite a bit of overhead so the speedup with that and  
the use of arrays makes quite a bit of sense to me.

You mention only simple locations; I'm guessing this doesn't handle  
'fuzzy' ends?  If it did I could see layering the feature data from  
the get-go, so it could be used just about anywhere in the place of  
SF::Generic.  Maybe something to test out in 1.7?

chris

On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:

> I started a branch for implementing and playing with lightweight
> feature object. The branch is called 'lightweight_feature_branch'.
>
> Right now it is about 70% faster just in object creation based on
> parsing features using Bio::Tools::GFF and swapping the types of
> features that are created.  It uses arrays instead of hashes under
> the hood.
>
> So the objects don't have locations under the hood.  My hope is if
> this works okay we could use it for creating objects where we KNOW
> the underlying features have simple locations so such as parsing in
> GFF data.
>
> -jason
> --
> Jason Stajich
> jason at bioperl.org


From jason at bioperl.org  Tue Nov  6 23:14:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 23:14:55 -0500
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
	<5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
Message-ID: <A021EE94-8DF8-467E-8303-E80127E3AEE2@bioperl.org>

Right - only for simple locations.  I've got a bunch more tests and  
fixes to put in.

I am hoping this can be a fast replacement in the case where we're  
dealing with this "unflattened" data (i.e. GFF in FeatureIO &  
Gbrowse).  This is sort of a playground until I can really get it  
tested a bit more.  I'll give an all clear when the dust settles in  
terms of the design if anyone wants to play/help.

-jason
On Nov 6, 2007, at 4:57 PM, Chris Fields wrote:

> Bravo!  I once benchmarked Location instance creation once and  
> found it contributed quite a bit of overhead so the speedup with  
> that and the use of arrays makes quite a bit of sense to me.
>
> You mention only simple locations; I'm guessing this doesn't handle  
> 'fuzzy' ends?  If it did I could see layering the feature data from  
> the get-go, so it could be used just about anywhere in the place of  
> SF::Generic.  Maybe something to test out in 1.7?
>
> chris
>
> On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:
>
>> I started a branch for implementing and playing with lightweight
>> feature object. The branch is called 'lightweight_feature_branch'.
>>
>> Right now it is about 70% faster just in object creation based on
>> parsing features using Bio::Tools::GFF and swapping the types of
>> features that are created.  It uses arrays instead of hashes under
>> the hood.
>>
>> So the objects don't have locations under the hood.  My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org


From heikki at sanbi.ac.za  Wed Nov  7 05:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems 
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise
* Sequences are modified in place. That is really bad, because that 
  means that the user has to know to create a copy before 
  running Mdust on it.
* The docs say that you have to set MDUSTDIR envvar to tell the program 
  where to find the binary. That is actually optional if the 
  binary is on your path.
* The tests do not cover any of the options to the program


As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects:
  sequence string is not masked but seqfeatures to that effect are added
* Modify run() to return the new masked sequence object when 
  the target is a Bio::PrimarySeqI.
* fix the documentation


After that it will be possible to simply write:

use Bio::Tools::Run::Mdust;
my $mdust = Bio::Tools::Run::Mdust->new();
my $seq_dusted = $mdust->run($seq); # $seq->isa('Bio::PrimarySeqI')
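
To illustrate the "return a new masked object instead of modifying in
place" idea, here is a minimal sketch in plain Perl. It does not use
Mdust or BioPerl at all: the helper name mask_copy, the 1-based
[start, end] interval format and the default 'N' mask character are
assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-run): return a masked copy of
# $seq and leave the caller's string untouched.
# Intervals are 1-based and inclusive: [start, end].
sub mask_copy {
    my ($seq, $intervals, $mask_char) = @_;
    $mask_char = 'N' unless defined $mask_char;

    my $masked = $seq;    # work on a copy, never on the original
    for my $interval (@$intervals) {
        my ($start, $end) = @$interval;
        my $len = $end - $start + 1;
        substr($masked, $start - 1, $len) = $mask_char x $len;
    }
    return $masked;
}

my $original = 'ACGTAAAAAAAAACGT';
my $dusted   = mask_copy($original, [ [5, 13] ]);

print "original: $original\n";
print "dusted:   $dusted\n";
```

The point of the design is that the caller never has to remember to
clone the sequence first; the original stays usable after masking.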


Are you happy for me to do this or do you want to do it yourself?


Yours,
	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    
    _/_/_/_/_/  heikki at_sanbi _ac _za    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From Kevin.M.Brown at asu.edu  Wed Nov  7 13:04:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 7 Nov 2007 11:04:50 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>

I installed bioperl-ext from CVS, but can't figure out what else is
missing to utilize Bio::Tools::pSW.  The error I get from the example
script in the wiki is:

The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align)
has not been installed.
 Please read the install the bioperl-ext package

BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
Compilation failed in require at ./align_test.pl line 3.
BEGIN failed--compilation aborted at ./align_test.pl line 3.

In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
Align, but no Align.pm file.

I followed the directions in the wiki to install 1.5.2_102 (think I had
_100 installed previously).  Any thoughts on what I'm missing?


From jason at bioperl.org  Wed Nov  7 14:52:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 14:52:16 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>

The array-based Bio::SeqFeature::Slim is only about 7% faster than  
Bio::Graphics::Feature so I suspect most of the speedup comes from  
removing location objects.

          s/iter   Generic GraphicsF      Slim
Generic     6.75        --      -37%      -41%
GraphicsF   4.26       58%        --       -7%
Slim        3.98       70%        7%        --

this is using code on the lightweight_feature_branch so
cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r  
lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
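
For readers who have not seen the array-versus-hash trick, a minimal
sketch in plain Perl follows. The package names My::Feature::Hash and
My::Feature::Array are hypothetical stand-ins, not the classes on the
branch, and the timings it prints will not match the table above.

```perl
use strict;
use warnings;

# Hash-based feature: every object is a blessed hash with named keys.
package My::Feature::Hash;
sub new {
    my ($class, %args) = @_;
    return bless { start  => $args{start},
                   end    => $args{end},
                   strand => $args{strand} }, $class;
}
sub start  { $_[0]->{start} }
sub end    { $_[0]->{end} }
sub strand { $_[0]->{strand} }

# Array-based feature: every object is a blessed array; each field
# lives in a fixed slot named by a constant.
package My::Feature::Array;
use constant { F_START => 0, F_END => 1, F_STRAND => 2 };
sub new {
    my ($class, %args) = @_;
    return bless [ $args{start}, $args{end}, $args{strand} ], $class;
}
sub start  { $_[0]->[F_START] }
sub end    { $_[0]->[F_END] }
sub strand { $_[0]->[F_STRAND] }

package main;
use Benchmark qw(cmpthese);

my %args = (start => 1050, end => 9000, strand => 1);

# Compare object creation only; each run builds 1000 objects.
cmpthese(-1, {
    hash  => sub { My::Feature::Hash->new(%args)  for 1 .. 1000 },
    array => sub { My::Feature::Array->new(%args) for 1 .. 1000 },
});
```

Both classes expose the same accessors, so calling code does not need
to know which storage is used underneath.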

-jason

From lstein at cshl.edu  Wed Nov  7 15:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally,
then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic     6.75        --      -37%      -41%
> GraphicsF   4.26       58%        --       -7%
> Slim        3.98       70%        7%        --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl<http://jason.open-bio.org/%7Ejason/bioperl/seqfeature_speed.pl>
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2<http://jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2>
>
> -jason
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From jason at bioperl.org  Wed Nov  7 15:09:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 15:09:35 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>

It uses hashes there so technically it is not entirely array based.

-jason
On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:

> I wonder if it is worth moving to the array-based version more  
> generally,
> then.
>
> How does the array based feature object deal with tags?
>
> Lincoln
>
> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>
>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>> Bio::Graphics::Feature so I suspect most of the speedup comes from  
>> removing
>> location objects.
>>
>> Generic     6.75        --      -37%      -41%
>> GraphicsF   4.26       58%        --       -7%
>> Slim        3.98       70%        7%        --
>>
>> this is using code on the lightweight_feature_branch so
>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>> lightweight_feature_branch -d core_lwf bioperl-live
>>
>> http://jason.open-bio.org/~jason/bioperl/ 
>> seqfeature_speed.pl<http://jason.open-bio.org/%7Ejason/bioperl/ 
>> seqfeature_speed.pl>
>> and the GFF3 file I used to parse
>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2<http:// 
>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2>
>>
>> -jason
>>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Nov  7 16:12:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 15:12:35 -0600
Subject: [Bioperl-l] (no subject)
In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>

I can see preferring a lightweight simple SF over SF::Generic in the  
next BioPerl dev cycle.  I guess we would just layer split locations  
as simple sub-features/segments, typing when necessary?  That  
shouldn't be much more overhead than creating a layered Location::Split.

chris

On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:

> It uses hashes there so technically it is not entirely array based.
>
> -jason
> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>
>> I wonder if it is worth moving to the array-based version more
>> generally,
>> then.
>>
>> How does the array based feature object deal with tags?
>>
>> Lincoln
>>
>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>
>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>> removing
>>> location objects.
>>>
>>> Generic     6.75        --      -37%      -41%
>>> GraphicsF   4.26       58%        --       -7%
>>> Slim        3.98       70%        7%        --
>>>
>>> this is using code on the lightweight_feature_branch so
>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>
>>> http://jason.open-bio.org/~jason/bioperl/
>>> seqfeature_speed.pl<http://jason.open-bio.org/%7Ejason/bioperl/
>>> seqfeature_speed.pl>
>>> and the GFF3 file I used to parse
>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2<http://
>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2>
>>>
>>> -jason
>>>
>>
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Wed Nov  7 18:19:15 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Nov 2007 18:19:15 -0500
Subject: [Bioperl-l] lightweight features
In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
Message-ID: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>

It seems to me that there are applications where you're dealing with  
a huge number of features (such as GFF) and where therefore a  
lightweight object makes tremendous sense. But when you parse a  
genbank file, I'm not sure that's the bottleneck, unless maybe it's a  
large contig with lots of feature annotations.

I guess we'll ultimately want a way to control the type of feature  
being instantiated by a parser, e.g. using a factory.
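
A rough sketch of what such a factory could look like, in plain Perl.
All package names here (My::Feature::Simple, My::Feature::Full,
My::FeatureFactory) and the toy tab-delimited parser are hypothetical;
nothing below is existing BioPerl API.

```perl
use strict;
use warnings;

package My::Feature::Simple;    # hypothetical lightweight class
sub new {
    my ($class, %a) = @_;
    return bless [ @a{qw(seq_id start end)} ], $class;
}
sub start { $_[0][1] }

package My::Feature::Full;      # hypothetical heavier class
sub new {
    my ($class, %a) = @_;
    return bless { %a }, $class;
}
sub start { $_[0]{start} }

package My::FeatureFactory;
sub new {
    my ($class, %a) = @_;
    # Default to the full-featured class unless the caller says otherwise.
    return bless { type => $a{type} || 'My::Feature::Full' }, $class;
}
sub create {
    my ($self, %a) = @_;
    return $self->{type}->new(%a);
}

package main;

# Toy parser: splits "seq_id<TAB>start<TAB>end" lines and asks the
# factory for objects, so it never names a concrete feature class.
sub parse_lines {
    my ($factory, @lines) = @_;
    return map {
        my ($seq_id, $start, $end) = split /\t/;
        $factory->create(seq_id => $seq_id, start => $start, end => $end);
    } @lines;
}

my $factory  = My::FeatureFactory->new(type => 'My::Feature::Simple');
my @features = parse_lines($factory, "ctgA\t1050\t9000", "ctgA\t1050\t1200");
print ref($features[0]), " starts at ", $features[0]->start, "\n";
```

A GFF reader could then be handed the lightweight type while a GenBank
reader keeps the default, without either parser changing.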

	-hilmar

On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:

> I can see preferring a lightweight simple SF over SF::Generic in the
> next BioPerl dev cycle.  I guess we would just layer split locations
> as simple sub-features/segments, typing when necessary?  That
> shouldn't be much more overhead than creating a layered  
> Location::Split.
>
> chris
>
> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:
>
>> It uses hashes there so technically it is not entirely array based.
>>
>> -jason
>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>>
>>> I wonder if it is worth moving to the array-based version more
>>> generally,
>>> then.
>>>
>>> How does the array based feature object deal with tags?
>>>
>>> Lincoln
>>>
>>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>
>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>>> removing
>>>> location objects.
>>>>
>>>> Generic     6.75        --      -37%      -41%
>>>> GraphicsF   4.26       58%        --       -7%
>>>> Slim        3.98       70%        7%        --
>>>>
>>>> this is using code on the lightweight_feature_branch so
>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>>
>>>> http://jason.open-bio.org/~jason/bioperl/
>>>> seqfeature_speed.pl<http://jason.open-bio.org/%7Ejason/bioperl/
>>>> seqfeature_speed.pl>
>>>> and the GFF3 file I used to parse
>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2<http://
>>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2>
>>>>
>>>> -jason
>>>>
>>>
>>>
>>>
>>> -- 
>>> Lincoln D. Stein
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724
>>> (516) 367-8380 (voice)
>>> (516) 367-8389 (fax)
>>> FOR URGENT MESSAGES & SCHEDULING,
>>> PLEASE CONTACT MY ASSISTANT,
>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Nov  7 20:04:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 19:04:05 -0600
Subject: [Bioperl-l] lightweight features
In-Reply-To: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
	<D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
Message-ID: <E541C60D-6741-4923-A71D-E14CE6FE176D@uiuc.edu>

I'm also thinking a factory is a good possibility; maybe something to  
take the place of FTHelper.

chris

On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote:

> It seems to me that there are applications where you're dealing with
> a huge number of features (such as GFF) and where therefore a
> lightweight object makes tremendous sense. But when you parse a
> genbank file, I'm not sure that's the bottleneck, unless maybe it's a
> large contig with lots of feature annotations.
>
> I guess we'll ultimately want a way to control the type of feature
> being instantiated by a parser, e.g. using a factory.
>
> 	-hilmar
>
> On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:
>
>> I can see preferring a lightweight simple SF over SF::Generic in the
>> next BioPerl dev cycle.  I guess we would just layer split locations
>> as simple sub-features/segments, typing when necessary?  That
>> shouldn't be much more overhead than creating a layered
>> Location::Split.
>>
>> chris
>>
>> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:
>>
>>> It uses hashes there so technically it is not entirely array based.
>>>
>>> -jason
>>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>>>
>>>> I wonder if it is worth moving to the array-based version more
>>>> generally,
>>>> then.
>>>>
>>>> How does the array based feature object deal with tags?
>>>>
>>>> Lincoln
>>>>
>>>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>>
>>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>>>> removing
>>>>> location objects.
>>>>>
>>>>> Generic     6.75        --      -37%      -41%
>>>>> GraphicsF   4.26       58%        --       -7%
>>>>> Slim        3.98       70%        7%        --
>>>>>
>>>>> this is using code on the lightweight_feature_branch so
>>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl  
>>>>> co -r
>>>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>>>
>>>>> http://jason.open-bio.org/~jason/bioperl/
>>>>> seqfeature_speed.pl<http://jason.open-bio.org/%7Ejason/bioperl/
>>>>> seqfeature_speed.pl>
>>>>> and the GFF3 file I used to parse
>>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2<http://
>>>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2>
>>>>>
>>>>> -jason
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Lincoln D. Stein
>>>> Cold Spring Harbor Laboratory
>>>> 1 Bungtown Road
>>>> Cold Spring Harbor, NY 11724
>>>> (516) 367-8380 (voice)
>>>> (516) 367-8389 (fax)
>>>> FOR URGENT MESSAGES & SCHEDULING,
>>>> PLEASE CONTACT MY ASSISTANT,
>>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov  7 23:45:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 22:45:26 -0600
Subject: [Bioperl-l] test please ignore
Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>


From cjfields at uiuc.edu  Thu Nov  8 10:50:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Nov 2007 09:50:02 -0600
Subject: [Bioperl-l] test please ignore
In-Reply-To: <47332534.5090205@bms.com>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
	<47332534.5090205@bms.com>
Message-ID: <D0ADF51D-92BE-4645-BB1C-564536732368@uiuc.edu>

And respond back!  Just checking the mail list; the open-bio wiki  
pages were down last night.

chris

On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote:

> Chris Fields wrote:
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> This is the best way to make everyone open this e-mail ;-)
> Stefan


From stefan.kirov at bms.com  Thu Nov  8 10:03:16 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 08 Nov 2007 10:03:16 -0500
Subject: [Bioperl-l] test please ignore
In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
Message-ID: <47332534.5090205@bms.com>

Chris Fields wrote:
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
This is the best way to make everyone open this e-mail ;-)
Stefan

From Kevin.M.Brown at asu.edu  Thu Nov  8 17:30:24 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Nov 2007 15:30:24 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org>
References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>
	<20071108003638.GA5892@eniac.jgi-psf.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu>

OK, found the issue.  For whatever reason the Align.pm file is inside
the Align folder, so the package name and path don't match up once it
is installed.  This would cause it to have a name of
"Bio::Ext::Align::Align" instead of "Bio::Ext::Align".  Not sure why
this wasn't caught when I did "perl Makefile.PL && make && make test &&
make install".
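
A small sketch of the name-to-path rule behind this mismatch: Perl
turns "::" into "/" and appends ".pm" when it searches @INC. The
helper name module_to_path is made up for this example.

```perl
use strict;
use warnings;

# Convert a package name into the relative path that require() looks
# for under each directory in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

for my $module (qw(Bio::Ext::Align Bio::Ext::Align::Align)) {
    my $rel = module_to_path($module);
    # @INC may contain code references; only test plain directories.
    my ($found) = grep { -e "$_/$rel" } grep { !ref } @INC;
    printf "%-24s -> %-26s %s\n", $module, $rel,
        defined $found ? "found in $found" : "not found in \@INC";
}
```

If Align.pm sits at Bio/Ext/Align/Align.pm, only the second name
resolves, which is consistent with the error from Bio::Tools::pSW.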

> -----Original Message-----
> From: Joel Martin [mailto:j_martin at lbl.gov] 
> Sent: Wednesday, November 07, 2007 5:37 PM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Bio::Ext::Align?
> 
> Hello,
>     Might be a side effect of fixing the other bioperl-ext package, 
> what steps exactly did this entail:
> 
> > I installed bioperl-ext from CVS, 
> 
> ?
> 
> you can probably bypass it at the moment by doing this after 
> unpacking the
> bioperl-ext package 
> 
> cd Bio/Ext/Align
> perl Makefile.PL
> make
> make test
> make install
> 
> and
> 
> cd Bio/Ext/HMM
> perl Makefile.PL
> make 
> make test
> make install
> 
> Joel
> 
> but can't figure out what else is
> > missing to utilize Bio::Tools::pSW.  The error I get from 
> the example
> > script in the wiki is:
> > 
> > The C-compiled engine for Smith Waterman alignments 
> (Bio::Ext::Align)
> > has not been installed.
> >  Please read the install the bioperl-ext package
> > 
> > BEGIN failed--compilation aborted at
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
> > Compilation failed in require at ./align_test.pl line 3.
> > BEGIN failed--compilation aborted at ./align_test.pl line 3.
> > 
> > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
> > Align, but no Align.pm file.
> > 
> > I followed the directions in the wiki to install 1.5.2_102 
> (think I had
> > _100 installed previously).  Any thoughts on what I'm missing?
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From akarger at CGR.Harvard.edu  Fri Nov  9 09:53:02 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 9 Nov 2007 09:53:02 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
Message-ID: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>

When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:


 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885


 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575


As you can see from Sbjct lines, these two hits are basically
contiguous.
I was surprised to see that the bit scores and identities and alignment
lengths here are totally different but the expectation values are
identical. 

After a bit of grepping in the BLAST source, I found reference to "sum
segments" and "a collection [of] multiple distinct alignments with
asymmetric gaps between the alignments" and decided it was time to cry
for help. When does BLAST decide that two or more alignments belong
"together", and how does that affect the evalue? Is the evalue really
showing how good those two alignments combined are, despite the frame
shift? (It so happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
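
As a stopgap while n() comes back empty, the N can be read straight
from the raw report text. This is a minimal sketch in plain Perl, not
BioPerl's parser; the helper name parse_expect is made up, and treating
a plain "Expect =" line as N=1 is an assumption.

```perl
use strict;
use warnings;

# Pull N and the E-value out of a raw BLAST "Score = ..." line.
# "Expect(2) = 5e-64" gives (2, '5e-64'); a plain "Expect = ..." line
# is reported with N=1 (an assumption made for this sketch).
sub parse_expect {
    my ($line) = @_;
    return unless $line =~ /Expect(?:\((\d+)\))?\s*=\s*(\S+)/;
    my ($n, $evalue) = (defined $1 ? $1 : 1, $2);
    $evalue =~ s/,$//;                           # drop a trailing comma
    $evalue = "1$evalue" if $evalue =~ /^e/i;    # "e-64" -> "1e-64"
    return ($n, $evalue);
}

my @lines = (
    ' Score =  192 bits (487), Expect(2) = 5e-64',
    ' Score = 75.1 bits (183), Expect(2) = 5e-64',
);

for my $line (@lines) {
    my ($n, $evalue) = parse_expect($line);
    print "N=$n\tE=$evalue\n";
}
```

HSPs on the same hit that share both N greater than 1 and the same
E-value are candidates for having been scored together.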

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University


From cjfields at uiuc.edu  Fri Nov  9 12:58:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Nov 2007 11:58:16 -0600
Subject: [Bioperl-l] GFF3loader and indexing
Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu>

Quick question: shouldn't the new Index attribute be passed on to  
seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping  
purposes (for instance, properly reloading dumped gff3 data)?  I'm  
testing out a feature editor using volvox.gff3 data in GBrowse and  
the mRNA features appear to drop this attribute once loaded:

Original data:

ctgA	example	gene	1050	9000	.	+	.	ID=EDEN;Name=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=EDEN.1

partial gff3_string(1) output:

ctgA	example	gene	1050	9000	.	+	.	Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=51;ID=52
...
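
A quick way to spot which tags go missing is to compare column 9
before and after the round trip. The sketch below is plain Perl with a
made-up helper, parse_attributes; it ignores URL-escaping and
multi-valued tags, which a real GFF3 parser would have to handle.

```perl
use strict;
use warnings;

# Split GFF3 column 9 ("tag=value;tag=value") into a hash reference.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $tag && length $tag;
        $attr{$tag} = defined $value ? $value : '';
    }
    return \%attr;
}

# mRNA line as loaded, and as dumped by gff3_string(1)
my $before = parse_attributes(
    'ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1');
my $after = parse_attributes(
    'Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1');

# Tags present in the input but absent from the dump
my @dropped = grep { !exists $after->{$_} } sort keys %$before;
print "Tags missing after the round trip: @dropped\n";
```

For the two mRNA lines above this reports Index as the only dropped
tag; ID and Parent survive as tags even though their values change.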

chris

From David.Messina at sbc.su.se  Sat Nov 10 06:04:25 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 10 Nov 2007 12:04:25 +0100
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>

Hi Amir,

I don't have my BLAST book handy, and my memory is a little fuzzy, but I
think the Expect(2) you're seeing is the E-value based on both HSPs
combined. And I think this is why you see the same Expect value for both --
because it is shared between them (which sounds like what you wanted).

Again, this is just from memory, but I think this is an option that has to
be turned on rather than something which Blast decides to do on its own.


I don't know whether BioPerl reports this or not. Would you mind e-mailing
me an entire BLAST report as a sample? When I have some time I'd like to play
around with this a bit.

Thanks,
Dave

From sac at bioperl.org  Sat Nov 10 17:59:28 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Nov 2007 14:59:28 -0800
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>

The Bioperl blast parser should extract that value and you can obtain
it from an HSP object, via the HSPI::n() method, documented here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

Dave's basically correct in his explanation. It's a result of the
application of sum statistics by the blast algorithm. You can read all
about it in Korf et al's BLAST book. Here's the relevant section:

http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1

Steve

On Nov 10, 2007 3:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Amir,
>
> I don't have my BLAST book handy, and my memory is a little fuzzy, but I
> think the Expect(2) you're seeing is the E-value based on both HSPs
> combined. And I think this is why you see the same Expect value for both --
> because it is shared between them (which sounds like what you wanted).
>
> Again, this is just from memory, but I think this is an option that has to
> be turned on rather than something which Blast decides to do on its own.
>
>
> I don't know whether BioPerl reports this or not. Would you mind e-mailing
> me an entire BLAST report as a sample? When I have some time I'd like to play
> around with this a bit.
>
> Thanks,
> Dave
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From bernd.web at gmail.com  Tue Nov 13 06:57:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 13 Nov 2007 12:57:04 +0100
Subject: [Bioperl-l] Panel link
Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com>

Hi,

Is it possible with Panel to provide javascript event handlers?
With -link we can provide hrefs as:
  -link => 'http://www.google.com/search?q=$description'
or use a coderef that returns a href.

However, I'd like to set up links as:
<area .... href="#id" onmouseover="function()" onmouseout="function()">

Is this possible by default with Panel?

Regards,
Bernd

From akarger at CGR.Harvard.edu  Tue Nov 13 12:12:32 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:12:32 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A0165@huls5.nucleus.harvard.edu>

Thanks for the reply. I'm curious as to how BLAST decides to do this,
but not curious enough to buy the BLAST book.

If you want to see this, you could just tblastn the ENSP00000349467
sequence:
 
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG
NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE
EVDEMIREADIDGDGQVNYEEFVQMMTAK
against the human genome at NCBI or locally.
 
I've attached the tblastn report for that protein, which includes the
results I quoted. (It was done as part of a blast of 150 proteins vs.
the genome.)
 
-Amir


________________________________

	From: dave at davemessina.com [mailto:dave at davemessina.com] On
Behalf Of Dave Messina
	Sent: Saturday, November 10, 2007 6:04 AM
	To: Amir Karger
	Cc: bioperl-l
	Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast
result?
	
	
	Hi Amir,

	I don't have my BLAST book handy, and my memory is a little
fuzzy, but I think the Expect(2) you're seeing is the E-value based on
both HSPs combined. And I think this is why you see the same Expect
value for both -- because it is shared between them (which sounds like
what you wanted). 

	Again, this is just from memory, but I think this is an option
that has to be turned on rather than something which Blast decides to do
on its own.

	 
	I don't know whether BioPerl reports this or not. Would you mind
e-mailing me an entire BLAST report as a sample? When I have some time
I'd like to play around with this a bit.

	Thanks,
	Dave


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ENSP00000349467_tblastn.txt.gz
Type: application/x-gzip
Size: 9755 bytes
Desc: ENSP00000349467_tblastn.txt.gz
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071113/f8853e76/attachment.gz 

From akarger at CGR.Harvard.edu  Tue Nov 13 12:30:52 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:30:52 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>

> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf 
> Of Steve Chervitz
> 
> The Bioperl blast parser should extract that value and you can obtain
> it from an HSP object, via the HSPI::n() method, documented here:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

As I mentioned in my email:

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

my $blast_out = $ARGV[0];
my $in = new Bio::SearchIO(-format => 'blast',
                            -file   => $blast_out,
                            -report_type => 'tblastn');

print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
Evalue)), "\n";
while(my $query = $in->next_result) {
    while(my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,
                $hsp->evalue,
            ),"\n";
        }
    }
}
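
As a stopgap while n() comes back empty, the N could be read straight
from the raw report text. What follows is only a rough sketch: it
assumes the plain-text NCBI output writes the value as "Expect(N) = x"
on the Score line (as in the report discussed here), and the sub name
expect_n_pairs and the sample lines are made up for illustration. It
does not touch the Bioperl parser at all.

```perl
use strict;
use warnings;

# Sketch: scan raw NCBI BLAST text output for "Expect = x" and
# "Expect(N) = x" and return [N, evalue] pairs. N defaults to 1 when
# no "(N)" is present.
sub expect_n_pairs {
    my ($report_text) = @_;
    my @pairs;
    while ($report_text =~ /Expect(?:\((\d+)\))?\s*=\s*([0-9.eE+-]+)/g) {
        push @pairs, [ defined $1 ? $1 : 1, $2 ];
    }
    return @pairs;
}

# Made-up sample lines in the style of a tblastn report.
my $sample = <<'REPORT';
 Score =  133 bits (335), Expect(2) = 1e-57
 Score = 50.1 bits (118), Expect = 3e-04
REPORT

printf "N=%d\tE=%s\n", @$_ for expect_n_pairs($sample);
```

The pairs come back in report order, so they could be lined up with the
HSPs from next_hsp as long as both are read from the same file.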

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir


From cjfields at uiuc.edu  Tue Nov 13 12:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
	<B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug?  Dave mentioned he would look into it but  
I think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report.   
BTW, I wonder how this appears in XML output?

chris

On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:

>> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
>> Of Steve Chervitz
>>
>> The Bioperl blast parser should extract that value and you can obtain
>> it from an HSP object, via the HSPI::n() method, documented here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
>
> As I mentioned in my email:
>
> And does anyone know off-hand if Bioperl will tell me when situations
> like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
> would help, but I just get a bunch of empty strings for that, whether or
> not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
> undef.)
>
> And the docs for n() actually say, "This value is not defined with NCBI
> Blast2 with gapping" although they don't say why. Which may explain why,
> when I ran the following code on the blast result I included in my last
> email, I got empty values for all of the n's. (Why is n() undefined for
> gapped blast if I'm getting n's in my results from that blast?)
>
> use warnings;
> use strict;
> use Bio::SearchIO;
>
> my $blast_out = $ARGV[0];
> my $in = new Bio::SearchIO(-format => 'blast',
>                             -file   => $blast_out,
>                             -report_type => 'tblastn');
>
> print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> Evalue)), "\n";
> while(my $query = $in->next_result) {
>     while(my $subject = $query->next_hit) {
>         while (my $hsp = $subject->next_hsp) {
>             print join("\t",
>                 $query->query_name,
>                 $hsp->start("query"),
>                 $hsp->end("query"),
>                 $hsp->strand("hit"),
>                 $subject->name,
>                 $hsp->start("hit"),
>                 $hsp->end("hit"),
>                 $subject->frame,
>                 $hsp->n,
>                 $hsp->evalue,
>             ),"\n";
>         }
>     }
> }
>
>> Dave's basically correct in his explanation. It's a result of the
>> application of sum statistics by the blast algorithm. You can read all
>> about it in Korf et al's BLAST book. Here's the relevant section:
>
> [snip]
>
> Thanks,
>
> -Amir
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lskatz at gatech.edu  Tue Nov 13 20:27:45 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Tue, 13 Nov 2007 20:27:45 -0500
Subject: [Bioperl-l] chromatogram
Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>

Hi,
I would like to know how to draw a chromatogram file.  Does anyone
have any sample code where you read in an scf file and create a jpeg
or other image file?
For that matter, I want to be able to customize these images with base
calls if possible.  I really appreciate the help, so thanks!

-- 
Lee Katz

From mvrmakam at yahoo.com  Wed Nov 14 04:52:13 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on Windows XP
Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com>

Hi,

I am encountering a problem while installing Bioperl on Windows XP.  I have installed ActivePerl version 5.8.8.822 and am using the Perl Package Manager GUI, following the instructions outlined for installing Bioperl on Windows.  I am getting the following error:

 Downloading ActiveState Package Repository packlist ... failed 500 Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com')

I do not know how to overcome this problem.  The other issue is that when I type bioperl in the search box, no bioperl packages appear, and I do not know why.  If any of you could guide me through the installation process I would appreciate it.

Thanks,

Roshan



From cjfields at uiuc.edu  Wed Nov 14 09:02:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Nov 2007 08:02:05 -0600
Subject: [Bioperl-l] Installing Bioperl on Windows XP
In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com>
References: <235423.72586.qm@web33703.mail.mud.yahoo.com>
Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu>

The instructions are pretty specific:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Note the section on adding new repositories.  As for the PPM  
connection error, it's more than likely an error with the default  
address but it isn't bioperl-related; maybe answers lie here:

http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl-faq2.html#ppm_repositories

chris

On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote:

> Hi,
>
> I am encountering a problem while installing Bioperl on Windows  
> XP.  I have installed ActivePerl version 5.8.8.822.  I am using  
> Perl Package Manager GUI.  Also, I am following the instructions  
> outlined for installing Bioperl on Windows.  I am getting an  
> error.  The error is as follows:
>
>  Downloading ActiveState Package Repository packlist ... failed 500  
> Can't connect to ppm4.activestate.com:80 (Bad hostname  
> 'ppm4.activestate.com')
>
> I do not know how to overcome this problem.  The other issue is  
> when I type bioperl in the search box I do not see any packages of  
> bioperl.  I do not know what the problem is.  If anyone of you  
> could guide me through the installation process I would appreciate it.
>
> Thanks,
>
> Roshan


From reshetovdenis at gmail.com  Wed Nov 14 12:28:40 2007
From: reshetovdenis at gmail.com (Denis Reshetov)
Date: Wed, 14 Nov 2007 20:28:40 +0300
Subject: [Bioperl-l] how to load all genomes
Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>

Dear BioPerl-db Creators,

I'm trying to load all genomes from the NCBI ftp site
into my BioSQL database using the common script load_seqdatabase.pl

But it seems very slow. Is there a better way to do it?

Thank you very much,

Denis.

From barry.moore at genetics.utah.edu  Wed Nov 14 14:18:29 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 14 Nov 2007 12:18:29 -0700
Subject: [Bioperl-l] how to load all genomes
In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu>

Denis,

You might be interested in this thread from a couple years ago.  I  
was having a similar problem, that I eventually resolved.   
Unfortunately the reason for the problem and the solution weren't  
entirely clear, but you may be able to glean some ideas from it.   
Also, you may have already done this, but I suggest searching the  
archives from this list because it seems like this comes up every now  
and then, so there may be other postings similar to the one I'm  
sending you that could help you.

http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html

Finally, if you are still having problems, you'll want to include a
few more details about your situation.  Which DB are you using?  Have
you preloaded taxonomy data?  How fast or slow are your sequences
loading?

Barry

On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote:

> Dear BioPerl-db Creators,
>
> I`m trying to load all genomes from NCBI ftp site
> to my BioSql database using common script load_seqdatabase.pl
>
> But it seems very slow. Let me know what is the better way to do it?
>
> Thank you very much,
>
> Denis.


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 14:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development  :-)


Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz


------------------------------------------------------------------------------------------

#!perl -w
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;


use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s' => \$FILE,
            'o|out=s' => \$OUTFILE,
            'l|left=s' => \$LEFT_SEQ,
            'r|right=s' => \$RIGHT_SEQ,
            's|size=i' => \$SIZE,
            ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

Options:
--height <pixels> Set height of image (${\HEIGHT} pixels default)
--file <trace file-name> Filename for the ABI trace file
--out <output file-name> Filename for the generated .png image
--left <left end sequence>
--right <right end sequence>
--size <size of clipped fasta sequence>

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
    or
    http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height = $HEIGHT || HEIGHT;
my $file = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);

my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence =$abi->get_sequence();
@bp = split(//, $sequence);


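# map each base call onto its sample position in the trace
$size = $abi->get_trace_length();
%is_base_call = map { $_ => 1 } @base_calls; # positions of called bases
for ($i=0,$count = 0; $i<$size; $i++) {
     if($is_base_call{$i}){
       $bases[$i] = $bp[$count];
       $count++;
     }else{
       $bases[$i] = ' ';
     }
}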

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
   $graph->set(
    title => $abi->get_sample_name(),
#   y_max_value => $abi->get_max_trace() + 50,
    x_max_value => $abi->get_trace_length(),
    t_margin => 35,
    b_margin => 30,
    l_margin => 5,
    r_margin => 5,
    x_ticks => 0,
    text_space => 0,
    line_width => 1,
    transparent => 0,
    x_plot_values => 0,
    interlaced => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black = $gd->colorAllocate(0,0,0);       # G
$blue = $gd->colorAllocate(0,0,255);      # C
$red = $gd->colorAllocate(255,0,0);       # T
$green = $gd->colorAllocate(0,255,0);     # A
$magenta =$gd->colorAllocate(255,0,255);  # N
$white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
$gray = $gd->colorAllocate(210,210,210);
%colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N",
$magenta, " ",$white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base = find_match($sequence,$LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
$end_base = find_match($sequence,$RIGHT_SEQ, 1);
# find_match returns -1 (which is true in Perl) when nothing matched
if($end_base > 0){
 $end_base += length($RIGHT_SEQ);
}


# get the coords of the features on the image
@coords = $graph->get_hotspot(1);
$size = @coords;
$printed_num = 1;
$basecount = 0;
$numstoprint = $basecount - $start_base;

# draw the colored bases and scale at top and bottom of image
for ($i=0,$count = 0; $i<$size; $i++) {
  $c = $coords[$i];
  (undef, $xs, undef, undef, undef, undef) = @$c;
  $base = $bases[$i];
  if($base =~ /[ACGTN]/){
   if($start_base - 1 == $basecount){$start_base_coord = $xs;}
   if($end_base - 1 == $basecount){$end_base_coord = $xs;}
   if(defined($SIZE) && $start_base+$SIZE -2 ==
$basecount){$end_base_coord_by_size = $xs;}
   $basecount++;
   $numstoprint++;
   $printed_num = 0;
  }
  # print the bases top and bottom
  $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
  $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});

  # print scale
  if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
    if($LEFT_SEQ){
      $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
      $gd->string(GD::Font->Small(),$xs,$height -
15,$numstoprint,$black);
      $printed_num = 1;
    }else{
      $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
      $gd->string(GD::Font->Small(),$xs,$height -
15,$numstoprint,$black);
      $printed_num = 1;
    }
  }
  $top_right_corner = $xs;
}


# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
# draw the clipped regions as gray
  #if LEFT_SEQ supplied and a match found
  if($LEFT_SEQ && $start_base > 0){
     $gd->filledRectangle(38,35,$start_base_coord - 1,$height -
33,$red);
     $clipped = 1;
  }
 #if RIGHT_SEQ supplied and a match found
 if($RIGHT_SEQ && $end_base > 0){
   print join("\t", ($end_base)),"\n";
   $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height -
33,$gray);
   $clipped = 1;
 }
 #if no RIGHT_SEQ supplied or no match found, use left match + seq length
 if(!$RIGHT_SEQ || $end_base < 0){
   $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
  $clipped = 1;
 }
 

# set height based on max trace within clipped region
   $graph->set(	y_max_value => 3000);#$abi->get_max_trace() + 50);

  # need to re-plot the data over the grayed out area
  $graph->plot(\@data) if $clipped;
  $gd->filledRectangle(0,0,$top_right_corner,33,$white);

#}

#print the graph
open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
binmode OUT;
print OUT $gd->png;
close OUT;


sub find_match{
  my ($sequence,$query,$last) = @_;
  return -1 if length($query) < 6;
  my($odds, $evens, $ones, $twos, $threes, $match_pos);
    # try exact match
    $match_pos = do_regex($query, $sequence,$last); return $match_pos if $match_pos > 0;

    # try matching every second base starting from the second base e.g. it will be .C.T.C.G.etc
    map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
    $match_pos = do_regex($odds, $sequence,$last);   return $match_pos if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence,$last);  return $match_pos if $match_pos > 0;

    # try matching every third base starting from the first base e.g. it will be C..T..G..T etc
    map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
    $match_pos = do_regex($ones, $sequence,$last);   return $match_pos if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence,$last);   return $match_pos if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence,$last); return $match_pos if $match_pos > 0;

     # not found
     return -1;
}

sub do_regex{
	my ($query,$sequence,$last)= @_;
    #print "trying $query \n";
    my $result = -1;
    # greedy .* finds the last occurrence, non-greedy .*? the first
    if($last){
      $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*($query)/ig);
    }else{
      $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
    }
    return $result;
}

------------------------------------------------------------------------------------------
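
The fallback matching in find_match can be tried on its own. The
following is a minimal standalone sketch of the "every second base"
step, not the script's actual subs: the names alternate_base_patterns
and match_position are invented here, and the position is taken from
@- rather than pos(). As in the script, an odd trailing base in the
query is simply dropped.

```perl
use strict;
use warnings;

# Build the two "every second base" patterns: one keeps bases 1,3,5,...
# and one keeps bases 2,4,6,..., with '.' (any character) in the
# skipped positions.
sub alternate_base_patterns {
    my ($query) = @_;
    my ($odds, $evens) = ('', '');
    for my $pair ($query =~ /(\w\w)/g) {
        my ($first, $second) = split //, $pair;
        $odds  .= "$first.";
        $evens .= ".$second";
    }
    return ($odds, $evens);
}

# 1-based position of the first match (or the last one when $last is
# true), or -1 when the pattern does not occur in the sequence.
sub match_position {
    my ($pattern, $sequence, $last) = @_;
    my $re = $last ? qr/.*($pattern)/i : qr/.*?($pattern)/i;
    return $sequence =~ $re ? $-[1] + 1 : -1;
}

my ($odds, $evens) = alternate_base_patterns('ACGTAC');
print "$odds $evens\n";                               # A.G.A. .C.T.C
print match_position($odds, 'ttAAGCATgg'), "\n";      # 3
```

The point of the dotted patterns is to tolerate miscalled bases near
the ends of a read: half of the positions are allowed to be anything.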

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From mbasu at mail.nih.gov  Wed Nov 14 15:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need a chromatogram from SCF; I can't help with that. ABI.pm
is not in the Bioperl distribution. But to set the record straight, you
can draw a chromatogram in SVG from an ABI file in one step using my
BioSVG module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay


Smithies, Russell wrote:
> Here's my trace viewer.
> Please excuse my dodgy Perl and debugging code as it's still under
> development  :-)
> 
> 
> Russell Smithies
> 
> Bioinformatics Software Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
> 
> Invermay  Research Centre
> Puddle Alley, 
> Mosgiel, 
> New Zealand
> T  +64 3 489 3809   
> F  +64 3 489 9174  
> www.agresearch.co.nz
> 
> 
> [snip]


-- 
Malay K Basu
www.malaybasu.net


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 15:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>


We try and avoid SVG at all costs as installing plugins and viewers in a
locked down corporate environment can be more trouble than it's worth
whereas generating .png images works for any browser with no extras
required.
We actually call this trace drawing code from Python which then
generates webpages with the embedded image. 
It also means we don't need to licence, install and maintain a trace
viewer like Chromas.
:-)
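
The page-generation step described above could be sketched roughly as
follows. This is written in Perl only to match the rest of the thread
(the wrapper mentioned above is in Python), and the sub names, page
layout and file names are all made up for illustration.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Return a minimal HTML page that shows an already-rendered trace image
# with a plain <img> tag, so no plugin or viewer is needed.
sub trace_page {
    my ($png_path, $sample_name) = @_;
    my $src   = escape_html($png_path);
    my $title = escape_html($sample_name);
    return <<"HTML";
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<img src="$src" alt="Trace for $title">
</body>
</html>
HTML
}

print trace_page('test2.png', 'sample_080');
```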

Russell

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
> 
> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
> not in Bioperl distribution. But to make the record straight, you can
> use one step chromatogram drawing in SVG from ABI file using my BioSVG
> module, available at:
> 
> http://www.bioinformatics.org/~malay/biosvg/
> 
> Malay
> 
> 
> 
> 
> Smithies, Russell wrote:
> > Here's my trace viewer.
> > Please excuse my dodgy Perl and debugging code as it's still under
> > development  :-)
> >
> >
> > Russell Smithies
> >
> > Bioinformatics Software Developer
> > T +64 3 489 9085
> > E  russell.smithies at agresearch.co.nz
> >
> > Invermay  Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T  +64 3 489 3809
> > F  +64 3 489 9174
> > www.agresearch.co.nz
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > #!perl -w
> > use ABI;
> >
> > use GD::Graph::lines;
> > use GD::Graph::colour;
> > use GD::Graph::Data;
> >
> > use Data::Dumper;
> >
> >
> > use Getopt::Long;
> >
> > use constant HEIGHT => 300;
> >
> > GetOptions ('h|height=i' => \$HEIGHT,
> >             'f|file=s' => \$FILE,
> >             'o|out=s' => \$OUTFILE,
> >             'l|left=s' => \$LEFT_SEQ,
> >             'r|right=s' => \$RIGHT_SEQ,
> >             's|size=i' => \$SIZE,
> >             ) || die <<USAGE;
> > Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o
> > test2.png -l actacgtacgta -r atgatcgtacgtac
> > or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1
> > --out test2.png --left actacgtacgta --right atgatcgtacgtac
> >
> > Options:
> > --height <pixels> Set height of image (${\HEIGHT} pixels default)
> > --file <trace file-name> Filename for the ABI trace file
> > --out <output file-name> Filename for the generated .png image
> > --left <left end sequence>
> > --right <right end sequence>
> > --size <size of clipped fasta sequence>
> >
> > Parse an ABI trace file and render a PNG image.
> > See http://search.cpan.org/dist/ABI/ABI.pm
> >     or
> >     http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
> > USAGE
> >
> > my $height = $HEIGHT || HEIGHT;
> > my $file = $FILE;
> > my $outfile = $OUTFILE;
> >
> > my $abi = ABI->new(-file=> $file);
> >
> > my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
> > my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
> > my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
> > my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
> >
> > my @base_calls = $abi->get_base_calls(); # Get the base calls
> > my $sequence =$abi->get_sequence();
> > @bp = split(//, $sequence);
> >
> >
> >
> > # iterate over array
> > $size = $abi->get_trace_length();
> > for ($i=0,$count = 0; $i<$size; $i++) {
> >      if(grep(/\b$i\b/, @base_calls)){
> >        $bases[$i] = $bp[$count];
> >        $count++;
> >      }else{
> >        $bases[$i] = ' ';
> >      }
> > }
> >
> > # create the data. see GD::Graph::Data for details of the format
> > my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );
> >
> > my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
> >    $graph->set(
> >    title => $abi->get_sample_name(),
> > #	y_max_value => $abi->get_max_trace() + 50,
> > 	x_max_value => $abi->get_trace_length(),
> > 	t_margin => 5,
> >     b_margin => 5,
> >     l_margin => 5,
> >     r_margin => 5,
> >     x_ticks => 0,
> >     text_space => 0,
> > 	line_width 	=> 1,
> > 	transparent	=> 0,
> > 	b_margin => 30,
> > 	t_margin => 35,
> > 	x_plot_values => 0,
> > 	interlaced => 1,
> > );
> >
> > # allocate some colors for drawing the bases
> > #use colors same as Chromas
> > $graph->set( dclrs => [ qw( green blue black red pink) ] );
> >
> > #plot the data
> > my $gd = $graph->plot(\@data);
> >
> > $black = $gd->colorAllocate(0,0,0);       # G
> > $blue = $gd->colorAllocate(0,0,255);      # C
> > $red = $gd->colorAllocate(255,0,0);       # T
> > $green = $gd->colorAllocate(0,255,0);     # A
> > $magenta =$gd->colorAllocate(255,0,255);  # N
> > $white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
> > $gray = $gd->colorAllocate(210,210,210);
> > # base letter => colour, matching the trace colours set in dclrs above
> > %colors = ("A" => $green, "C" => $blue, "G" => $black, "T" => $red,
> >            "N" => $magenta, " " => $white);
> >
> > #$start_base = index(lc($sequence),lc($LEFT_SEQ));
> > $start_base = find_match($sequence,$LEFT_SEQ);
> >
> > #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
> > $end_base = find_match($sequence,$RIGHT_SEQ, 1);
> > # find_match returns -1 (a true value) when nothing matched, so test > 0
> > if($end_base > 0){
> >  $end_base += length($RIGHT_SEQ);
> > }
> >
> >
> > # get the coords of the features on the image
> > @coords = $graph->get_hotspot(1);
> > $size = @coords;
> > $printed_num = 1;
> > $basecount = 0;
> > $numstoprint = $basecount - $start_base;
> >
> > # draw the colored bases and scale at top and bottom of image
> > for ($i=0,$count = 0; $i<$size; $i++) {
> >   $c = $coords[$i];
> >   (undef, $xs, undef, undef, undef, undef) = @$c;
> >   $base = $bases[$i];
> >   if($base =~ /[ACGTN]/){
> >    if($start_base - 1 == $basecount){$start_base_coord = $xs;}
> >    if($end_base - 1 == $basecount){$end_base_coord = $xs;}
> >    if(defined($SIZE) && $start_base+$SIZE -2 ==
> > $basecount){$end_base_coord_by_size = $xs;}
> >    $basecount++;
> >    $numstoprint++;
> >    $printed_num = 0;
> >   }
> >   # print the bases top and bottom
> >   $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
> >   $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});
> >
> >   # print scale
> >   if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
> >     if($LEFT_SEQ){
> >       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
> >       $gd->string(GD::Font->Small(),$xs,$height -
> > 15,$numstoprint,$black);
> >       $printed_num = 1;
> >     }else{
> >       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
> >       $gd->string(GD::Font->Small(),$xs,$height -
> > 15,$numstoprint,$black);
> >       $printed_num = 1;
> >     }
> >   }
> >   $top_right_corner = $xs;
> > }
> >
> >
> >
> > # only draw the clipped region if the calculated size is + or - 6bp
> > #if(($end_base - $start_base) - $SIZE <= 6
> > #   && ($end_base - $start_base) - $SIZE >= -6 ){
> > # draw the clipped regions as gray
> >   #if LEFT_SEQ supplied and a match found
> >   if($LEFT_SEQ && $start_base > 0){
> >      $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
> >      $clipped = 1;
> >   }
> >  #if RIGHT_SEQ supplied and a match found
> >  if($RIGHT_SEQ && $end_base > 0){
> >    print join("\t", ($end_base)),"\n";
> >    $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
> >    $clipped = 1;
> >  }
> >  #if no RIGHT_SEQ supplied or no match found, use left match + seq length
> >  if(!$RIGHT_SEQ || $end_base < 0){
> >    $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
> >   $clipped = 1;
> >  }
> >
> > # set height based on max trace within clipped region
> >    $graph->set(	y_max_value => 3000); # was: $abi->get_max_trace() + 50
> >
> >   # need to re-plot the data over the grayed out area
> >   $graph->plot(\@data) if $clipped;
> >   $gd->filledRectangle(0,0,$top_right_corner,33,$white);
> >
> > #}
> >
> > #print the graph
> > open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
> > binmode OUT;
> > print OUT $gd->png;
> > close OUT;
> >
> >
> > sub find_match{
> >   my ($sequence,$query,$last) = @_;
> >   # no query supplied, or one too short to match reliably
> >   return -1 if !defined($query) || length($query) < 6;
> >   my($odds, $evens, $ones, $twos, $threes, $match_pos);
> >     # try exact match
> >     $match_pos = do_regex($query, $sequence,$last);
> >     return $match_pos if $match_pos > 0;
> >
> >     # try matching every second base starting from the second base
> >     # e.g. it will be .C.T.C.G.etc
> >     map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
> >     $match_pos = do_regex($odds, $sequence,$last);
> >     return $match_pos if $match_pos > 0;
> >     $match_pos = do_regex($evens, $sequence,$last);
> >     return $match_pos if $match_pos > 0;
> >
> >     # try matching every third base starting from the first base
> >     # e.g. it will be C..T..G..T etc
> >     map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"}
> >         ($query =~m/(\w\w\w)/g);
> >     $match_pos = do_regex($ones, $sequence,$last);
> >     return $match_pos if $match_pos > 0;
> >     $match_pos = do_regex($twos, $sequence,$last);
> >     return $match_pos if $match_pos > 0;
> >     $match_pos = do_regex($threes, $sequence,$last);
> >     return $match_pos if $match_pos > 0;
> >
> >      # not found
> >      return -1;
> > }
> >
> > sub do_regex {
> >     my ($query,$sequence,$last)= @_;
> >     #print "trying $query \n";
> >     # a greedy prefix finds the last occurrence, a non-greedy one the first
> >     my $re = $last ? qr/.*($query)/i : qr/.*?($query)/i;
> >     # $-[1] is the 0-based offset of the match; report it 1-based
> >     return $sequence =~ $re ? $-[1] + 1 : -1;
> > }
> >
> >
> > ------------------------------------------------------------------------
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-
> >> bio.org] On Behalf Of Lee Katz
> >> Sent: Wednesday, 14 November 2007 2:28 p.m.
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] chromatogram
> >>
> >> Hi,
> >> I would like to know how to draw a chromatogram file.  Does anyone
> >> have any sample code where you read in an scf file and create a jpeg
> >> or other image file?
> >> For that matter, I want to be able to customize these images with base
> >> calls if possible.  I really appreciate the help, so thanks!
> >>
> >> --
> >> Lee Katz
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> --
> Malay K Basu
> www.malaybasu.net

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From mbasu at mail.nih.gov  Wed Nov 14 16:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need any plugin. Firefox can display most SVG files natively.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image. 
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
> 
> Russell
> 


-- 
Malay K Basu
www.malaybasu.net


From tomboy at cs.huji.ac.il  Wed Nov 14 21:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>

hi
when I try to install bioperl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at
/usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command, and that
does not seem to help either.

many thanks
--Tomer

-- 
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA

Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313               Fax: (425) 936-7329
--------------------------------------------------------------------------------

From lskatz at gatech.edu  Thu Nov 15 08:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf file to an image,
though?  I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent.  This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works?  Thanks.

-- 
Lee Katz
http://www.lskatz.com

From cain.cshl at gmail.com  Thu Nov 15 09:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel.  Bio::SCF is
not part of bioperl, so you have to get it from CPAN.  It depends on
the Staden io-lib package, so you'll need that too.  You can get GBrowse
from http://www.gmod.org/gbrowse , and you can look at the tutorial for
more information on configuring the trace glyph.

Scott


On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure in that there is no way to go from an scf to an image
> though?  I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent.  This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
> 
> If this is not possible, do you know if drawing an scf is in the works?  Thanks.
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From bosborne11 at verizon.net  Thu Nov 15 09:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>
Message-ID: <C361BF4D.103D8%bosborne11@verizon.net>

Tomer,

Interesting. When I used Cygwin I always worked entirely within the C:
drive, but it looks like you're executing the script from the E: drive. Is
Cygwin installed in C:/cygwin? What I'm getting at is that you may need
to set $PERL5LIB to something like /cygdrive/c/cygwin/usr/lib/perl5.
What does 'echo $PERL5LIB' say?
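
A quick way to check this from Perl itself is to list where, if anywhere,
Module::Build is visible on the current search path. This is only a sketch:
the helper name find_module is made up for illustration, and it does nothing
more than walk @INC, which is where any $PERL5LIB entries end up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every directory in the given search path
# that contains the named module, e.g. "Module::Build".
sub find_module {
    my ($module, @dirs) = @_;
    (my $relpath = "$module.pm") =~ s{::}{/}g;   # Module::Build -> Module/Build.pm
    # skip code references that may sit in @INC; keep plain directories only
    return grep { !ref($_) && -f "$_/$relpath" } @dirs;
}

print 'PERL5LIB is ', (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";

my @hits = find_module('Module::Build', @INC);
print @hits ? "Module::Build found in: @hits\n"
            : "Module::Build is not visible in \@INC\n";
```

If nothing is found, adding the directory that holds Module/Build.pm to
$PERL5LIB and running the script again should show whether Perl now sees it.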

Brian O.


On 11/14/07 9:43 PM, "Tomer Hertz" <tomboy at cs.huji.ac.il> wrote:

> hi
> when I try to install bioperl I get the following error message:
> 
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at
> /usr/lib/perl5/site_
> perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
> 
> many thanks
> --Tomer


From bernd.web at gmail.com  Thu Nov 15 10:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has anyone been able to access '$description' when producing
imagemaps with Graphics::Panel?
The call below does not print the "title" tag at all; '$description'
seems to be unavailable here, although it is available for the tracks
($panel->add_track).
$map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the titletag does work, if
I use the code below
my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
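
A slightly more general form of this coderef can try more than one tag name
in turn, since the tag that holds the description may differ between feature
classes. The sketch below is an illustration only: make_title_rule and
My::MockFeature are made-up names, the 'note' tag is just an example of a
second candidate, and the mock merely imitates has_tag() and
each_tag_value() so the snippet runs without BioPerl installed.

```perl
use strict;
use warnings;

# Build a title rule that tries several tag names in turn. The feature only
# needs has_tag() and each_tag_value(), the method used in the coderef above.
sub make_title_rule {
    my @tags = @_;
    return sub {
        my $feature = shift;
        for my $tag (@tags) {
            next unless $feature->has_tag($tag);
            my ($value) = $feature->each_tag_value($tag);
            return $value if defined $value;
        }
        return '';    # no usable tag: empty title
    };
}

# Hypothetical stand-in for a real feature object, for illustration only.
package My::MockFeature;
sub new {
    my ($class, %tags) = @_;
    my $self = { %tags };
    return bless $self, $class;
}
sub has_tag        { exists $_[0]{ $_[1] } }
sub each_tag_value { @{ $_[0]{ $_[1] } || [] } }

package main;

my $titlerule = make_title_rule('description', 'note');
my $feature   = My::MockFeature->new(note => ['putative kinase']);
print $titlerule->($feature), "\n";

# The rule would then be passed exactly as in the call shown earlier:
#   $map = $panel->create_web_map($mapname, $linkrule, $titlerule);
```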


I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }


Regards,
Bernd

From luciap at sas.upenn.edu  Thu Nov 15 10:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,
I was asked this question recently, and it occurred to me that I must be
doing things inefficiently.
To produce a gff file I was using SeqIO to parse the required fields, then
printing them out tab-delimited according to the conventions, which is
easy.
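
The tab-delimited approach described above might look roughly like the
sketch below. It is an illustration under stated assumptions, not a
replacement for a real GFF writer: gff_line is a made-up helper, the
key=value attribute style follows GFF3, and attribute values are not
escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: format one feature as a nine-column GFF line.
# Missing score/strand/phase fall back to ".", as the GFF conventions allow.
sub gff_line {
    my (%f) = @_;
    my @cols = map { defined $f{$_} ? $f{$_} : '.' }
               qw(seqid source type start end score strand phase);
    my $attrs = $f{attributes} || {};
    # column 9: key=value pairs joined with ";" (sorted for stable output)
    my $col9 = join ';', map { "$_=$attrs->{$_}" } sort keys %$attrs;
    return join("\t", @cols, length($col9) ? $col9 : '.');
}

print gff_line(
    seqid  => 'chr1', source => 'genbank', type => 'gene',
    start  => 100,    end    => 900,       strand => '+',
    attributes => { ID => 'gene0001', Name => 'abcA' },
), "\n";
```

In practice the field values would come from the features parsed with SeqIO
rather than from literals as shown here.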

But generating a GenBank file by extracting features from a gff file
and a plain fasta file was more complicated.
Is there support for gff in bioperl now?
Can anyone contribute a smart way to go from/to gff, GenBank and embl?

thanks very much

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From lstein at cshl.edu  Thu Nov 15 12:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web <bernd.web at gmail.com> wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bernd.web at gmail.com  Thu Nov 15 13:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
	<6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein <lstein at cshl.edu> wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
>
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> >
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Thu Nov 15 13:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are  
consistent (particularly re: GFF3).  We are aiming for more  
consistent, compliant GFF/GTF output in the next developer series  
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the  
scripts directory); these are probably the most common route when  
working directly from a seq record.  Bio::Tools::GFF is the most  
commonly used class, though I'm unsure of its status for GFF3  
output.  From within a Bio::SeqI you can call write_gff() (currently  
not very flexible) or, from the SeqFeature itself, gff_string().   
Bio::Graphics::Feature has the additional method gff3_string().   
Bio::FeatureIO is also an option, though I would consider it very  
experimental (it will likely undergo significant revision in the next  
bioperl dev series).
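
As a rough illustration of the Bio::Tools::GFF route (a sketch, not a
replacement for the bp_genbank2gff scripts): the following reads GenBank
records with Bio::SeqIO and writes each top-level feature through
Bio::Tools::GFF. The helper name write_features_as_gff and the default of
GFF version 2 are assumptions made for the example.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Hypothetical helper: write every top-level feature of each record in
# $genbank_file to the filehandle $out_fh as GFF; returns the feature count.
sub write_features_as_gff {
    my ($genbank_file, $out_fh, $gff_version) = @_;
    my $in  = Bio::SeqIO->new(-file => $genbank_file, -format => 'genbank');
    my $out = Bio::Tools::GFF->new(-fh          => $out_fh,
                                   -gff_version => $gff_version || 2);
    my $count = 0;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            $out->write_feature($feat);
            $count++;
        }
    }
    return $count;
}

# Usage: perl this_script.pl record.gbk > record.gff
write_features_as_gff($ARGV[0], \*STDOUT) if @ARGV;
```

Only top-level features are written here; sub-features and any GFF3-specific
parent/child structure are left out of this sketch.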

Any others anyone can think of, maybe non-BioPerl related as well?

chris

On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:

> Hi
> I was asked this question recently
> and it occurred to me I must be doing things inefficiently
> To produce gff file I was using SeqIO to parse the required fields,  
> then
> according to the conventions just printing out whatever was  
> required tab
> delimited, which is easy
>
> but if I wanted to generate a genbank file, extracting features  
> from a gff file
> and a plain fasta file it was more complicated
> is there support for gff in bioperl now?
> anyone can contribute with  smart way to go from/to gff, genebank  
> and embl?
>
> thanks very much
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bosborne11 at verizon.net  Thu Nov 15 14:19:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 14:19:41 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>
Message-ID: <C36205FD.103EA%bosborne11@verizon.net>

Chris,

There's also a genbank2gff3.PLS script in the GMOD package
(http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
However, it has not been modified for a couple of years, so it may not be
the "preferred" script.

See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
on using Bioperl's bp_genbank2gff3 script.

Brian O.


On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> There are currently many ways to get what you want, but not all are
> consistent (particularly re: GFF3).  We are aiming for more
> consistent, compliant GFF/GTF output in the next developer series
> (1.7) of Bioperl.
> 
> You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> scripts directory); these are probably the most common way when
> working directly from a seq record.  Bio::Tools::GFF is the most
> commonly used class though I'm unsure of it's status for GFF3
> output.  From within a Bio::SeqI you can call write_gff() (currently
> not very flexible) or from the SeqFeature itself gff_string().
> Bio::Graphics::Feature has the additional method gff3_string().
> Bio::FeatureIO is also an option, though I would consider it very
> experimental (it will likely undergo significant revision in the next
> bioperl dev series).
> 
> Any others anyone can think of, maybe non-BioPerl related as well?
> 
> chris
> 


From Russell.Smithies at agresearch.co.nz  Thu Nov 15 17:31:28 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 16 Nov 2007 11:31:28 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>

Just to add to this, does anyone have any code for reading .sff 'traces'
from 454 sequences?

Thanx,

Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From torsten.seemann at infotech.monash.edu.au  Thu Nov 15 20:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
Message-ID: <a79f6a4b0711151713g26905bc6g5b19202b992f4e08@mail.gmail.com>

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools which 454
distribute with their result data. eg. "sffinfo 454AllContigs.sff"
will list all the reads with the original flowgram values etc.
However, the SFF tools are i386.Linux binaries, so not really a
portable solution.

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From mvrmakam at yahoo.com  Thu Nov 15 22:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager ver 5.8.8.822 on Windows XP.  I have added all the repositories outlined in Installing BioPerl for Windows and have selected All Packages in the View.  However, I am not able to see any packages in the view box.  Can anyone help me with this?

Roshan



From David.Messina at sbc.su.se  Fri Nov 16 03:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works?  Thanks.
>


One non-BioPerl solution is 4peaks:
http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers journal
article PDF library program.


Dave

From neetisomaiya at gmail.com  Mon Nov 19 01:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

foreach my $key ( $ac->get_all_annotation_keys() )
{
    if ($key eq "dblink")
    {
        my %values = $ac->get_Annotations($key);
        foreach my $value ( keys(%values) )
        {
            print "\n*****VALUE $value*****\n";
        }
    }
}

Here, not all dblinks present in the actual file get parsed. For example,
in the data below,
ENTRY       116064            CDS       H.sapiens
NAME        LRRC58
DEFINITION  leucine rich repeat containing 58
POSITION    3q13.33
MOTIF       Pfam: SdiA-regulated LRR_1
            PROSITE: LEU_RICH
DBLINKS     NCBI-GI: 153792305
            NCBI-GeneID: 116064
            HGNC: 26968
            Ensembl: ENSG00000163428
            UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE,
but doesn't give me HGNC and UniProt. For other entries it gives me other
combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
-- 
-Neeti
Even my blood says, B positive

From johnston at biochem.ucl.ac.uk  Mon Nov 19 06:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>

Hello,

Is there a list of the possible database names for -data =>
$dbname in RemoteBlast somewhere?

Cheers,
Cass


From cjfields at uiuc.edu  Mon Nov 19 08:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
Message-ID: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somwhere?
>
> Cheers,
> Cass
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Nov 19 09:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID: <F81EBCF4-20AD-486C-A9EC-301FE9475504@uiuc.edu>

It makes sense in the light that you're (erroneously) using a hash:

    my %values = $ac->get_Annotations($key);

This assigns the returned list as key-value pairs of DBLink => DBLink.
You don't see a warning because the number of links happens to be even
(I get 8), but you would if the number of links returned were odd (under
'use warnings' Perl reports "Odd number of elements in hash
assignment").  So when you call:

    foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.  You should use an array:

    my @values = $ac->get_Annotations($key);
    foreach my $value (@values) {
       print $value->as_text,"\n";
    }

Note the loop change: Bio::Annotation objects are no longer operator  
overloaded, so your original print statement wouldn't work in a bioperl
1.6 world.
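
To make the halving concrete, here is a small standalone sketch in plain
Perl. The strings are stand-ins for the eight annotation objects, and the
order in which they are listed is an assumption chosen for illustration,
not something taken from the KEGG parser.

```perl
use strict;
use warnings;

# Stand-in strings for the eight annotation objects (order is assumed).
my @links = (
    'Pfam:SdiA-regulated',     'Pfam:LRR_1',
    'PROSITE:LEU_RICH',        'NCBI-GI:153792305',
    'NCBI-GeneID:116064',      'HGNC:26968',
    'Ensembl:ENSG00000163428', 'UniProt:Q96CX6',
);

# A list assigned to a hash pairs up as key => value, so every second
# element becomes a value and is never returned by keys().
my %as_hash       = @links;
my @seen_via_hash = sort keys %as_hash;

print scalar(@links),         " links returned\n";
print scalar(@seen_via_hash), " visible through keys(): @seen_via_hash\n";

# The same list assigned to an array keeps every element.
my @as_array = @links;
print "$_\n" for @as_array;
```

With this ordering the keys are the first Pfam entry, PROSITE, NCBI-GeneID
and Ensembl, while HGNC and UniProt end up as hash values, which is the
pattern described in the original report.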

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
> foreach my $key ( $ac->get_all_annotation_keys() )
>                                 {
>                                         if($key eq "dblink")
>                                         {
>                                                 my %values =
> $ac->get_Annotations($key);
>                                                 foreach my $value (
> keys(%values ))
>                                                 {
>                                                         print  
> "\n*****VALUE
> $value*****\n";
>                                                 }
>                                         }
>                                  }
>
> Here not all dblinks present in the actual file get parsed. For eg,  
> in the
> data below,
> ENTRY       116064            CDS       H.sapiens
> NAME        LRRC58
> DEFINITION  leucine rich repeat containing 58
> POSITION    3q13.33
> MOTIF       Pfam: SdiA-regulated LRR_1
>             PROSITE: LEU_RICH
> DBLINKS     NCBI-GI: 153792305
>             NCBI-GeneID: 116064
>             HGNC: 26968
>             Ensembl: ENSG00000163428
>             UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and  
> PROSITE,
> but doesnt give me HGNC and UniProt. For other entries it gives me  
> other
> combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> -- 
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From akarger at CGR.Harvard.edu  Mon Nov 19 10:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
References: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
	<3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E0747C64A@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> Sent: Tuesday, November 13, 2007 12:42 PM
> To: Amir Karger
> Cc: Steve Chervitz; Dave Messina; bioperl-l
> Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result?
> 
> Amir,
> 
> Can you file this as a bug?  

Done.

http://bugzilla.open-bio.org/show_bug.cgi?id=2399

> Dave mentioned he would look into it but I think it warrants tracking
> to make sure it gets fixed:
> 
> http://www.bioperl.org/wiki/Bugs
> 
> Attach the example BLAST report from your last post to the report.
> BTW, I wonder how this appears in XML output?
> 
> chris
> 
> On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:
> 
> >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
> >> Of Steve Chervitz
> >>
> >> The Bioperl blast parser should extract that value and you can obtain
> >> it from an HSP object, via the HSPI::n() method, documented here:
> >>
> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
> >
> > As I mentioned in my email:
> >
> > And does anyone know off-hand if Bioperl will tell me when situations
> > like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
> > would help, but I just get a bunch of empty strings for that, whether or
> > not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
> > undef.)
> >
> > And the docs for n() actually say, "This value is not defined with NCBI
> > Blast2 with gapping" although they don't say why. Which may explain why,
> > when I ran the following code on the blast result I included in my last
> > email, I got empty values for all of the n's. (Why is n() undefined for
> > gapped blast if I'm getting n's in my results from that blast?)
> >
> > use warnings;
> > use strict;
> > use Bio::SearchIO;
> >
> > my $blast_out = $ARGV[0];
> > my $in = new Bio::SearchIO(-format => 'blast',
> >                             -file   => $blast_out,
> >                             -report_type => 'tblastn');
> >
> > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> >     Evalue)), "\n";
> > while(my $query = $in->next_result) {
> >     while(my $subject = $query->next_hit) {
> >         while (my $hsp = $subject->next_hsp) {
> >             print join("\t",
> >                 $query->query_name,
> >                 $hsp->start("query"),
> >                 $hsp->end("query"),
> >                 $hsp->strand("hit"),
> >                 $subject->name,
> >                 $hsp->start("hit"),
> >                 $hsp->end("hit"),
> >                 $subject->frame,
> >                 $hsp->n,
> >                 $hsp->evalue,
> >             ),"\n";
> >         }
> >     }
> > }
> >
> >> Dave's basically correct in his explanation. It's a result of the
> >> application of sum statistics by the blast algorithm. You can read
> >> all about it in Korf et al's BLAST book. Here's the relevant section:
> >
> > [snip]
> >
> > Thanks,
> >
> > -Amir
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 


From aaron.j.mackey at gsk.com  Mon Nov 19 11:50:53 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 19 Nov 2007 11:50:53 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <C36205FD.103EA%bosborne11@verizon.net>
Message-ID: <OF0C0B3E21.611ACEBE-ON85257398.005C01A8-85257398.005C8D95@gsk.com>

While Lucia's subject line asked for genbank2gff, her message actually 
asked the reverse (gff + fasta -> genbank).

e.g. pretend you had to prepare a genome annotation for submission to 
GenBank ...

and no, I don't know of any generalized gff2genbank script out there ...

Lucia, the SeqIO::genbank module will write GenBank format, but you have 
to get all the bits and bobs together in the right way, i.e. construct the 
various AnnotationCollections and SeqFeatures (with SplitLocations for 
exons, CDS, etc.) that a GenBank record expects.  One way to do this is to 
start with a template GenBank file that you'd like to mimic, strip it down 
to only two gene models, use SeqIO::genbank to read it into memory, and 
then step through the object with the Perl debugger to see how it is 
composed.
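
A minimal starting point for the gff + fasta -> genbank direction, offered
as a sketch under assumptions rather than a general gff2genbank tool: it
attaches each GFF feature, unchanged, to the FASTA sequence whose ID matches
the feature's seq_id and writes the result with SeqIO's genbank writer. It
does not build the SplitLocations or AnnotationCollections described above.
The helper name gff_fasta_to_genbank and the default GFF version are
hypothetical choices for the example.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Hypothetical helper: returns the number of records written to $out_fh.
sub gff_fasta_to_genbank {
    my ($fasta_file, $gff_file, $out_fh, $gff_version) = @_;

    # Group the GFF features by the sequence they refer to (column 1).
    my %features_for;
    my $gffio = Bio::Tools::GFF->new(-file        => $gff_file,
                                     -gff_version => $gff_version || 3);
    while (my $feat = $gffio->next_feature) {
        push @{ $features_for{ $feat->seq_id } }, $feat;
    }
    $gffio->close;

    my $in  = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');
    my $out = Bio::SeqIO->new(-fh   => $out_fh,     -format => 'genbank');
    my $written = 0;
    while (my $seq = $in->next_seq) {
        # Attach the features as-is; no exon merging is attempted.
        $seq->add_SeqFeature($_) for @{ $features_for{ $seq->display_id } || [] };
        $out->write_seq($seq);
        $written++;
    }
    return $written;
}

# Usage: perl this_script.pl seqs.fasta features.gff > out.gbk
gff_fasta_to_genbank(@ARGV[0, 1], \*STDOUT) if @ARGV == 2;
```

Comparing the output of this against a template GenBank record is one way
to see which annotations and location types still need to be constructed.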

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM:

> Chris,
> 
> There's also a genbank2gff3.PLS script in the GMOD package (
> http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
> However, it has not been modified for a couple of
> years, it may not be the "preferred" script.
> 
> See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
> http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
> on using Bioperl's bp_genbank2gff3 script.
> 
> Brian O.
> 
> 
> On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > There are currently many ways to get what you want, but not all are
> > consistent (particularly re: GFF3).  We are aiming for more
> > consistent, compliant GFF/GTF output in the next developer series
> > (1.7) of Bioperl.
> > 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> > scripts directory); these are probably the most common way when
> > working directly from a seq record.  Bio::Tools::GFF is the most
> > commonly used class though I'm unsure of it's status for GFF3
> > output.  From within a Bio::SeqI you can call write_gff() (currently
> > not very flexible) or from the SeqFeature itself gff_string().
> > Bio::Graphics::Feature has the additional method gff3_string().
> > Bio::FeatureIO is also an option, though I would consider it very
> > experimental (it will likely undergo significant revision in the next
> > bioperl dev series).
> > 
> > Any others anyone can think of, maybe non-BioPerl related as well?
> > 
> > chris
> > 


From johnston at biochem.ucl.ac.uk  Mon Nov 19 09:46:03 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT)
Subject: [Bioperl-l] blast database names
In-Reply-To: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
	<B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0711191441010.3141@localhost.localdomain>

On Mon, 19 Nov 2007, Chris Fields wrote:

> Here's a recent list (don't know if it's up-to-date):
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

Thanks. Perhaps I missed something in the docs, but I don't think I've
quite understood how this is supposed to work. I'm trying to blast primer
sequences against the ref genome sequence. Should I be using ref_contig?
How can I limit the blast to a single species?

cheers,
Cass.

From Kevin.M.Brown at asu.edu  Mon Nov 19 13:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed, just had to copy the
Align.pm file up one directory from where it was being put by the
installer.  Now I have a technician trying to use pSW (Bio::Tools::pSW)
and it appears to have been last updated back in '99 and seems to lack
certain methods to get things out of the alignment like the score.  The
test.pl script that Bio::Ext comes with actually uses
Bio::Tools::dpAlign.  Is dpAlign the replacement for pSW?


From bernd.web at gmail.com  Wed Nov 21 11:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de>
	<470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my
system, but indeed I also have these colouring problems.
I noticed that the score in the bgcolor callback gets a different value!
Printing the score during hit parsing ($hit->raw_score) gives the same
score as the -description callback (my $score = $feature->score;).
However, printing the score in the bgcolor sub gives 2573!
The scores in the bgcolor routine are all different from, and higher
than, the real scores. Were you able to solve this colouring issue?

Regards,
Bernd

> Hi all,
> I'm using a modified version of Lincoln's tutorial
> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> to give a similar image to that from NCBI but for some reason, my
> colours are coming out wrong (see attached example)
> They seem to be off by one but I can't see why.
>
> Any ideas?
>
> I can't be certain but I think it's only started doing this since our
> BLAST upgrade to 2.2.17 a few weeks ago.
>
> Here's the colouring code:
> -------------------------------------------------------------------------------
> my $track = $panel->add_track(
>     -glyph       => 'segments',
>     -label       => 1,
>     -connector   => 'dashed',
>     -bgcolor     => sub {
>         my $feature = shift;
>         my $score   = $feature->score;
>         return 'red'     if $score >= 200;
>         return 'fuchsia' if $score >= 80;
>         return 'lime'    if $score >= 50;
>         return 'blue'    if $score >= 40;
>         return 'black';
>     },
>     -font2color  => 'gray',
>     -sort_order  => 'high_score',
>     -description => sub {
>         my $feature = shift;
>         return unless $feature->has_tag('description');
>         my ($description) = $feature->each_tag_value('description');
>         my $score = $feature->score;
>         "$description, score=$score";
>     },
> );
> -------------------------------------------------------------------------------
>
>
> Thanx,
>
> Russell Smithies
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From bernd.web at gmail.com  Wed Nov 21 12:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk>
	<47022278.7010700@web.de> <47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I have now found that the bgcolor callback is using a $feature->score
that comes directly from the BLAST report; it is not the bit score.
     -bgcolor     => sub { my $feature = shift;
                           my $score = $feature->score;
                           print "$score\n"; }
always prints a score, even if the score is not set in the
Bio::SeqFeature::Generic object.

The -description callbacks, however, are somehow using the score from
the SeqFeature object.

Does anyone have an idea why?

Further, is it possible to get the raw score of a hit? $hit->raw_score
actually returns the bit score (without the decimal point).
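
One way to narrow this down, as a minimal sketch that assumes nothing about Bio::Graphics internals: pull the threshold logic out of the -bgcolor callback into a plain sub, so the colour mapping can be checked against known scores separately from whatever value the callback receives. The name score_to_color is hypothetical, and the undef guard is an addition for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: same thresholds as the -bgcolor callback in this thread.
sub score_to_color {
    my ($score) = @_;
    return 'black' unless defined $score;   # added guard: no score on the feature
    return 'red'     if $score >= 200;
    return 'fuchsia' if $score >= 80;
    return 'lime'    if $score >= 50;
    return 'blue'    if $score >= 40;
    return 'black';
}

# Inside add_track the callback could then be reduced to (sketch only):
#   -bgcolor => sub { my $f = shift; return score_to_color($f->score) },

# Check the mapping with plain numbers, no feature objects involved.
print join(' ', map { score_to_color($_) } 250, 80, 60, 45, 10), "\n";
```

If the mapping is right for plain numbers but the image is still wrong, the value arriving in the callback is the thing to look at, not the thresholds.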

Bernd

On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> Hi Russell,
>
> I came across your question. At first I thought all was well on my
> system, but indeed I also have these colouring problems.
> I noted that the score in the bgcolor callback gets a different value!
> Printing the score during hit parsing ($hit->raw_score) gives the same
> score as in -description
> (my $score = $feature->score;). However, printing the score in the
> bgcolor sub gives 2573!
> The scores in the bgcolor routine are all different from, and higher
> than, the real scores. Were you able to solve this colouring issue?
>
> Regards,
> Bernd
>
>
> > Hi all,
> > I'm using a modified version of Lincoln's tutorial
> > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> > and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> > to give a similar image to that from NCBI but for some reason, my
> > colours are coming out wrong (see attached example)
> > They seem to be off by one but I can't see why.
> >
> > Any ideas?
> >
> > I can't be certain but I think it's only started doing this since our
> > BLAST upgrade to 2.2.17 a few weeks ago.
> >
> > Here's the colouring code:
> > ------------------------------------------------------------------------
> > -------
> > my $track = $panel->add_track(
> >                               -glyph       => 'segments',
> >                               -label       => 1,
> >                               -connector   => 'dashed',
> >                               -bgcolor     => sub {
> >                                 my $feature = shift;
> >                                 my $score = $feature->score;
> >                                 return 'red'     if $score >= 200;
> >                                 return 'fuchsia' if $score >= 80;
> >                                 return 'lime'    if $score >= 50;
> >                                 return 'blue'    if $score >= 40;
> >                                 return 'black';
> >                                },
> >                               -font2color  => 'gray',
> >                               -sort_order  => 'high_score',
> >                               -description => sub {
> >                                 my $feature = shift;
> >                                 return unless $feature->has_tag('description');
> >                                 my ($description) = $feature->each_tag_value('description');
> >                                 my $score = $feature->score;
> >                                 "$description, score=$score";
> >                                },
> >                              );
> > ------------------------------------------------------------------------
> > ---------
> >
> >
> > Thanx,
> >
> > Russell Smithies

From sac at bioperl.org  Wed Nov 21 13:43:54 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 21 Nov 2007 10:43:54 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>

On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> [snip]
>
> Further is is possible to get the raw_score of a hit. $hit->raw_score
> actually gets the bitscore (w/o decimal point).

Hmmm. raw_score should not be the same as bit score. So given an
example blast hit line such as:

       Score = 60.0 bits (30), Expect = 1e-06

$hit->raw_score() should return 30, not 60, as you seem to be getting.
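
To make the two numbers on that line concrete, here is a small sketch that pulls them apart with a plain regular expression. This is only an illustration of the report format, not how Bio::SearchIO parses it; the variable names are made up for the example.

```perl
use strict;
use warnings;

# The example score line quoted above.
my $line = '       Score = 60.0 bits (30), Expect = 1e-06';

# The bit score comes first; the raw score is the integer in parentheses.
my ($bits, $raw, $expect);
if ($line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\),\s*Expect\s*=\s*(\S+)/) {
    ($bits, $raw, $expect) = ($1, $2, $3);
}

print "bits=$bits raw=$raw expect=$expect\n";
```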

Could you submit a bug report for this?  http://www.bioperl.org/wiki/Bugs

Thanks,
Steve


From binkley at genome.stanford.edu  Wed Nov 21 19:35:02 2007
From: binkley at genome.stanford.edu (Jonathan Binkley)
Date: Wed, 21 Nov 2007 16:35:02 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>

Hi,

I installed bioperl on a Mac (OS 10.4, Intel) via fink,
which put it here:

/sw/lib/perl5/5.8.6/Bio/

It seems to work fine, but I need bioperl-ext for
Smith-Waterman alignments.

So, into which directory should I download bioperl-ext and
run the Makefile?

Thanks.


From dcj at sanger.ac.uk  Thu Nov 22 09:47:09 2007
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Thu, 22 Nov 2007 14:47:09 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
Message-ID: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>

Hi all,

Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to
be a little 'broken', at least in my hands.
First, $bml->set_parameter('runmode', 0); does not work (it sets
runmode to -2). Setting runmode to 1 is OK.
Also, $bml->no_param_checks(1); doesn't seem to work.

The result is that the baseml.ctl file created under /tmp is not
runnable by baseml with runmode 0. The phylip file created is run OK
by baseml (with another .ctl file). My script & baseml.ctl are below.
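
The actual cause inside Baseml.pm is not shown in this thread. Purely as a general illustration, one common way an explicit 0 gets replaced by a default in Perl is a truth test where a definedness test was intended. The sub names and the default value below are hypothetical, chosen only to mirror the symptom.

```perl
use strict;
use warnings;

my $DEFAULT_RUNMODE = -2;   # hypothetical default, mirrors the symptom above

# Truth test: 0 is false in Perl, so an explicit 0 falls back to the default.
sub runmode_truthy {
    my ($value) = @_;
    return $value ? $value : $DEFAULT_RUNMODE;
}

# Definedness test: only a missing value falls back to the default.
sub runmode_defined {
    my ($value) = @_;
    return defined $value ? $value : $DEFAULT_RUNMODE;
}

printf "truthy:  0 -> %d, 1 -> %d\n", runmode_truthy(0),  runmode_truthy(1);
printf "defined: 0 -> %d, 1 -> %d\n", runmode_defined(0), runmode_defined(1);
```

Whether this is what happens in set_parameter would need to be checked against the module source.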

Hope it can be fixed,

cheers,

Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'test.phy');
my $aln = $alignio->next_aln;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

# set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc, $parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;


The baseml.ctl file produced:
seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0


Regards,

Daniel Jeffares

______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From David.Messina at sbc.su.se  Thu Nov 22 11:06:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 22 Nov 2007 17:06:16 +0100
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>

Daniel,

I don't have bioperl-run or PAML installed on my system to test it myself,
but have you tried the latest version of bioperl-run from CVS? It looks like
that code has been worked on since 1.5.2 was released.


If that still doesn't work, could you file this as a bug to make sure it
gets followed up?


Dave


You can grab the tarball here:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl


and if necessary file the bug here:
BioPerl Bugzilla tracking system <http://bugzilla.open-bio.org/>

From arareko at campus.iztacala.unam.mx  Thu Nov 22 11:37:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 22 Nov 2007 10:37:24 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
Message-ID: <4745B044.5090102@campus.iztacala.unam.mx>

Hi Peter,

In BioPerl, there's no such mapping for db_xref's that I'm aware of. 
Each parser handles db_xref records on its own. Take a look at the 
Bio::SeqIO::genbank code, inside the next_seq() method for example:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup

Regards,
Mauricio.

Peter wrote:
> Dear all,
> 
> I'm one of the Biopython developers.  I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface.  I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
> 
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
> 
>                      /db_xref="ASAP:1309"
>                      /db_xref="GI:16128366"
>                      /db_xref="ECOCYC:EG10213"
>                      /db_xref="GeneID:945313"
> 
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables.  For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
> 
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
> 
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
> 
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
> 
> Thank you,
> 
> Peter
> 
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From avilella at gmail.com  Thu Nov 22 16:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only
used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
$seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

    Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
15.8402290344238 secs...

avilella at magneto:~$ perl
/home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl
-dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
3.86473417282104 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta]
6.99602198600769 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta]
7.26704716682434 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta]
8.44332504272461 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta]
12.103296995163 secs...

From cjfields at uiuc.edu  Thu Nov 22 19:30:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:30:51 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu>

How are tests affected?  It might be worth going through the revision  
history to see if there was a specific reason this was implemented,  
but if it passes tests I don't see why we need it.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 22 19:42:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:42:12 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
	<4745B044.5090102@campus.iztacala.unam.mx>
Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu>

I think SeqIO checks the name for parsing reasons only, in cases
where the format changes based on the source (such as GenPept
DBSOURCE data).  I don't think we go beyond that in Bioperl, probably
because modifying or expanding names for data persistence would lead to
volatile coding issues (i.e. consistency between parsers, constant
updating to cover new crossrefs, etc).

I would definitely suggest retaining the original DB name as it appears
in the dbxref for consistency/sanity; if needed, return expanded names
through a separate method where they are defined.
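
A rough sketch of that suggestion, in plain Perl with no BioPerl or BioSQL calls: the database name is stored exactly as it appears in the db_xref, and an expanded name is only looked up on request. The helper names parse_dbxref and expanded_db_name are hypothetical, the table holds just a few entries from the Biopython mapping quoted in this thread, and the MIM identifier is a made-up placeholder.

```perl
use strict;
use warnings;

# A few entries taken from the Biopython mapping quoted in this thread.
my %DB_LONG_NAME = (
    GeneID => 'Entrez',
    GI     => 'GeneIndex',
    ECOCYC => 'EcoCyc',
    GEO    => 'Gene Expression Omnibus',
);

# Hypothetical helper: split a db_xref value, keeping the DB name as written.
sub parse_dbxref {
    my ($value) = @_;
    my ($db, $id) = split /:/, $value, 2;
    return { db => $db, id => $id };
}

# Hypothetical helper: expanded name if one is known, otherwise the original.
sub expanded_db_name {
    my ($xref) = @_;
    my $db = $xref->{db};
    return exists $DB_LONG_NAME{$db} ? $DB_LONG_NAME{$db} : $db;
}

# 'MIM:000000' is a placeholder value for an abbreviation with no mapping.
for my $value ('GI:16128366', 'ECOCYC:EG10213', 'MIM:000000') {
    my $xref = parse_dbxref($value);
    printf "%-8s %-12s %s\n", $xref->{db}, $xref->{id}, expanded_db_name($xref);
}
```

With this split, an unknown abbreviation is stored unchanged instead of needing a new table entry before it can be loaded.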

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 22 19:49:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:49:15 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>

Albert,

Found it:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SimpleAlign.pm.diff?r1=1.36&r2=1.37

If it slows performance that dramatically, maybe we can move this to  
a separate AlignUtils method instead.  Maybe something to ask Jason  
about?

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 23 07:29:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Nov 2007 12:29:37 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
Message-ID: <4746C7B1.1010002@sendu.me.uk>

Dave Messina wrote:
> Daniel,
> 
> I don't have bioperl-run or PAML installed on my system to test it myself,
> but have you tried the latest version of bioperl-run from CVS? It looks like
> that code has been worked on since 1.5.2 was released.

Yes, I fixed it in CVS so it should at least /run/. I don't know about 
the parsing side of things, though that may also have been fixed 
recently by someone else.


From avilella at gmail.com  Fri Nov 23 08:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a
"baseml" binary. AFAIK, Jason fixed codeml to make it work for both
paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;
my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file   => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
                              -file   => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();


#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

test.phy:

4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

test.tree:

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala <bix at sendu.me.uk> wrote:
> Dave Messina wrote:
> > Daniel,
> >
> > I don't have bioperl-run or PAML installed on my system to test it myself,
> > but have you tried the latest version of bioperl-run from CVS? It looks like
> > that code has been worked on since 1.5.2 was released.
>
> Yes, I fixed it in CVS so it should at least /run/. I don't know about
> the parsing side of things, though that may also have been fixed
> recently by someone else.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu  Fri Nov 23 11:24:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 10:24:59 -0600
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
	<358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu>

I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just  
'basemlg'), so it would need to work with both.
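
A minimal sketch of how a wrapper could cope with either binary name, assuming it simply searches PATH for the first candidate that exists. The helper name first_executable is hypothetical and is not part of bioperl-run:

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first candidate program found in the
# given directories, or undef if none is present and executable.
# (first_executable is a hypothetical helper, not a bioperl-run method.)
sub first_executable {
    my ($names, $dirs) = @_;
    for my $name (@$names) {
        for my $dir (@$dirs) {
            my $path = File::Spec->catfile($dir, $name);
            return $path if -f $path && -x _;
        }
    }
    return;
}

my @dirs = File::Spec->path;    # directories listed in $ENV{PATH}
my $exe  = first_executable([qw(baseml basemlg)], \@dirs);
print defined $exe ? "Using $exe\n" : "No baseml binary found on PATH\n";
```

The result could then be handed to $bml->executable(...) as in the script earlier in this thread. Whether 'baseml' or 'basemlg' should be preferred is a separate question that this sketch does not answer.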

Do we want to put a PAML parser/wrapper overhaul on the TODO list for  
1.6?

chris

On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote:

> Just to mention that the new paml4 has a "basemlg" instead of a
> "baseml" binary. AFAIK, Jason fixed codeml to make it work both for
> paml3.xx a paml4, but I am not sure about baseml.
...


From arvindvanam at gmail.com  Fri Nov 23 16:26:06 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST)
Subject: [Bioperl-l]  run RNAfold in perl
Message-ID: <13918981.post@talk.nabble.com>


How do I run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job=$rnafold->run(-rnafold =>
'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an object
for the program, but it gives the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12


How can I make the RNAfold program run from Perl?
Do I need to specify where my RNAfold program is?

Please reply soon.
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Fri Nov 23 17:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see  
Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
mfold wrappers but haven't done so yet.  The Vienna tools do have a  
Perl-based (non-BioPerl-based) module included which uses libRNA, and  
is well worth a look.  Try 'perldoc RNA' if you have installed the  
tools locally, or look here for other Perl-based tools:

http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
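
A minimal sketch of calling the Vienna package's own RNA module from Perl, assuming the Perl bindings were built and installed together with the tools. RNA::fold is the call described in 'perldoc RNA'; the fold_sequence helper around it is hypothetical and only there so the script degrades gracefully when the module is missing:

```perl
use strict;
use warnings;

# Fold a sequence with the Vienna RNA Perl bindings, if they are installed.
# Returns (structure, mfe), or an empty list when the RNA module is missing.
# (fold_sequence is a hypothetical helper name.)
sub fold_sequence {
    my ($seq) = @_;
    return unless eval { require RNA; 1 };
    my ($structure, $mfe) = RNA::fold($seq);
    return ($structure, $mfe);
}

my $seq = 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC';
if (my ($structure, $mfe) = fold_sequence($seq)) {
    printf "%s\n%s (%.2f kcal/mol)\n", $seq, $structure, $mfe;
}
else {
    print "RNA module not found; install the Vienna package Perl bindings first\n";
}
```

This runs entirely in-process, with no system() call and no Pise server involved.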

chris

On Nov 23, 2007, at 3:26 PM, vanam wrote:

>
> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>
> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
> my $rnafold = $factory->program('rnafold');
> my $job=$rnafold->run(-rnafold =>
> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>
> I installed Vienna package and then i tried using Pise to create an  
> object
> for the program but its giving the following error
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::PiseJob::terminated
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
> STACK: Bio::Tools::Run::PiseApplication::submit
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
> STACK: Bio::Tools::Run::PiseApplication::run
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
> STACK: evaluate.pl:12
>
>
> how to make the program RNAfold run in perl...
> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>
> plz reply soon
> -- 
> View this message in context: http://www.nabble.com/run-RNAfold-in- 
> perl-tf4863835.html#a13918981
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Sat Nov 24 02:29:11 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
Message-ID: <13922740.post@talk.nabble.com>


I have seen the documentation for Bio::Tools::Run::AnalysisFactory::Pise and
I used it exactly as described there.

Instead of running the Perl version "RNAfold.pl", I just want to use
the functions associated with RNAfold from within a Perl program, without
having to call the program using the system() command.

If you can just tell me how to use these wrapper modules it would be of great
help. For example, when using clustalw or clustalx we define an environment
variable for it. Do we have to do the same for RNAfold or Mfold?


Chris Fields wrote:
> 
> The Pise wrappers run the programs remotely; see  
> Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
> mfold wrappers but haven't done so yet.  The Vienna tools do have a  
> Perl-based (non-BioPerl-based) module included which uses libRNA, and  
> is well worth a look.  Try 'perldoc RNA' if you have installed the  
> tools locally, or look here for other Perl-based tools:
> 
> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
> 
> chris

-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From avilella at gmail.com  Sun Nov 25 06:50:42 2007
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 25 Nov 2007 11:50:42 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>

Committed to CVS now. It is calculated anyway when calling symbol_chars, so...
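
Roughly, the idea is to build the symbol set only on request instead of on every added sequence. A minimal sketch, assuming a toy class (ToyAlign is a hypothetical stand-in, not the actual Bio::SimpleAlign code):

```perl
use strict;
use warnings;

package ToyAlign;    # hypothetical stand-in, not Bio::SimpleAlign

sub new { return bless { seqs => [] }, shift }

# Adding a sequence no longer touches the symbol table.
sub add_seq { my ($self, $s) = @_; push @{ $self->{seqs} }, $s; return }

# The set of characters is built once, on the first request.
sub symbol_chars {
    my ($self) = @_;
    unless ($self->{_symbols}) {
        my %seen;
        $seen{$_} = 1 for map { split //, $_ } @{ $self->{seqs} };
        $self->{_symbols} = \%seen;
    }
    return sort keys %{ $self->{_symbols} };
}

package main;

my $aln = ToyAlign->new;
$aln->add_seq($_) for qw(AGUC-GA AGUCCGA);
print join('', $aln->symbol_chars), "\n";    # prints "-ACGU"
```

One trade-off of this sketch: the cached set goes stale if sequences are added after the first symbol_chars call, so a real implementation would need to reset the cache in add_seq.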

On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> Found it:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>
> If it slows performance that dramatically, maybe we can move this to
> a separate AlignUtils method instead.  Maybe something to ask Jason
> about?
>
> chris
>
> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>
>
> > Hi,
> >
> > Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> > used if one calls the symbol_chars method?
> >
> > When I comment out this line:
> >
> > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> > $seq->seq; # line 257
> >
> > I get a nice speed boost on loading alignments.
> >
> > Can I comment this line out in the CVS HEAD?
> >
> > Cheers,
> >
> >     Albert.
> >
> > [init] 5.96046447753906e-06 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.0022270679473877 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 2.14348912239075 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 6.91910791397095 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 15.8402290344238 secs...
> >
> > avilella at magneto:~$ perl
> > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
> > ancestral_alleles.pl
> > -dir /home/avilella/ensembl/exoseq/test -verbose
> > [init] 1.21593475341797e-05 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.00294303894042969 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 0.510555982589722 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 1.6192569732666 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 3.86473417282104 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000203717.chr1.fasta]
> > 6.99602198600769 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000196188.chr1.fasta]
> > 7.26704716682434 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000025800.chr1.fasta]
> > 8.44332504272461 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000117475.chr1.fasta]
> > 12.103296995163 secs...
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From cjfields at uiuc.edu  Sun Nov 25 10:05:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:05:27 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13922740.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>

Again, these wrappers are for submitting data to a Pise server for  
the corresponding programs (run on a remote server).  There are no  
wrappers for running RNAfold on your computer (i.e. locally), with or  
without an environment variable set.  You can try installing Pise  
locally and setting the location() as shown in the POD to localhost;  
however, I don't know how stable these modules are with newer versions  
of Pise.  They haven't been updated in a few years, apart from getting  
tests to work.

Another option is installing EMBOSS along with the EMBASSY version of  
RNAfold; this could conceivably be run through Bio::Factory::EMBOSS.

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for  
> Bio::Tools::Run::AnalysisFactory::Pise and
> i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I  
> can use
> the functions associated with RNAfold with a perl program without  
> having to
> call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b  
> of gr8
> help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 10:38:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:38:40 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
Message-ID: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>

Albert,

I was getting a single AlignIO.t fail which appeared to be related to  
this:

...
ok 122 - The object isa Bio::Align::AlignI
ok 123 - consensus_string on metafasta

not ok 124 - symbol_chars() using metafasta
#   Failed test 'symbol_chars() using metafasta'
#   in t/AlignIO.t at line 346.
#          got: '0'
#     expected: '23'

It was b/c the symbol hash was initialized in the constructor (so it  
was present, just empty).  I have changed that in CVS; all tests pass  
now.
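
A minimal sketch of that failure mode, assuming a simplified lazy fill guarded by a truth test (the function and hash layout here are hypothetical, not the actual SimpleAlign code):

```perl
use strict;
use warnings;

# Lazy fill guarded by a truth test: a hash reference is always true,
# even when the hash is empty, so a pre-initialized {} is never filled.
sub symbols_truth_test {
    my ($obj) = @_;
    unless ($obj->{_symbols}) {
        $obj->{_symbols} = { map { $_ => 1 } split //, $obj->{seq} };
    }
    return scalar keys %{ $obj->{_symbols} };
}

my %with_init    = (seq => 'ACGU-', _symbols => {});   # as if set in a constructor
my %without_init = (seq => 'ACGU-');                   # slot left undefined

print symbols_truth_test(\%with_init),    "\n";   # prints 0
print symbols_truth_test(\%without_init), "\n";   # prints 5
```

Leaving the slot undefined until first use (or testing for an empty hash rather than for truth) avoids the problem.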

chris

On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:

> cvs commited now. it is calculated anyway when calling symbol_chars  
> so...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Sun Nov 25 11:13:44 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 25 Nov 2007 17:13:44 +0100
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>

Hi,

I am not sure if this is related, but I remember SimpleAlign was
adapted to cope with more gap symbols that can occur in
alignments/FASTA sequences, such as: . _ - =
Previous versions would throw an error on 'illegal' gap characters.

Regards,
Bernd

On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> I was getting a single AlignIO.t fail which appeared to be related to
> this:
>
> ...
> ok 122 - The object isa Bio::Align::AlignI
> ok 123 - consensus_string on metafasta
>
> not ok 124 - symbol_chars() using metafasta
> #   Failed test 'symbol_chars() using metafasta'
> #   in t/AlignIO.t at line 346.
> #          got: '0'
> #     expected: '23'
>
> It was b/c the symbol hash was initialized in the constructor (so it
> was present, just empty).  I have changed that in CVS; all tests pass
> now.
>
> chris

From cjfields at uiuc.edu  Sun Nov 25 11:39:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 10:39:01 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
	<716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
Message-ID: <B849A608-7C12-4C87-BB93-D846959F0523@uiuc.edu>

Bernd,

That would be when generating Bio::LocatableSeq instances for  
building a Bio::SimpleAlign object.  Judging by test suite results  
that doesn't appear to be affected.

chris

On Nov 25, 2007, at 10:13 AM, Bernd Web wrote:

> Hi,
>
> I am not sure if this is related, but I remember SimpleAlign was
> adapted to cope with more gap symbols that can occur in
> alignments/FastA sequences, as: . _ - =
> Previous versions would throw an error on 'illegal' gap characters,
>
> Regards,
> Bernd
>
> On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
>> Albert,
>>
>> I was getting a single AlignIO.t fail which appeared to be related to
>> this:
>>
>> ...
>> ok 122 - The object isa Bio::Align::AlignI
>> ok 123 - consensus_string on metafasta
>>
>> not ok 124 - symbol_chars() using metafasta
>> #   Failed test 'symbol_chars() using metafasta'
>> #   in t/AlignIO.t at line 346.
>> #          got: '0'
>> #     expected: '23'
>>
>> It was b/c the symbol hash was initialized in the constructor (so it
>> was present, just empty).  I have changed that in CVS; all tests pass
>> now.
>>
>> chris
>>
>>
>> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:
>>
>>> cvs commited now. it is calculated anyway when calling symbol_chars
>>> so...
>>>
>>> On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>>>> Albert,
>>>>
>>>> Found it:
>>>>
>>>> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
>>>> Bio/
>>>> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>>>>
>>>> If it slows performance that dramatically, maybe we can move  
>>>> this to
>>>> a separate AlignUtils method instead.  Maybe something to ask Jason
>>>> about?
>>>>
>>>> chris
>>>>
>>>> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Am I right in thinking that the '_symbols' hash in SimpleAlign is
>>>>> only
>>>>> used if one calls the symbol_chars method?
>>>>>
>>>>> When I comment out this line:
>>>>>
>>>>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
>>>>> $seq->seq; # line 257
>>>>>
>>>>> I get a nice speed boost on loading alignments.
>>>>>
>>>>> Can I comment this line out in the CVS HEAD?
>>>>>
>>>>> Cheers,
>>>>>
>>>>>     Albert.
>>>>>
>>>>> [init] 5.96046447753906e-06 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162399.chr1.fasta]
>>>>> 0.0022270679473877 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000158022.chr1.fasta]
>>>>> 2.14348912239075 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162585.chr1.fasta]
>>>>> 6.91910791397095 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000121957.chr1.fasta]
>>>>> 15.8402290344238 secs...
>>>>>
>>>>> avilella at magneto:~$ perl
>>>>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
>>>>> ancestral_alleles.pl
>>>>> -dir /home/avilella/ensembl/exoseq/test -verbose
>>>>> [init] 1.21593475341797e-05 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162399.chr1.fasta]
>>>>> 0.00294303894042969 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000158022.chr1.fasta]
>>>>> 0.510555982589722 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162585.chr1.fasta]
>>>>> 1.6192569732666 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000121957.chr1.fasta]
>>>>> 3.86473417282104 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000203717.chr1.fasta]
>>>>> 6.99602198600769 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000196188.chr1.fasta]
>>>>> 7.26704716682434 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000025800.chr1.fasta]
>>>>> 8.44332504272461 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000117475.chr1.fasta]
>>>>> 12.103296995163 secs...
>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 13:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to  
Bio::SeqIO::staden::read over the last few months which incorporate  
code from Bugzilla (bugs 2074 and 2329, very kindly donated from  
Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for  
optionally getting trace data (i.e. for drawing graphs).

The method is now implemented in Bio::SeqIO::abi, with example  
code in examples/quality/svgtrace.pl.  These changes should allow  
newer versions of Staden io_lib as well (the code is tested with  
io_lib 1.9.2), though they haven't been tested extensively as I am  
having problems compiling newer io_lib versions on my MacBook.  It's  
very likely more changes will need to be made along the way; some  
issues were found with XS compilation which appear harmless but need  
to be investigated, and trace data from other formats need to be  
evaluated.  The possibility exists that many of these changes break  
backward compatibility with older bioperl releases, though tests  
passed with bioperl 1.5.2.

Any feedback re: platform issues, test results using newer io_lib  
versions, older bioperl-versions, etc would be greatly appreciated.   
I'm hoping this will stimulate more interest in getting other bioperl- 
ext modules up-to-date with bioperl-live.

chris

From cjfields at uiuc.edu  Mon Nov 26 13:59:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 12:59:23 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>

Steve, Bernd, (and Jason, since you may have some input on this as  
well),

I am now looking into the bug Bernd submitted and it seems there is a  
serious discrepancy in the way the hit raw_score, bits, and  
significance are determined for Hit objects.  Unless I am mistaken  
these should always come from the best HSP when they are present,  
falling back to the hit table data only when no HSP alignments are  
present.  Under the latter conditions a minimal Hit object is made  
from data in the hit table, which reports the rounded bit score, not  
the raw score, so in those cases the raw score would be undefined  
(and you probably should get a nasty warning indicating there are no  
HSPs present to get the data from).
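
As a rough illustration of the behaviour described above, here is a minimal sketch. It uses plain hashes as stand-ins for Hit and HSP objects; the hit_scores helper and all field names (hsps, table_bits, table_evalue) are hypothetical and not part of the Bio::Search API, and "best HSP" is assumed here to mean the one with the highest bit score.

```perl
use strict;
use warnings;

# Hypothetical helper: derive hit-level scores from the best HSP if any
# HSPs exist, otherwise fall back to the (NCBI-style) hit table data.
sub hit_scores {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };

    if (@hsps) {
        # Assumption: "best" means highest bit score.
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return {
            raw_score    => $best->{raw_score},
            bits         => $best->{bits},
            significance => $best->{evalue},
        };
    }

    # No alignments: the NCBI hit table only carries the rounded bit score
    # and the E-value, so the raw score stays undefined.
    warn "No HSPs present; using hit table data, raw_score is undefined\n";
    return {
        raw_score    => undef,
        bits         => $hit->{table_bits},
        significance => $hit->{table_evalue},
    };
}
```

With a hit built from the line "Score = 60.0 bits (30), Expect = 1e-06", this would give raw_score 30 and bits 60.0 when the HSP is present, and only bits 60 (from the table) when it is not.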

What is occurring now, though, is that the raw_score and significance are  
explicitly set from the hit table in the BLAST parser for the Hit  
object at all times, while the bits are correctly derived from the  
best HSP (no fallback to the hit table).  Changing to the behavior  
above results in several tests failing via SearchIO.t, with each  
failed test reporting the expected (read:correct) raw score.

I'll look through the tests just in case, but I am planning on  
committing changes to the BLAST parsers, GenericHit, and SearchIO.t  
(to reflect the correct expected data) in the next day or two unless  
there are any objections.

chris

On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:

> On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
>> [snip]
>>
>> Further, is it possible to get the raw_score of a hit? $hit->raw_score
>> actually gets the bit score (w/o decimal point).
>
> Hmmm. raw_score should not be the same as bit score. So given an
> example blast hit line such as:
>
>        Score = 60.0 bits (30), Expect = 1e-06
>
> $hit->raw_score() should return 30, not 60, as you seem to be getting.
>
> Could you submit a bug report for this?  http://www.bioperl.org/ 
> wiki/Bugs
>
> Thanks,
> Steve
>
>>
>> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
>>> Hi Russell,
>>>
>>> I came across your question. At first I thought all was well on my
>>> system, but indeed I also have these colouring problems.
>>> I noted that score in the bgcolor callback gets a different value!
>>> Printing score during hit parsing ($hit->raw_score) gives the same
>>> score as -description
>>> my $score = $feature->score; However, printing score in the bgcolor
>>> sub gives 2573!
>>> All scores in the bgcolor routine are different from and higher than the
>>> real scores. Were you able to solve this colouring issue?
>>>
>>> Regards,
>>> Bernd
>>>
>>>
>>>> Hi all,
>>>> I'm using a modified version of Lincoln's tutorial
>>>> (http://www.bioperl.org/wiki/ 
>>>> HOWTO:Graphics#Parsing_Real_BLAST_Output)
>>>> and I'm colouring the HSPs by setting the -bgcolor by score with  
>>>> a sub
>>>> to give a similar image to that from NCBI but for some reason, my
>>>> colours are coming out wrong (see attached example)
>>>> They seem to be off by one but I can't see why.
>>>>
>>>> Any ideas?
>>>>
>>>> I can't be certain but I think it's only started doing this  
>>>> since our
>>>> BLAST upgrade to 2.2.17 a few weeks ago.
>>>>
>>>> Here's the colouring code:
>>>> ------------------------------------------------------------------------
>>>> my $track = $panel->add_track(
>>>>     -glyph       => 'segments',
>>>>     -label       => 1,
>>>>     -connector   => 'dashed',
>>>>     -bgcolor     => sub {
>>>>         my $feature = shift;
>>>>         my $score   = $feature->score;
>>>>         return 'red'     if $score >= 200;
>>>>         return 'fuchsia' if $score >= 80;
>>>>         return 'lime'    if $score >= 50;
>>>>         return 'blue'    if $score >= 40;
>>>>         return 'black';
>>>>     },
>>>>     -font2color  => 'gray',
>>>>     -sort_order  => 'high_score',
>>>>     -description => sub {
>>>>         my $feature = shift;
>>>>         return unless $feature->has_tag('description');
>>>>         my ($description) = $feature->each_tag_value('description');
>>>>         my $score = $feature->score;
>>>>         "$description, score=$score";
>>>>     },
>>>> );
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> Thanx,
>>>>
>>>> Russell Smithies
>>>>
>>>>
>>>>
>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Mon Nov 26 14:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>


I searched for the EMBASSY version of RNAfold (I guess it's vrnafold), but I am
unable to find a downloadable version; all there is is a web interface for it.
Can you tell me where to download it?

Or can you just tell me how to set the location in PiseApplication to
localhost, and what to enter in the $email variable?
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Nov 26 15:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
	<13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>


On Nov 26, 2007, at 1:08 PM, vanam wrote:

> I searched for the EMBASSY version of RNAfold (I guess it's
> vrnafold), but I am unable to find a downloadable version; all there
> is is a web interface for it.
> Can you tell me where to download it?

You will need to install EMBOSS as well as the EMBASSY version of  
VIENNA (something which is documented in the docs that come along  
with the distributions and I will not go into detail on):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet.  Understand that there is no specific  
class framework for dealing with RNA secondary structure in BioPerl,  
so you will be on your own for now.

My suggestion for using Pise had the very important caveats that (1)  
it very well may not work, (2) I have no experience with Pise, let  
alone setting it up locally, therefore (3) I haven't tested it (and  
don't intend to, as I don't have the time).

> Or can you just tell me how to set the location in PiseApplication to
> localhost, and what to enter in the $email variable?

I have pointed out documentation previously which comes with the  
modules in question.  Remember perldoc is your friend; consulting it  
saves me (and everyone else) time.

 From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------

DESCRIPTION

Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
application objects, that let you submit jobs on a Pise server.

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                             -email => 'me@myhome');

The email is optional (there is default one). It can be useful, though.
Your program might enter infinite loops, or just run many jobs: the
Pise server maintainer needs a contact (s/he could of course cancel any
requests from your address...). And if you plan to run a lot of heavy
jobs, or to do a course with many students, please ask the maintainer
before.

The location parameter stands for the actual CGI location, except when
set at the factory creation step, where it is rather the root of all
CGI.  There are default values for most of Pise programs.

You can either set location at:

1 factory creation:
      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                         -location => 'http://somewhere/Pise/cgi-bin',
                         -email    => 'me@myhome');

2 program creation:
      my $program = $factory->program('water',
                         -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                                      );

3 any time before running:
      $program->location('http://somewhere/Pise/cgi-bin/water.pl');
      $job = $program->run();

4 when running:
      $job = $program->run(
                 -location => 'http://somewhere/Pise/cgi-bin/water.pl');

You can also retrieve a previous job results by providing its url:

   $job = $factory->job($url);

You get the url of a job by:

   $job->jobid;

----------------------------------------------

chris


From sac at bioperl.org  Mon Nov 26 20:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch. You're on track here with one exception: WU blast and NCBI
blast behave differently in what they report in the hit table: WU
blast puts the raw score in the table not the bit score as NCBI blast
does (see example below for reference). WU blast also swaps their
location in the HSP header relative to how NCBI reports it. It would
be good to verify that the blast parser isn't befuddled by this. A
quick look at SearchIO::blast and it appears that data from the hit
table is always getting stored as score, not bits for WU blast. Not
sure if the HSP section data are parsed correctly. I'd recommend
looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite. The downside to this arrangement
is that code that works for NCBI blast hits will need modification to
work for WU blast hits, but that is just the nature of the data. It
shouldn't be an issue for the majority of users that stick with one
flavor of blast and don't switch back and forth, or for users that get
their HSP scoring data from HSP sections rather than relying on the
hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of
the algorithm name for WU blast reports, so it would then be available
for the contained hit/hsp objects. This could break anything dependent
on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:
                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
        Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)


Example NCBI blast table and HSP header:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:ENSG00000137397
           transcript:ENST00000357569
          Length = 425

 Score =  120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)
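
To make the difference between the two HSP header layouts concrete, here is a minimal sketch of telling them apart with two regular expressions. The parse_score_line helper is hypothetical and is not how SearchIO::blast does it; it also assumes plain decimal numbers, so bit scores written in exponent notation would need a wider pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (raw_score, bits) from an HSP "Score =" line,
# or an empty list if the line matches neither layout.
sub parse_score_line {
    my ($line) = @_;

    # NCBI layout:  Score =  120 bits (301), Expect = 3e-27
    if ($line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\)/) {
        return ($2, $1);
    }

    # WU layout:    Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
    if ($line =~ /Score\s*=\s*(\d+)\s+\(([\d.]+)\s+bits\)/) {
        return ($1, $2);
    }

    return;
}

my @ncbi = parse_score_line(' Score =  120 bits (301), Expect = 3e-27');
my @wu   = parse_score_line(' Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0');
print "NCBI: raw=$ncbi[0] bits=$ncbi[1]\n";
print "WU:   raw=$wu[0] bits=$wu[1]\n";
```

For the two example headers above, the raw scores come out as 301 (NCBI) and 4141 (WU), with the bit scores 120 and 1462.8 respectively.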


On Nov 26, 2007 10:59 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance is determined for Hit objects.  Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present.  Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance is
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).  Changing to the behavior
> above results in several tests failing via SearchIO.t, with each
> failed test reporting the expected (read:correct) raw score.
>
> I'll look through the tests just in case, but I am planning on
> committing changes to the BLAST parsers, GenericHit, and SearchIO.t
> (to reflect the correct expected data) in the next day or two unless
> there are any objections.
>
> chris
>
>

From sac at bioperl.org  Mon Nov 26 22:27:09 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 19:27:09 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com>

Hi Jon,

I'd recommend downloading it into a separate location of your choosing
(~/lib/bioperl-ext for example) and running the installer as specified
in the docs that come with the download. Then you can include the
location you installed it into via a 'use lib "$ENV{HOME}/lib/bioperl-ext";'
statement at the top of your script. It may be handy to install it as
a non-root user so that you don't alter the main perl installation.

This way your ext install will stay separate from your main bioperl
and perl installations.
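
A minimal sketch of what the top of such a script might look like, assuming the example location under your home directory mentioned above. Note that Perl itself does not expand a leading "~" in a use lib path (that is a shell feature), so the home directory is taken from $ENV{HOME} here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed install location (the example path from above); adjust to
# wherever you actually installed bioperl-ext.
use lib "$ENV{HOME}/lib/bioperl-ext";

# The directory is now searched before the standard locations.
print "Module search path starts with: $INC[0]\n";
```

Setting the PERL5LIB environment variable to the same directory is an alternative if you would rather not edit each script.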

There are some docs about the ext packages you might want to check out
at http://www.bioperl.org/wiki/Ext_package.

Steve

On Nov 21, 2007 4:35 PM, Jonathan Binkley <binkley at genome.stanford.edu> wrote:
> Hi,
>
> I installed bioperl on a Mac (OS 10.4, Intel) via fink,
> which put it here:
>
> /sw/lib/perl5/5.8.6/Bio/
>
> It seems to work fine, but I need bioperl-ext for
> Smith-Waterman alignments.
>
> So, into which directory should I download bioperl-ext and
> run the Makefile?
>
> Thanks.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From a_arya2000 at yahoo.com  Tue Nov 27 09:51:41 2007
From: a_arya2000 at yahoo.com (a_arya2000)
Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST)
Subject: [Bioperl-l] Bioperl-ext test fails
Message-ID: <615478.1036.qm@web60113.mail.yahoo.com>

Hello,
I downloaded the latest bioperl-ext from the bioperl website,
and I have io_lib v1.8.11 installed, and I was trying
to install Bio::SeqIO::staden::read (of bioperl-ext).
It compiled fine without any errors, but when I ran make
test I got the following output. 


PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl
"-MExtUtils::Command::MM" "-e" "test_harness(0,
'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94# Test 7 got: "0"
(t/staden_read.t at line 110 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to
write files for abi format)
#  t/staden_read.t line 110 is:         ok(0, undef,
"We don't have the ability to write files for $format
format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2
*TODO*)
#   Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3
*TODO*)
#   Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail
#4 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail
#5 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail
#6 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail
#7 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
#  t/staden_read.t line 62 is:  ok(0, undef, "Still
missing test files for $format format") for
(1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 20 got: "0" (t/staden_read.t at line 62 fail #7
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 21 got: "0" (t/staden_read.t at line 62 fail #8
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 22 got: "0" (t/staden_read.t at line 62 fail #9
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 23 got: "0" (t/staden_read.t at line 62 fail
#10 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 24 got: "0" (t/staden_read.t at line 62 fail
#11 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 25 got: "0" (t/staden_read.t at line 62 fail
#12 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 31 got: "0" (t/staden_read.t at line 107
*TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
#  t/staden_read.t line 107 is:             ok(0,
undef, "Can't write valid ctf files until we have a
trace object") for 1..7;
# Test 32 got: "0" (t/staden_read.t at line 107 fail
#2 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 33 got: "0" (t/staden_read.t at line 107 fail
#3 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 34 got: "0" (t/staden_read.t at line 107 fail
#4 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 35 got: "0" (t/staden_read.t at line 107 fail
#5 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 36 got: "0" (t/staden_read.t at line 107 fail
#6 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 37 got: "0" (t/staden_read.t at line 107 fail
#7 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
t/staden_read....ok                                   
                      
All tests successful.
Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr + 
0.15 csys =  1.71 CPU)


Does anyone have any idea what might be going wrong here? By
the way, my OS is Linux. Thank you very much.

Arya



From bix at sendu.me.uk  Tue Nov 27 10:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded latest bioperl-ext from bioperl website,
> and I have io_lib v1.8.11 installed, and I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any error but when I run make
> test I got following output. 
[...]
> All tests successful.
> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr + 
> 0.15 csys =  1.71 CPU)
> 
> 
> Anyone has any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that 
nothing actually went wrong: 'All tests successful'. Apparently there 
are some things in the test script that are known by the author to not 
work quite right, so he marked them as 'todo'. The problems seem 
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going 
on, all is well.
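
For anyone who hasn't met 'todo' tests before, here is a minimal sketch of
the idea using the core Test::More module. This is an assumption for
illustration only: the staden test script itself appears to use the older
Test.pm interface, and the test names below are made up. A failing test
inside a TODO block is reported, but it does not count as a failure:

```perl
use strict;
use warnings;
use Test::More tests => 2;

# An ordinary test: a failure here would make the whole run fail.
ok( 1, 'reading a trace file works' );

TODO: {
    # Every test inside this block is expected to fail for now.
    local $TODO = "Can't write valid ctf files until we have a trace object";

    # Reported as 'not ok ... # TODO ...', but not counted as a failure.
    ok( 0, 'writing a ctf file works' );
}
```

Run under 'make test' or prove, a script like this still ends with
'All tests successful', which matches what was reported above.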

From alison.waller at utoronto.ca  Mon Nov 26 16:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

 
It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.  

 
I am trying to write a script that will parse large BLAST files (usually
blastx). I also want to be able to specify how many hits to report
information on.

Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on, say, the top 5.  I am pretty
sure I have done this wrong; any advice on how to correct my script to do
this would be great.

 
Thanks so much,

 
Alison

 
#!/usr/local/bin/perl -w

 
# Parsing BLAST reports with BioPerl's Bio::SearchIO module

# alison waller November 2007

use strict;

use warnings;

use Bio::SearchIO;

 
# to run type: blast_parse_aw.pl input.txt #of hits

 
my $infile =shift(@ARGV);

my $outfile ="$ARGV[0].parsed";

my $tophit = $ARGV[1]; # I want to specify in the command line how many hits
                       # to report for each query

 
open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";

open (OUT,">$outfile");

 
$report = new Bio::SearchIO(

         -file=>"$inFile",

              -format => "blast"); 

 
print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

 
# Go through BLAST reports one by one              

while($result = $report->next_result) 

{

      if ($top_hit=$result->next_hit) # this might be wrong - I want to
                                      # specify how many hits to print results for

            # Print some tab-delimited data about this hit

           { 

            print $result->query_name, "\t";

            print $hit->description, "\t";

            print $hit->significance, "\t";

            print $hit->bits,"\t";    

            print $hsp->evalue, "\t";

            print $hsp->percent_identity, "\t";

            print $hsp->length('total'),"\t";

            print $hsp->num_identical,"\t";

            print $hsp->gaps,"\t";

            print $hsp->strand('query'),"\t";

            print $hsp->strand('hit'), "\n";

          }

      else print "no hits\n";

   } 

 
******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From bix at sendu.me.uk  Tue Nov 27 12:01:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 17:01:36 +0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <474C4D70.2010206@sendu.me.uk>

alison waller wrote:
> I am trying to write a script that will parse large blast files (usually
> blastx) I also want to be able to specify how many hits I want to report
> information on.
> 
> Most of the time I will only want information on the top hit, but I want to
> have the flexibility to obtain information on say the top5.  I am pretty
> sure I have done this wrong, any advice on how to correct my script to do
> this, would be great.

[snip]

>       if ($top_hit=$result->next_hit) # this might be wrong - I want to
> specify how many hits to print results for

I didn't really pay attention to the rest of your code, but assuming it 
all works except for only ever giving you info for the top hit, you just 
need to change this 'if' to a loop of some kind.

# ...
my $hits = 0;

while (my $hit = $result->next_hit) {
  $hits++;
  last if $hits > $tophit;
  # ...
}

From David.Messina at sbc.su.se  Tue Nov 27 12:55:44 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 27 Nov 2007 18:55:44 +0100
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <474C4D70.2010206@sendu.me.uk>
References: <005a01c83070$3a814580$d81efea9@AWALL>
	<474C4D70.2010206@sendu.me.uk>
Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>

Hi Alison,
As Sendu mentioned, the key bit is adding a condition to the hit loop to
limit the number of hits that are printed. I didn't test the below
extensively, but give it a try...


Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query

#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);

print OUT
  "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $i = 0;    # number of hits reported so far for this query
    while ( ( $i < $tophit ) && ( my $hit = $result->next_hit ) ) {
        $i++;
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this HSP
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t";
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t";
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t";
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n";
        }
    }

    # $i is still 0 only if this query had no hits at all
    if ($i == 0) { print OUT "no hits\n"; }
}

From Russell.Smithies at agresearch.co.nz  Tue Nov 27 14:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>

Do the hits need to be sorted first, or is this done automagically?
I ask because I know Megablast doesn't provide sorted output for most of
its formats.

Russell



From cjfields at uiuc.edu  Tue Nov 27 16:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com>
	<474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;  
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read.  In general,  
though, if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr +
>> 0.15 csys =  1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say  
> that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant  
> going
> on, all is well.

From cjfields at uiuc.edu  Tue Nov 27 16:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also 'last'
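
As a rough sketch of how that could fit the script from earlier in the
thread: the subroutine below writes one line per hit, using only the best
HSP, for at most a given number of hits per result. The name
print_best_hsps and its argument order are made up for this example; the
only BioPerl calls it relies on are next_hit, hsp('best') and accessors
already used in the thread, and it has not been run against a real report.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): print one tab-delimited line
# per hit, describing only that hit's best HSP, for at most $max_hits hits.
# $result is a result object as returned by $report->next_result;
# $fh is an open output filehandle. Returns the number of hits printed.
sub print_best_hsps {
    my ( $result, $max_hits, $fh ) = @_;
    my $printed = 0;

    while ( $printed < $max_hits ) {
        my $hit = $result->next_hit or last;    # no more hits
        my $hsp = $hit->hsp('best') or next;    # skip hits that have no HSP
        $printed++;

        print {$fh} join( "\t",
            $result->query_name, $hit->name,
            $hsp->evalue,        $hsp->percent_identity,
            $hsp->length('total') ), "\n";
    }

    print {$fh} $result->query_name, "\tno hits\n" if $printed == 0;
    return $printed;
}

# Possible use inside the main loop of the script:
#
#   while ( my $result = $report->next_result ) {
#       print_best_hsps( $result, $tophit, \*OUT );
#   }
```

Counting printed hits separately from the loop test keeps the 'no hits'
check simple: the counter stays at 0 only when nothing was written.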

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     this criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit.pm object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { my $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable
                     value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
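
To make the option list a bit more concrete, here is a hedged sketch of
passing a cutoff and a filter when the parser is constructed. The
subroutine name keep_gapless_hit, the 1e-5 cutoff and the file name are
just examples picked for illustration; -signif and -hit_filter are the
options described above.

```perl
use strict;
use warnings;

# Example filter (the name is made up): keep a hit only if its
# alignment contains no gaps. The hit object is the first argument.
sub keep_gapless_hit {
    my ($hit) = @_;
    return $hit->gaps == 0;
}

# Options to hand to the parser; the values are arbitrary examples.
my %parser_options = (
    -format     => 'blast',
    -signif     => 1e-5,                  # P- or Expect value cutoff
    -hit_filter => \&keep_gapless_hit,    # custom per-hit filter
);

# With BioPerl installed, the parser could then be built like this
# ('report.blast' is a placeholder file name):
#
#   use Bio::SearchIO;
#   my $report = Bio::SearchIO->new( -file => 'report.blast', %parser_options );
#   while ( my $result = $report->next_result ) { ... }
```

Using a named subroutine rather than an inline one makes the filter easy
to try out on its own before wiring it into the parser.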

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for  
> most of
> it's formats.
>
> Russell

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Tue Nov 27 20:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head of a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
  my $res = $p3->run;
  ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences then I got a "can't place
primer on sequence"  error.

After a bit of faffing about, I think the problem occurs when no primers
are found. In which case $p3 still has the primers from the previous run,
which don't come from the current sequence, so can't be placed on it. I
tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine.
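
In outline, the version that works looks like the sketch below. The helper
name design_primers_for_each and the $new_runner callback are made up for
illustration; in real use the callback would wrap the actual Primer3
constructor call, which is left as a comment here because the exact
arguments depend on the setup.

```perl
use strict;
use warnings;

# Hypothetical helper: run primer design once per sequence, building a
# brand-new runner object each time so that no results from a previous
# sequence can linger. $new_runner is a code ref that takes a sequence
# and returns an object with a run() method.
sub design_primers_for_each {
    my ( $seqs, $new_runner ) = @_;
    my @results;

    for my $seq (@$seqs) {
        my $p3  = $new_runner->($seq);    # fresh object per sequence
        my $res = $p3->run;
        push @results, [ $seq, $res ];
    }
    return \@results;
}

# In real use the callback might look something like this (untested,
# and the constructor arguments are assumptions):
#
#   my $new_runner = sub {
#       my ($seq) = @_;
#       return Bio::Tools::Run::Primer3->new( -seq => $seq );
#   };
#   my $results = design_primers_for_each( \@some_bio_seqs, $new_runner );
```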

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).

Cheers,
Cass xx

From alison.waller at utoronto.ca  Tue Nov 27 16:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked, Dave. However, after looking at the output I realized that
I only want information on the top HSP per query returned.  For example, for
some of the queries the top hit has two HSPs, so it returned both.

I tried to edit it further, but all 3 of my attempts have failed, I
think because I am using the loops wrongly.

I also have another problem: I want to retrieve the GI as well, and this
doesn't seem to be as straightforward as it should be.  I found another
method, _get_seq_identifiers, but this looks awkward; isn't there an object
for the GI?

I've pasted my non-working script below; if there are any suggestions on how
to get it to print out just the first HSP per hit, that would be great.

Thanks,


#!/usr/local/bin/perl -w


# Parsing BLAST reports with BioPerl's Bio::SearchIO module 
# alison waller November 2007


use strict;
use warnings;
use Bio::SearchIO;


my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }


my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query


#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";


my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);


print OUT
  "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";


# Go through BLAST reports one by one
while (my $result = $report->next_result) {
	my $i=0;
	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
        

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t"; 
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t"; 
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t"; 
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n"; 
        }
}
    if ($i == 0) { print OUT "no hits\n"; } 

}
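
On the GI question: one possible workaround, assuming the hit names in the
report are NCBI-style identifiers such as gi|12345|ref|NP_000001.1| (that
is an assumption about the database, and the identifier shown is made up),
is to pull the number out of $hit->name with a regular expression. The
subroutine name gi_from_name is invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the GI number from an NCBI-style
# identifier such as "gi|12345|ref|NP_000001.1|".
# Returns undef if the string contains no "gi|<digits>" field.
sub gi_from_name {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /(?:^|\|)gi\|(\d+)/ ? $1 : undef;
}

# Inside the hit loop this could be used as, for example:
#
#   my $gi = gi_from_name( $hit->name );
#   print OUT defined $gi ? $gi : 'NA', "\t";
```

If the database was not built from NCBI-style FASTA headers, the pattern
will simply not match and the helper returns undef.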


From dennis.prickett at bbsrc.ac.uk  Wed Nov 28 05:18:26 2007
From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C))
Date: Wed, 28 Nov 2007 10:18:26 -0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk>

Dear Alison
 
Or, if you are absolutely only interested in the top hit, you could limit
the report to that in the blast command by adding the parameter " -b 1 ".

This will truncate the report to alignments for 1 hit (database sequence)
per query (or -b 5 for 5 hits, etc.).  Your blasts run faster and then you
won't have to worry about how to parse out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know. 

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom




From t.nugent at cs.ucl.ac.uk  Wed Nov 28 08:10:41 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Wed, 28 Nov 2007 13:10:41 +0000
Subject: [Bioperl-l] Helical Wheel module
Message-ID: <474D68D1.3080602@cs.ucl.ac.uk>

Hi everyone,

I've been drawing a lot of helical wheels recently so put all my code 
into a module. I don't think there's anything in bioperl to do this yet 
though there are a few programs written in perl and flash on the web to 
do the same thing. I was thinking this could fit into biographics. Has 
lots of options to adjust labels, colours, ttf fonts and can output to 
png & svg.

Tim

...

Here's the output, converted to jpg from svg:
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg

Module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz

Works like this:

use DrawHelicalWheel;

my $im = DrawHelicalWheel->new(-title=>$title,
                               -sequence=>$sequence,
                               -helices=>\@helices,
                               -ttf_font=>$font);
# Write the SVG to $svg, failing loudly if the file cannot be opened
open(my $out, '>', $svg) or die "Can't open $svg: $!\n";
binmode $out;
print {$out} $im->svg;
close $out;

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From tristan.lefebure at gmail.com  Wed Nov 28 10:46:11 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:46:11 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281046.11146.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an 
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is 
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

I could recycle the '_remove_col' sub-function of 'remove_columns' to do so 
(it splits the alignment into sequence objects, removes the sites, and then 
regenerates an alignment object), but I would be surprised if there was 
nothing already doing the job...

Thanks

-Tristan

From bix at sendu.me.uk  Wed Nov 28 11:19:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Nov 2007 16:19:36 +0000
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <200711281046.11146.tnl7@cornell.edu>
References: <200711281046.11146.tnl7@cornell.edu>
Message-ID: <474D9518.7010201@sendu.me.uk>

Tristan Lefebure wrote:
> Hello!
> 
> I was wondering if there was a function to remove sites/columns of an 
> alignment. Something like: $aln->remove_sites(@sites_to_remove)
> I looked around Bio::SimpleAlign but did not find exactly that. There is 
> remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

You might want to take a second look at the docs. You can supply column 
number ranges to remove_columns(), so it does exactly what you want.
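
When the sites to drop are scattered single positions rather than ranges, they can be collapsed into ranges first. The following is a minimal sketch, assuming the range form of remove_columns() described in the newer Bio::SimpleAlign docs (a list of [start, end] array refs, with the first column numbered 0; please check perldoc for your installed version). sites_to_ranges is a hypothetical helper written for this example, not part of BioPerl:

```perl
use strict;
use warnings;

# Collapse a list of 0-based column indices into merged [start, end] pairs,
# e.g. (3, 4, 5, 9) becomes ([3, 5], [9, 9]).
sub sites_to_ranges {
    my @sites = sort { $a <=> $b } @_;
    my @ranges;
    for my $site (@sites) {
        if (@ranges and $site <= $ranges[-1][1] + 1) {
            # Adjacent or duplicate site: extend the current range if needed
            $ranges[-1][1] = $site if $site > $ranges[-1][1];
        }
        else {
            # Gap before this site: start a new range
            push @ranges, [ $site, $site ];
        }
    }
    return @ranges;
}

# Usage (assumes $aln is a Bio::SimpleAlign object and 0-based columns):
#   my $trimmed = $aln->remove_columns( sites_to_ranges(@sites_to_remove) );
```

Merging adjacent sites keeps the argument list short, and sorting first means the caller does not have to supply the sites in order.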


From cjfields at uiuc.edu  Wed Nov 28 08:57:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:57:27 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
Message-ID: <B3E0F9EA-9452-483E-AA17-5174B743B164@uiuc.edu>

I had some code which does this which I committed yesterday to CVS; it  
catches the GI for the query and the hits:

$result->query_gi;
$hit->ncbi_gi;

I am in the midst of fixing additional problems with WU-BLAST parsing  
but you are more than welcome to try it.
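
On the other question quoted below (reporting only the first HSP for each of the top N hits), here is a minimal sketch of one way to structure the loops. It assumes the usual Bio::SearchIO iterator methods already used in this thread (next_hit, next_hsp, query_name, name, evalue, percent_identity); first_hsp_rows is a hypothetical helper name, and the column selection is cut down for brevity:

```perl
use strict;
use warnings;

# Return one tab-delimited row per hit for the first $max_hits hits of a
# result object, using only the first HSP of each hit.
sub first_hsp_rows {
    my ($result, $max_hits) = @_;
    my @rows;
    my $hits_seen = 0;    # counts hits only; HSPs never touch this counter
    while ($hits_seen < $max_hits and my $hit = $result->next_hit) {
        $hits_seen++;
        # Fetch a single HSP instead of looping over all of them
        my $hsp = $hit->next_hsp or next;
        push @rows, join("\t",
            $result->query_name, $hit->name,
            $hsp->evalue,        $hsp->percent_identity);
    }
    push @rows, join("\t", $result->query_name, 'no hits') if $hits_seen == 0;
    return @rows;
}

# Usage with a real report (assumes BioPerl is installed):
#   use Bio::SearchIO;
#   my $report = Bio::SearchIO->new(-file => $infile, -format => 'blast');
#   while (my $result = $report->next_result) {
#       print "$_\n" for first_hsp_rows($result, $tophit);
#   }
```

The counter is incremented in the hit loop only; sharing one counter between the hit loop and the HSP loop makes the two limits interfere with each other. If the first HSP is not necessarily the best one in your reports, $hit->hsp('best') (mentioned in the quoted message) may be a better choice than next_hsp.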

chris

On Nov 27, 2007, at 3:32 PM, alison waller wrote:

> Thanks Everyone,
>
> Your edits worked, Dave. However, after looking at the output I realized
> that I only want information on the top HSP per query returned.  For
> example, for some of the queries the top hit has two HSPs, so it returned
> both.
>
> I tried to edit it further, but all 3 attempts failed, I think because I
> am using the loops wrong.
>
> I also have another problem: I also want to retrieve the gi, and this
> doesn't seem to be as straightforward as it should be.  I found another
> method, _get_seq_identifiers, but it looks awkward; isn't there an object
> for the gi?
>
> I've pasted my non-working script below. If there are any suggestions on
> how to get it to print out just the first HSP per hit, that would be
> great.
>
> Thanks,
>
>
> #!/usr/local/bin/perl -w
>
>
> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
> # alison waller November 2007
>
>
> use strict;
> use warnings;
> use Bio::SearchIO;
>
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
> if (@ARGV != 2) { die $usage; }
>
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                        # to report for each query
>
>
> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>
>
> my $report = new Bio::SearchIO(
>    -file   => "$infile",
>    -format => "blast"
> );
>
>
> print OUT
> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tstrand\tHstrand\n";
>
>
> # Go through BLAST reports one by one
> while (my $result = $report->next_result) {
> 	my $i=0;
> 	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
>    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
>
>
>            # Print some tab-delimited data about this hit
>            print OUT $result->query_name,     "\t";
>            print OUT $hit->name,              "\t";
>            print OUT $hit->significance,      "\t";
>            print OUT $hit->bits,              "\t";
>            print OUT $hsp->evalue,            "\t";
>            print OUT $hsp->percent_identity,  "\t";
>            print OUT $hsp->length('total'),   "\t";
>            print OUT $hsp->num_identical,     "\t";
>            print OUT $hsp->gaps,              "\t";
>            print OUT $hsp->strand('query'),   "\t";
>            print OUT $hsp->strand('hit'),     "\n";
>        }
> }
>    if ($i == 0) { print OUT "no hits\n"; }
>
> }
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, November 27, 2007 4:01 PM
> To: Smithies, Russell
> Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>
> The hits/HSPs are generally in the order they appear in the report.
>
> If you are looking for best/worst HSP after parsing you can use the
> $hit->hsp() method:
>
> # best and worst
> my $best = $hit->hsp('best'); # also 'first'
> my $worst = $hit->hsp('worst'); # also last
>
> The SearchIO text BLAST parser also has several options implemented
> for finer control:
>
>     -inclusion_threshold => e-value threshold for inclusion in the
>                             PSI-BLAST score matrix model (blastpgp)
>     -signif      => float or scientific notation number to be used
>                     as a P- or Expect value cutoff
>     -score       => integer or scientific notation number to be used
>                     as a blast score value cutoff
>     -bits        => integer or scientific notation number to be used
>                     as a bit score value cutoff
>     -hit_filter  => reference to a function to be used for
>                     filtering hits based on arbitrary criteria.
>                     All hits of each BLAST report must satisfy
>                     this criteria to be retained.
>                     If a hit fails this test, it is ignored.
>                     This function should take a
>                     Bio::Search::Hit::BlastHit.pm object as its first
>                     argument and return true
>                     if the hit should be retained.
>                     Sample filter function:
>                        -hit_filter => sub { $hit = shift;
>                                             $hit->gaps == 0; },
>                     (Note: -filt_func is synonymous with -hit_filter)
>     -overlap     => integer. The amount of overlap to permit between
>                     adjacent HSPs when tiling HSPs. A reasonable
>                     value is 2.
>                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
>
> chris
>
> On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:
>
>> Do the hits need to be sorted first or is this done automagically?
>> I ask this as I know Megablast doesn't provide sorted output for
>> most of its formats.
>>
>> Russell
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-
>>> bio.org] On Behalf Of Dave Messina
>>> Sent: Wednesday, 28 November 2007 6:56 a.m.
>>> To: alison waller
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>>
>>> Hi Alison,
>>> As Sendu mentioned, the key bit is adding a condition to the hit loop
>>> to limit the number of hits that are printed. I didn't test the below
>>> extensively, but give it a try...
>>>
>>>
>>> Dave
>>>
>>>
>>>
>>> #!/usr/local/bin/perl -w
>>>
>>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>>> # alison waller November 2007
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SearchIO;
>>>
>>> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
>>> if (@ARGV != 2) { die $usage; }
>>>
>>> my $infile  = $ARGV[0];
>>> my $outfile = $infile . '.parsed';
>>> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>>>                       # to report for each query
>>>
>>> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
>>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>>>
>>> my $report = new Bio::SearchIO(
>>>   -file   => "$infile",
>>>   -format => "blast"
>>> );
>>>
>>> print OUT
>>> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";
>>>
>>> # Go through BLAST reports one by one
>>> while ( my $result = $report->next_result ) {
>>>   my $i = 0;
>>>   while (  ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
>>>       while ( my $hsp = $hit->next_hsp ) {
>>>
>>>           # Print some tab-delimited data about this hit
>>>           print OUT $result->query_name,     "\t";
>>>           print OUT $hit->name,              "\t";
>>>           print OUT $hit->significance,      "\t";
>>>           print OUT $hit->bits,              "\t";
>>>           print OUT $hsp->evalue,            "\t";
>>>           print OUT $hsp->percent_identity,  "\t";
>>>           print OUT $hsp->length('total'),   "\t";
>>>           print OUT $hsp->num_identical,     "\t";
>>>           print OUT $hsp->gaps,              "\t";
>>>           print OUT $hsp->strand('query'),   "\t";
>>>           print OUT $hsp->strand('hit'),     "\n";
>>>       }
>>>   }
>>>
>>>   if ($i == 0) { print OUT "no hits\n"; }
>>> }
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 08:54:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:54:39 -0600
Subject: [Bioperl-l] Helical Wheel module
In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk>
References: <474D68D1.3080602@cs.ucl.ac.uk>
Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu>

Looks good!  We recently added in your transmembrane module, so we  
could definitely add this in.

chris

On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've been drawing a lot of helical wheels recently so put all my code
> into a module. I don't think there's anything in bioperl to do this  
> yet
> though there are a few programs written in perl and flash on the web  
> to
> do the same thing. I was thinking this could fit into biographics. Has
> lots of options to adjust labels, colours, ttf fonts and can output to
> png & svg.
>
> Tim
>
> ...
>
> Here's the output, converted to jpg from svg:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg
>
> Module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz
>
> Works like this:
>
> use DrawHelicalWheel;
>
> my $im = DrawHelicalWheel->new(-title=>$title,
>                               -sequence=>$sequence,
>                               -helices=>\@helices,
>                               -ttf_font=>$font);
> open(OUTPUT, ">$svg");
> binmode OUTPUT;
> print OUTPUT $im->svg;
> close OUTPUT;
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk
> http://www.cs.ucl.ac.uk/staff/T.Nugent
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 13:43:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 12:43:58 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
	<8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu>


On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote:

> Chris,
>
> Good catch. You're on track here with one exception: WU blast and NCBI
> blast behave differently in what they report in the hit table: WU
> blast puts the raw score in the table not the bit score as NCBI blast
> does (see example below for reference). WU blast also swaps their
> location in the HSP header relative to how NCBI reports it. It would
> be good to verify that the blast parser isn't befuddled by this. A
> quick look at SearchIO::blast and it appears that data from the hit
> table is always getting stored as score, not bits for WU blast. Not
> sure if the HSP section data are parsed correctly. I'd recommend
> looking into these things when you do your fixes.

What I have now after commits is:

GenericHit - use the best HSP when possible for bits, score/raw_score,  
significance.  When there is no HSP, construct a minimal Hit object  
using hit table data (WUBLAST maps the score to raw_score, NCBI BLAST  
maps to bits(), both map evalue/pvalue to significance).  HSP mapping  
seems to be correct.

One issue that has popped up is that GenericHit::significance
preferentially uses the best HSP.  However, GenericHSP::significance
prefers evalues over pvalues, and both Expect and P now appear to
be parsed for WU-BLAST HSPs (so the evalue is reported); this
apparently wasn't always the case, if I read the GenericHit docs
correctly.  As NCBI BLAST doesn't report pvalues, we could change that
so it preferentially returns a pvalue if present, falling back to an
evalue.  This would match what is found in the hit table more closely and
resemble what is documented for the method (for significance(), WU-BLAST
gets pvalues, NCBI BLAST gets evalues).
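
To make the proposed rule concrete, here is a minimal sketch of the fallback logic only. It is not the GenericHSP code; preferred_significance and its pvalue/evalue arguments are hypothetical names chosen for this example:

```perl
use strict;
use warnings;

# Pick the value that significance() would report under the proposed rule:
# a pvalue when one was parsed, otherwise the evalue.
sub preferred_significance {
    my (%args) = @_;    # expects pvalue => ..., evalue => ...
    # 'defined' rather than truth, so that a pvalue of 0 is still honoured
    return $args{pvalue} if defined $args{pvalue};    # WU-BLAST reports P
    return $args{evalue};                             # NCBI BLAST: Expect only
}

print preferred_significance(pvalue => 1e-30, evalue => 2e-30), "\n";  # WU-BLAST style HSP
print preferred_significance(evalue => 4e-12), "\n";                    # NCBI style HSP
```

Testing with defined matters here because very strong WU-BLAST hits can report a P of 0, which a plain truth test would skip in favour of the evalue.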

> So in the end, WU blast HSPs that are built from the hit table should
> report a value for raw_score and punt on bits, but NCBI HSPs so
> constructed should do the opposite. The downside to this arrangement
> is that code that works for NCBI blast hits will need modification to
> work for WU blast hits, but that is just the nature of the data. It
> shouldn't be an issue for the majority of users that stick with one
> flavor of blast and don't switch back and forth, or for users that get
> their HSP scoring data from HSP sections rather than relying on the
> hit table.

In general I get my data from the HSPs, so this shouldn't be a
significant issue (bad pun).  I did find that changing it so that Hit
objects use HSP data pointed out issues with test data; hit table raw/bit
scores were rounded from the HSP score data or vice versa since
all data came from the hit table, so tests flunked.

I think changing the way minimal hit objects report data (particularly  
for NCBI BLAST) will lead to a lot of confusion unless we clarify  
warnings when one or the other is missing (as you also indicated).   
I'm working on that now.

> Ideally, the HSP object would know whether it was NCBI or WU-based and
> issue an informative warning when attempting to access data it doesn't
> have. One solution might be for the parser to put a 'WU-' in front of
> the algorithm name for WU blast reports, so it would then be available
> for the contained hit/hsp objects. This could break anything dependent
> on algorithm name, so it would need some testing.
>
> Steve


I can probably work around that as noted above, unless you think it's
warranted to add a 'WU' designation (the version info in the Result
object has 'WashU' attached, so one could feasibly use that for
distinguishing the two report types).

Anyway, I'm committing my first batch of fixes, the significance test  
will fail for at least a day until I can look into it more.

chris


From tristan.lefebure at gmail.com  Wed Nov 28 14:03:44 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 14:03:44 -0500
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <474D9518.7010201@sendu.me.uk>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
Message-ID: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>

Oops. I was reading the BioPerl 1.4 documentation. Actually,
http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to
http://search.cpan.org/perldoc?Bio::SimpleAlign, which turns out to be
the 1.4 documentation...

Thank you, it works great.


On Nov 28, 2007 11:19 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Tristan Lefebure wrote:
> > Hello!
> >
> > I was wondering if there was a function to remove sites/columns of an
> > alignment. Something like: $aln->remove_sites(@sites_to_remove)
> > I looked around Bio::SimpleAlign but did not find exactly that. There is
> > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch'
> criteria.
>
> You might want to take a second look at the docs. You can supply column
> number ranges to remove_columns(), so it does exactly what you want.
>

From Russell.Smithies at agresearch.co.nz  Wed Nov 28 16:57:14 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 29 Nov 2007 10:57:14 +1300
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>

Has anyone got a good example of parsing ASN.1 with
Bio::SeqIO::entrezgene?
I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
and my Perl isn't that good  :-(

Russell
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From stefan.kirov at bms.com  Wed Nov 28 17:16:18 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time)
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
Message-ID: <Pine.WNT.4.64.0711281708590.21768@A103728.hpw.stf.bms.com>

Here is an example for GO, will send the one for KEGG later:
use Bio::SeqIO;

my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene',
	-service_record=>'yes');#, -locuslink=>'convert');
while (my $seq=$eio->next_seq) {
	my $gid=$seq->accession_number;
	my $ann=$seq->annotation; #Annotation collection for this gene record
	foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
		next if ($ot->term->authority eq 'STS marker'); #Do not need STS markers
		my $evid=$ot->comment;
		$evid=~s/evidence: //i;
		my @ref=$ot->term->get_references; #Really there should be just one?
		my $id=$ot->identifier;
		my $fid='GO:' . sprintf("%07u",$id); #Zero-pad to the 7-digit GO ID form
		print join("\t",$gid,$ot->ontology->name,$ot->name,$evid,$fid,@ref?$ref[0]->medline:''),"\n";
	}
}
Please note there is a bug in the parser that makes it suck a lot of RAM. 
I am fixing this and will commit probably by the week's end- you will have 
to update at that point. If you work with few records this should not 
matter.
Stefan


On Thu, 29 Nov 2007, Smithies, Russell wrote:

> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good  :-(
>
> Russell

From cjfields at uiuc.edu  Thu Nov 29 18:06:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 17:06:42 -0600
Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml
Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu>

For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST  
parsing in Bio::SearchIO::blastxml (though it appears to be pretty  
stable!).  Since there isn't any easy way to distinguish between  
normal BLASTS and PSI-BLAST reports due to recent changes at NCBI to  
BLAST, you have to indicate how the report is to be parsed by passing  
in a '-blasttype' parameter:

$searchio = Bio::SearchIO->new('-tempfile' => 1,
        '-format' => 'blastxml',
        '-file'   => 'psiblast.xml',
        '-blasttype' => 'psiblast');

Otherwise it chunks the individual iterations out as separate BLAST  
reports and parses them as separate reports.

Tests have also been added to SearchIO.t.  I will update the HOWTO and  
blastxml docs soon.

chris

From cjfields at uiuc.edu  Thu Nov 29 21:41:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 20:41:49 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
In-Reply-To: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
References: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu>

It's probably safer to create a new instance each time but it really  
shouldn't be necessary for a wrapper module; this sounds like a bug to  
me.  Could you file it in Bugzilla?
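
Until that is fixed, the workaround can be kept tidy by building a fresh wrapper inside the loop. This is a minimal sketch, not a tested recipe: design_for_each is a hypothetical helper, the constructor arguments in the usage comment follow the Bio::Tools::Run::Primer3 synopsis as I remember it, and the PRIMER_MIN_GC value is purely illustrative:

```perl
use strict;
use warnings;

# Run primer design once per sequence, building a fresh wrapper each time so
# that no primers from a previous run can leak into the current one.
# $make_wrapper is a code ref that returns a new, configured wrapper object.
sub design_for_each {
    my ($make_wrapper, @seqs) = @_;
    my %results;
    for my $seq (@seqs) {
        my $p3 = $make_wrapper->($seq);             # new object every iteration
        $results{ $seq->display_id } = $p3->run;    # result for this sequence only
    }
    return \%results;
}

# Usage (assumes bioperl-run is installed):
#   use Bio::Tools::Run::Primer3;
#   my $results = design_for_each(
#       sub {
#           my $p3 = Bio::Tools::Run::Primer3->new(-seq => $_[0], -outfile => 'temp.out');
#           $p3->add_targets('PRIMER_MIN_GC' => 40);
#           return $p3;
#       },
#       @some_bio_seqs,
#   );
```

Passing the constructor in as a code ref keeps the shared settings in one place while still guaranteeing that each sequence gets its own object.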

On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote:

> Hello,
>
> I was playing around with Primer3, and I hit a problem. Not sure if it's
> a bug or if I was doing something I wasn't supposed to, but if it's the
> latter, I thought it might save someone else half an hour of banging
> their head off a keyboard if I mentioned it:
>
> What I was doing was roughly:
>
> # create a primer3 obj
> my $p3 = ...Primer3->new();
>
> # loop through some sequences generating primers for
> # each of them using the same primer3 obj
> while (@some_bio_seqs){
>  my $res = $p3->run;
>  ...
> }
>
> This worked fine for a while, but broke when I tried to set
> PRIMER_MIN_GC, at which point it worked for a few sequences and then I
> got a "can't place primer on sequence" error.
>
> After a bit of faffing about, I think the problem occurs when no primers
> are found, in which case $p3 still has the primers from the previous run,
> which don't come from the current sequence and so can't be placed on it.
> I tried calling $p3->cleanup in the loop, but that didn't work either.
> Creating a new $p3 every time works fine.
>
> Are you supposed to create a new Primer3 object for every sequence?
> (Apologies if I missed the relevant bit of the docs).
>
> Cheers,
> Cass xx

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From paulhengen at coh.org  Wed Nov 28 20:20:42 2007
From: paulhengen at coh.org (Paul N. Hengen)
Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST)
Subject: [Bioperl-l]  Collecting genomic DNA sequences using Entrez IDs
Message-ID: <14017289.post@talk.nabble.com>


Hi.

I have a number of gene IDs from Entrez and I want to find the
start and end locations in the human genome. This seemed simple
enough, so I started working through some of the examples for
using the EntrezGene module at www.bioperl.org  Of course this
did not work because the core installation does not include this
module. So, I think I have two choices (1) install the module (how?),
or (2) find an easier way to get the locations in the human genome.
I want to use the locations to grab sequences out of the genome.
Can anyone offer advice on this? Thanks.

-Paul.

--
Paul N. Hengen, Ph.D.
Hematopoietic Stem Cell and Leukemia Research
City of Hope National Medical Center
1500 E. Duarte Road, Duarte, CA 91010 USA
mailto:paulhengen at coh.org

-- 
View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From Viktor.Martyanov at Dartmouth.EDU  Thu Nov 29 15:20:19 2007
From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov)
Date: 29 Nov 2007 15:20:19 -0500
Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases
Message-ID: <193573097@newdonner.Dartmouth.EDU>

A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 444 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071129/a6380324/attachment-0001.bin 

From alison.waller at utoronto.ca  Thu Nov 29 11:20:59 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Thu, 29 Nov 2007 11:20:59 -0500
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from
	CVS)
Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL>

Hi all,

I would like to install the CVS version of bioperl, as I know of some code
changes that will be useful to me.  However, I am having problems installing
it.

I am trying to install bioperl in my home directory on a Linux cluster.

I used

> cd bioperl-live
> perl Build.PL -install /home/awaller

However, after the build command I got a lot of errors.  Do I also have to
have perl installed in my home directory?  There is perl installed on the
cluster in /usr/bin.  Do I need to point to this, or does Build.PL
automatically look there?  I noticed a few errors about not having
permission and a few about not being able to connect. I've copied a portion
of the messages that followed my Build.PL command.

Any help would be appreciated,

alison

Issuing "/usr/bin/ftp -n"
ftp: mirror.isurf.ca: Unknown host
Not connected.
Local directory now /root/.cpan/sources/modules
Not connected.
Not connected.
Not connected.
Not connected.
Not connected.
Not connected.
Bad luck... Still failed!
Can't access URL
ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.

Please check, if the URLs I found in your configuration file
(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
valid. The urllist can be edited. E.g. with 'o conf urllist push
ftp://myurl/'

Could not fetch modules/02packages.details.txt.gz
Trying to get away with old file:
3604718  584 -rw-r--r--  1 0        0          592967 Nov 12 22:53
/root/.cpan/sources/modules/02packages.details.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
  Database was generated on Sat, 10 Nov 2007 22:36:34 GMT

  There's a new CPAN.pm version (v1.9204) available!
  [Current version is v1.7601]
  You might want to try
    install Bundle::CPAN
    reload cpan
  without quitting the current session. It should be a seamless upgrade
  while we are running...

Warning: You are not allowed to write into directory
"/root/.cpan/sources/modules".
    I'll continue, but if you encounter problems, they may be due
    to insufficient permissions.
Fetching with LWP:
  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
LWP failed with code[500] message[Cannot write to
'/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied]
Fetching with Net::FTP:
  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied
 at /usr/share/perl/5.8/CPAN.pm line 2265
Couldn't fetch 03modlist.data.gz from ftp.nrc.ca
Fetching with LWP:
  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
LWP failed with code[500] message[FTP close response: 500 Network seems to
have barfed - Let's all phone our ISP and go postal!
Unknown command.
]
Fetching with Net::FTP:
  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca

Fetching with LWP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'cpan.mirror.cygnal.ca']

Fetching with Net::FTP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Fetching with LWP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'mirror.isurf.ca']

Fetching with Net::FTP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

 
Trying with "/usr/bin/lynx -source" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL
ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: cpan.mirror.cygnal.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/03modlist.data.gz

Trying to get away with old file:

3604719  144 -rw-r--r--  1 0        0          141973 Nov 12 22:53
/root/.cpan/sources/modules/03modlist.data.gz

Going to read /root/.cpan/sources/modules/03modlist.data.gz

Going to write /root/.cpan/Metadata

can't create /root/.cpan/Metadata: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 3432

Running install for module Test::Harness

Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz

mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 2342

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From cjfields at uiuc.edu  Thu Nov 29 23:53:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:53:09 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <D344C28E-BC9B-4226-AD15-149EA001FCAB@uiuc.edu>

Alison,

There are directions on how to do this here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

(TinyURL link)
http://tinyurl.com/3263dd

Note the additional configuration for CPAN in that section; you'll  
need to set up CPAN so it installs everything locally.
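
For what it's worth, the log in your message shows CPAN trying to write
under /root/.cpan, so a quick check of which directories you can actually
write to may save some time before reconfiguring. This is only a sketch:
the helper name check_writable and the $HOME/.cpan and $HOME/bioperl paths
are my assumptions for illustration, not something taken from the wiki page.

```shell
# Report whether a directory exists and is writable by the current user.
check_writable() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "NOT writable: $1"
        return 1
    fi
}

# /root/.cpan is the directory named in the log; the other two are
# hypothetical per-user locations and may not exist yet.
for dir in /root/.cpan "$HOME/.cpan" "$HOME/bioperl"; do
    check_writable "$dir" || true
done
```

If /root/.cpan comes back as not writable, that matches the "Permission
denied" lines in the log and suggests CPAN is still using a root-owned
configuration rather than one of your own.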

chris

On Nov 29, 2007, at 10:20 AM, alison waller wrote:

> Hi all,
>
>
>
> I would like to install the CVS version of bioperl  as I know of  
> some code
> changes that will be useful to me.  However, I am having problems  
> installing
> it.
>
> I am trying to install bioperl in my home directory on a Linux cluster.
>
>
>
> I used
>
>
>
>> cd bioperl-live
>
> *       perl Build.PL -install /home/awaller
>
>
>
> However after the build command I got a lot of errors.  Do I have to  
> also
> have perl installed in my home directory??  There is perl installed  
> on the
> cluster in /usr/bin.  Do I need to point to this or does Build.PL
> automatically look there?  I noticed a few errors about not having
> permission and a few about not being able to connect. I've copied a  
> portion
> of the messages after my Build.PL command.
>
>
>
> Any help would be appreciated,
>
>
>
> alison
>
>
>
> ******************************************
> Alison S. Waller  M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 29 23:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core
(I think they were added prior to the 1.5.1 release, but I'm not
positive).  If possible you should try installing bioperl 1.5.2 or the
latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

 From CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> -- 
> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 30 03:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
 from	CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl  as I know of some code
> changes that will be useful to me.  However, I am having problems installing
> it.  
> 
> I am trying to install bioperl in my home directory on a Linux cluster.  
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these URLs are invalid as suggested (try setting the urllist to 
nothing), or your Linux cluster doesn't have internet access. You can't 
do a 'proper' install of BioPerl and its dependencies without internet 
access.

However, for most purposes simply downloading the BioPerl modules (i.e. 
from a different machine with internet access) and pointing your 
PERL5LIB to their location is sufficient. You can download CVS modules 
from the BioPerl website individually, or as a tarball of everything.
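
A minimal sketch of the PERL5LIB approach, assuming the tarball was
unpacked into $HOME/src/bioperl-live (a hypothetical path; the helper name
prepend_perl5lib is likewise made up for illustration):

```shell
# Hypothetical location of an unpacked bioperl-live tarball; adjust as needed.
BIOPERL_DIR="${BIOPERL_DIR:-$HOME/src/bioperl-live}"

# Prepend a directory to PERL5LIB without leaving a stray trailing colon
# when PERL5LIB was previously unset or empty.
prepend_perl5lib() {
    PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB
}

prepend_perl5lib "$BIOPERL_DIR"
echo "PERL5LIB=$PERL5LIB"

# To confirm Perl now sees the modules (only works once the tarball is
# really unpacked at that location):
#   perl -MBio::Perl -e 'print "ok\n"'
```

Putting the same lines in your shell startup file should make the setting
persist across logins.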

From MEC at stowers-institute.org  Fri Nov 30 09:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>

How many, how often?

Use Ensembl BioMart!

First time interactively.

Then if you want to pipeline it, take the Perl code it generates for you and
run it. Of course, you'll have to install the Ensembl Perl API.


Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
> 
> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find 
> the start and end locations in the human genome. This seemed 
> simple enough, so I started working through some of the 
> examples for using the EntrezGene module at www.bioperl.org  
> Of course this did not work because the core installation 
> does not include this module. So, I think I have two choices 
> (1) install the module (how?), or (2) find an easier way to 
> get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research City of Hope 
> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 
> USA mailto:paulhengen at coh.org
> 
> --
> View this message in context: 
> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bosborne11 at verizon.net  Fri Nov 30 09:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID: <C3758AB2.10609%bosborne11@verizon.net>

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.


Brian O.


On 11/28/07 8:20 PM, "Paul N. Hengen" <paulhengen at coh.org> wrote:

> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org


From cjfields at uiuc.edu  Fri Nov 30 10:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad.  I always forget about Bio::ASN1::EntrezGene.  We should ask
Mingyi Liu if he would like to include this parser with BioPerl (since
Bio::SeqIO::entrezgene requires it, that makes sense to me, and it would
avoid the circular dependency that has plagued these modules).
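
In the meantime, here is a quick way to see which of these modules Perl
can actually load on a given machine. It is a sketch only: the helper name
have_perl_module is hypothetical, and note that the CPAN distribution
spells the parser Bio::ASN1::EntrezGene, with a capital G.

```shell
# Return success if Perl can load the named module from @INC / PERL5LIB.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Bio::SeqIO::entrezgene ships with BioPerl; Bio::ASN1::EntrezGene is the
# separate low-level parser from CPAN that it depends on.
for mod in Bio::SeqIO Bio::SeqIO::entrezgene Bio::ASN1::EntrezGene; do
    if have_perl_module "$mod"; then
        echo "found:   $mod"
    else
        echo "missing: $mod"
    fi
done
```

If only the last one is reported missing, installing it from CPAN should
be all that is needed.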

chris

On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:

> Chris Fields wrote:
> Chris,
> Bio::SeqIO::entrezgene requires Bio::ASN1::EntrezGene, which is the
> low-level parser and is not part of bioperl. There is a circular
> dependency: Bio::ASN1::EntrezGene depends on Bio::SeqIO (I think).
> Paul, you can get it from CPAN and this should make
> Bio::SeqIO::entrezgene functional for you.
> Stefan
>
>
>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core
>> (I think they were added prior to the 1.5.1 release, but I'm not
>> positive).  If possible you should try installing bioperl 1.5.2 or the
>> latest code from CVS.
>>
>> For directions on installing Bioperl for most OS's go here:
>>
>> http://www.bioperl.org/wiki/Installing_BioPerl
>>
>> From CVS:
>>
>> http://www.bioperl.org/wiki/Using_CVS
>>
>> chris
>>
>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>>
>>
>>> Hi.
>>>
>>> I have a number of gene IDs from Entrez and I want to find the
>>> start and end locations in the human genome. This seemed simple
>>> enough, so I started working through some of the examples for
>>> using the EntrezGene module at www.bioperl.org  Of course this
>>> did not work because the core installation does not include this
>>> module. So, I think I have two choices (1) install the module  
>>> (how?),
>>> or (2) find an easier way to get the locations in the human genome.
>>> I want to use the locations to grab sequences out of the genome.
>>> Can anyone offer advice on this? Thanks.
>>>
>>> -Paul.
>>>
>>> --
>>> Paul N. Hengen, Ph.D.
>>> Hematopoietic Stem Cell and Leukemia Research
>>> City of Hope National Medical Center
>>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>>> mailto:paulhengen at coh.org
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Fri Nov 30 11:12:22 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 11:12:22 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
Message-ID: <47503666.8090004@bms.com>

Chris Fields wrote:
> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
> Mingyi Liu if he would like to include this parser with BioPerl (since  
> it requires it, makes sense to me, and it avoids the circular  
> dependency that has plagued these modules).
>   
Yes, I think this would be a good step.
Stefan
> chris
>
> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:
>
>   
>> Chris Fields wrote:
>> Chris,
>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
>> low-level parser and is not part of bioperl. There is a circular
>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
>> Paul, you can get it from CPAN and this should make
>> Bio::SeqIO::entrezgene functional for you.
>> Stefan
>>
>>
>>     
>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-
>>> core (I think they were added prior to the 1.5.1 release, but I'm not
>>> positive).  If possible you should try installing bioperl 1.5.2 or  
>>> the
>>> latest code from CVS.
>>>
>>> For directions on installing Bioperl for most OS's go here:
>>>
>>> http://www.bioperl.org/wiki/Installing_BioPerl
>>>
>>> From CVS:
>>>
>>> http://www.bioperl.org/wiki/Using_CVS
>>>
>>> chris
>>>
>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>>>
>>>
>>>       
>>>> Hi.
>>>>
>>>> I have a number of gene IDs from Entrez and I want to find the
>>>> start and end locations in the human genome. This seemed simple
>>>> enough, so I started working through some of the examples for
>>>> using the EntrezGene module at www.bioperl.org  Of course this
>>>> did not work because the core installation does not include this
>>>> module. So, I think I have two choices (1) install the module  
>>>> (how?),
>>>> or (2) find an easier way to get the locations in the human genome.
>>>> I want to use the locations to grab sequences out of the genome.
>>>> Can anyone offer advice on this? Thanks.
>>>>
>>>> -Paul.
>>>>
>>>> --
>>>> Paul N. Hengen, Ph.D.
>>>> Hematopoietic Stem Cell and Leukemia Research
>>>> City of Hope National Medical Center
>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>>>> mailto:paulhengen at coh.org
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>         
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>       
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From stefan.kirov at bms.com  Fri Nov 30 10:29:57 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 10:29:57 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
Message-ID: <47502C75.60809@bms.com>

Chris Fields wrote:
Chris,
Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
low-level parser and is not part of bioperl. There is a circular
dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
Paul, you can get it from CPAN and this should make
Bio::SeqIO::entrezgene functional for you.
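
To confirm whether the low-level parser is visible to your perl, before or
after installing it from CPAN, something like the following may help. It is
only a sketch: the helper name module_available is made up for illustration
and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if the named module
# can be loaded from the current @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Bio::ASN1::Entrezgene -> Bio/ASN1/Entrezgene.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Bio::SeqIO Bio::ASN1::Entrezgene)) {
    printf "%-25s %s\n", $module,
        module_available($module) ? 'found' : 'NOT found (install it from CPAN)';
}
```

If Bio::ASN1::Entrezgene is reported as missing, Bio::SeqIO::entrezgene
will not be able to parse anything until it is installed.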
Stefan


> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- 
> core (I think they were added prior to the 1.5.1 release, but I'm not  
> positive).  If possible you should try installing bioperl 1.5.2 or the  
> latest code from CVS.
>
> For directions on installing Bioperl for most OS's go here:
>
> http://www.bioperl.org/wiki/Installing_BioPerl
>
>  From CVS:
>
> http://www.bioperl.org/wiki/Using_CVS
>
> chris
>
> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>
>   
>> Hi.
>>
>> I have a number of gene IDs from Entrez and I want to find the
>> start and end locations in the human genome. This seemed simple
>> enough, so I started working through some of the examples for
>> using the EntrezGene module at www.bioperl.org  Of course this
>> did not work because the core installation does not include this
>> module. So, I think I have two choices (1) install the module (how?),
>> or (2) find an easier way to get the locations in the human genome.
>> I want to use the locations to grab sequences out of the genome.
>> Can anyone offer advice on this? Thanks.
>>
>> -Paul.
>>
>> --
>> Paul N. Hengen, Ph.D.
>> Hematopoietic Stem Cell and Leukemia Research
>> City of Hope National Medical Center
>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>> mailto:paulhengen at coh.org
>>
>> -- 
>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From arareko at campus.iztacala.unam.mx  Fri Nov 30 12:01:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 30 Nov 2007 11:01:29 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47503666.8090004@bms.com>
References: <14017289.post@talk.nabble.com>	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>	<47502C75.60809@bms.com>	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
	<47503666.8090004@bms.com>
Message-ID: <475041E9.8050909@campus.iztacala.unam.mx>

I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the 
past, he mentioned he doesn't track the list closely).

Mauricio.

Stefan Kirov wrote:
> Chris Fields wrote:
>> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
>> Mingyi Liu if he would like to include this parser with BioPerl (since  
>> it requires it, makes sense to me, and it avoids the circular  
>> dependency that has plagued these modules).
>>   
> Yes, I think this would be a good step.
> Stefan
>> chris
>>
>> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:
>>
>>   
>>> Chris Fields wrote:
>>> Chris,
>>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
>>> low-level parser and is not part of bioperl. There is a circular
>>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
>>> Paul, you can get it from CPAN and this should make
>>> Bio::SeqIO::entrezgene functional for you.
>>> Stefan
>>>
>>>
>>>     
>>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-
>>>> core (I think they were added prior to the 1.5.1 release, but I'm not
>>>> positive).  If possible you should try installing bioperl 1.5.2 or  
>>>> the
>>>> latest code from CVS.
>>>>
>>>> For directions on installing Bioperl for most OS's go here:
>>>>
>>>> http://www.bioperl.org/wiki/Installing_BioPerl
>>>>
>>>> From CVS:
>>>>
>>>> http://www.bioperl.org/wiki/Using_CVS
>>>>
>>>> chris
>>>>
>>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>>>>
>>>>
>>>>       
>>>>> Hi.
>>>>>
>>>>> I have a number of gene IDs from Entrez and I want to find the
>>>>> start and end locations in the human genome. This seemed simple
>>>>> enough, so I started working through some of the examples for
>>>>> using the EntrezGene module at www.bioperl.org  Of course this
>>>>> did not work because the core installation does not include this
>>>>> module. So, I think I have two choices (1) install the module  
>>>>> (how?),
>>>>> or (2) find an easier way to get the locations in the human genome.
>>>>> I want to use the locations to grab sequences out of the genome.
>>>>> Can anyone offer advice on this? Thanks.
>>>>>
>>>>> -Paul.
>>>>>
>>>>> --
>>>>> Paul N. Hengen, Ph.D.
>>>>> Hematopoietic Stem Cell and Leukemia Research
>>>>> City of Hope National Medical Center
>>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>>>>> mailto:paulhengen at coh.org
>>>>>
>>>>> -- 
>>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>         
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>       
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>   
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Fri Nov 30 15:21:13 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Nov 2007 12:21:13 -0800
Subject: [Bioperl-l] Trying to find multiple homologs in multiple
	databases
In-Reply-To: <193573097@newdonner.Dartmouth.EDU>
References: <193573097@newdonner.Dartmouth.EDU>
Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org>

Viktor -
Bio::SearchIO helps you parse BLAST reports, but don't underestimate  
the power of going as low-tech as possible and outputting scores with  
the -m 8 option in NCBI-BLAST or -mformat 3 in WU-BLAST, which give you a  
tabular format that is parseable with the 'split' function in Perl.
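
As a rough sketch of the low-tech route, the following splits NCBI-BLAST
-m 8 lines on tabs and keeps the best-scoring hit per query. The field
names are my own labels for the 12 standard columns, the helper name
parse_tabular_line is made up, and the sample lines are invented purely
for illustration; in practice you would read the lines from your BLAST
output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Labels (my own choice) for the 12 standard columns of -m 8 output.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash reference keyed by field name.
# Returns undef for blank lines and for '#' comment lines (as written by -m 9).
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*(?:#|$)/;
    my %hit;
    @hit{@FIELDS} = split /\t/, $line;
    return \%hit;
}

# Invented sample data, standing in for the contents of a real report.
my @lines = (
    "# Fields: Query id, Subject id, ...\n",
    "geneA\tsubj1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t365\n",
    "geneA\tsubj2\t75.00\t180\t45\t2\t10\t189\t1\t178\t2e-40\t160\n",
    "geneB\tsubj3\t60.20\t150\t58\t4\t1\t150\t20\t165\t3e-12\t70.1\n",
);

# Keep the hit with the highest bit score for each query.
my %best;
for my $line (@lines) {
    my $hit = parse_tabular_line($line) or next;
    my $seen = $best{ $hit->{query} };
    $best{ $hit->{query} } = $hit
        if !$seen || $hit->{bit_score} > $seen->{bit_score};
}

for my $query (sort keys %best) {
    my $hit = $best{$query};
    print join("\t", $query, @{$hit}{qw(subject pct_identity evalue bit_score)}), "\n";
}
```

Note that the tabular format only carries IDs and scores, so descriptions
and organism names still have to come from somewhere else.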

See the wiki http://bioperl.org/wiki for HOWTOs and examples of using  
the parsers.

You might also consider already-written tools like OrthoMCL,  
InParanoid, and others that help you define relationships such as  
orthologs and paralogs among species. There are also a few  
published web resources with pre-computed homologs, so you  
might take a look around first unless the point of the project is to  
learn how to run these kinds of searches.

For general Perl help consider Perlmonks.org and some of  the  
introductory books that are available.
-jason
--
Jason Stajich
jason at bioperl.org

On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote:

> Hello,
>
> My name is Viktor Martyanov and I am a Ph.D. student in biology at  
> Dartmouth.
>
> I need to be able to use a set of genes or FASTA sequences from S.  
> cerevisiae and retrieve a set of corresponding homologs from other  
> fungal species via BLASTP searches.
>
> I would like to find out if there are Perl scripts that approach  
> this problem. By the way, is there a Perl community or forum where  
> I could post this question?
>
> Thanks very much.  _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From barry.moore at genetics.utah.edu  Fri Nov 30 17:03:23 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 30 Nov 2007 15:03:23 -0700
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
References: <14017289.post@talk.nabble.com>
	<CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
Message-ID: <B839F4C3-C225-40B2-B7B0-C2940A35B964@genetics.utah.edu>

Paul,

One other alternative is to use the UCSC table browser
(http://genome.ucsc.edu/cgi-bin/hgTables?command=start).  Select your  
organism, upload your ID list, and select your output options.  You can  
download the coordinates or the fasta directly.  You have options for  
including or excluding various parts of the gene, and upstream/downstream  
sequences.  This is similar to the solution that Malcolm  
suggested, except that the Ensembl option can be run repeatedly as perl  
code, as he pointed out.  UCSC allows remote connections to  
their MySQL server, so you could set up a repeated task and more  
complex queries that way with the UCSC method.

Barry

On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote:

> How many, how often?
>
> Use ensembl biomart!
>
> First time interactively.
>
> Then if you to pipeline it, take the perl code it generates for you  
> and
> run it - of course you'll have to install the Ensembl Perl API....
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Paul N. Hengen
>> Sent: Wednesday, November 28, 2007 7:21 PM
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez  
>> IDs
>>
>>
>> Hi.
>>
>> I have a number of gene IDs from Entrez and I want to find
>> the start and end locations in the human genome. This seemed
>> simple enough, so I started working through some of the
>> examples for using the EntrezGene module at www.bioperl.org
>> Of course this did not work because the core installation
>> does not include this module. So, I think I have two choices
>> (1) install the module (how?), or (2) find an easier way to
>> get the locations in the human genome.
>> I want to use the locations to grab sequences out of the genome.
>> Can anyone offer advice on this? Thanks.
>>
>> -Paul.
>>
>> --
>> Paul N. Hengen, Ph.D.
>> Hematopoietic Stem Cell and Leukemia Research City of Hope
>> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010
>> USA mailto:paulhengen at coh.org
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E
>> ntrez-IDs-tf4894403.html#a14017289
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Nov 30 23:37:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 22:37:50 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from	CVS)
In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL>
References: <000901c833bf$33d53500$0a02a8c0@AWALL>
Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu>

Make sure to keep this on the list.

ncbi_gi() is only in bioperl-live (CVS); my guess is you either  
somehow got 1.5.2 instead or the bioperl-live version is not found in  
your path.  It's very likely the latter, as perl is falling back on  
whatever else is present (which appears to be an older version of  
bioperl).  That should give you a hint that the problem may be with  
your lib path: perl looks for Bio/SearchIO.pm under each lib directory,  
so the path has to be the directory that contains Bio, not Bio itself.  
Try changing the line use lib '/home/awaller/bioperl-live/Bio' to:

use lib '/home/awaller/bioperl-live';
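
If you want to see which copy of a module perl actually picked up, a
small check along these lines might help. It is only a sketch: the helper
name loaded_from is made up for illustration, and the lib path is the one
from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use lib '/home/awaller/bioperl-live';   # the directory that contains Bio/

# Hypothetical helper: load a module and return the file perl actually used,
# as recorded in %INC, or undef if the module cannot be loaded.
sub loaded_from {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

my $module = 'Bio::Search::Hit::BlastHit';
my $path   = loaded_from($module);
if (!defined $path) {
    print "$module could not be loaded; perl searched: @INC\n";
} else {
    print "$module loaded from $path\n";
    print "ncbi_gi() is ", ($module->can('ncbi_gi') ? '' : 'NOT '), "available\n";
}
```

If the reported path is not under /home/awaller/bioperl-live, perl is
still finding the older installation first.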

chris

On Nov 30, 2007, at 8:09 PM, alison waller wrote:

> Okay so Now I'm really confused.
> I edited > #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live/Bio.
> I ran the script below with the *special hit->ncbi from Chris.  It  
> worked,
> it was great, I got the gi! No errors, no bugs that I saw in  
> checking the
> output.  Then I went back in, edited the script to retrieve further  
> info
> (specifically the strand).  Saved it, now when I try to run it I get  
> the
> same error message that I was previously getting.
>
> barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1
> Can't locate object method "ncbi_gi" via package
> "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50, <GEN1>  
> line
> 189.
>
> Thanks soo much,
>
>
> #!usr/bin/perl
>
> use strict;
> use warnings;
> use lib "/home/awaller/bioperl-live/Bio";
> use Bio::Perl;
> use Bio::SearchIO;
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of  
> hits per
> query> \n"; if (@ARGV != 2) { die $usage; }
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                      # to report for each query
>
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! 
> \n";
>
> my $report = Bio::SearchIO->new(
>  -file   => $infile,
>  -format => "blast"
> );
>
> print OUT join("\t",qw(
>              Query
>              HitDesc
>              HitAccess
>              HitGi
> 		HitBits
>              Evalue
>              %id
>              AlignLen
>              NumIdent
>              NumPos
>              gaps
>              Qframe
>              Qstrand
>              Hframe
> 		Hstrand))."\n";
>
> # Go through BLAST reports one by one
> while ( my $result = $report->next_result ) {
>  my $ct = 0;
>  my @tophits = grep {$ct++ < $tophit } $result->hits;
>  if (scalar(@tophits) == 0) {
>     print OUT "no hits\n";
>  }
>  for my $hit (@tophits) {
>     my $tophsp=$hit->hsp('best');
>     # Print some tab-delimited data about this hit
>     print OUT join("\t",
>                    $result->query_name,
>                    $hit->description,
>                    $hit->accession,
>                    $hit->ncbi_gi,
>                    $hit->bits,
>                    $tophsp->evalue,
>                    $tophsp->percent_identity,
>                    $tophsp->length('total'),
>                    $tophsp->num_identical,
>                    $tophsp->num_conserved,
>                    $tophsp->gaps,
>                    $tophsp->query->frame,
> 		      $tophsp->strand('query'),	
>                    $tophsp->hit->frame,
> 		      $tophsp->strand('hit'),	
>                   )."\n";
>  }
> }
>
>
>
>
> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Friday, November 30, 2007 6:24 PM
> To: alison waller
> Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live  
> tarball
> from CVS)
>
> alison waller wrote:
>> Thank you Sendu,
>>
>> So I'm trying the second option.  I have downloaded the bioperl-live
> tarball
>> from the CVS on my windows laptop, and then moved it to my home  
>> directory
> in
>> the linux cluster where I unzipped and tared it.  So I now have a
> directory
>> /home/awaller/bioperl-live.
>>
>> I edited my .bashrc file as below:
>> Export PERL5LIB='/home/awaller/bioperl-live'
>>
>> I also edited a sample script to include:
>> #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live'
>
> Does this directory contain a 'Bio' directory with all the BioPerl
> modules inside it?
>
>
>> But it still isn't working.
>> At the prompt I typed$ perl script.pl
>> It gave me the warning - can't locate object method ncbi_gi which  
>> is why
> I'm
>> trying to download the CVS version as Chris Fields added code to  
>> make the
>> ncbi-gi object.
>
> You'll have to give me the complete, unedited error message and  
> ideally
> the script itself before I can help you further.
>
>
>> Don't I have to do something similar to what the Build.PL file does?
>
> Probably not. It doesn't matter where your perl executable is, btw, as
> long as the system knows how to run perl, which it obviously does.
> <OldMoBlastxGiTest.txt.parsed><OldMoBlastxGiTest.txt>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 05:45:43 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 10:45:43 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
Message-ID: <4729A047.2060507@mikrobio.med.uni-giessen.de>

Dear all,

I have emboss installed on a windows machine. (Embosswin). I can run
this from the dos command line and the path is present. However, when I 
try to call
an emboss application from bioperl I get a "Application not found error"


  my $f = Bio::Factory::EMBOSS->new();
  # get an EMBOSS application  object from the factory
  my $fuzznuc = $f->program('fuzznuc');
    $fuzznuc->run(
                  { -sequence  => $infile,
                        -pattern   => $motif,
                       -outfile   => $outfile                       
              });
 gives the following error

-------------------- WARNING ---------------------
MSG: Application [fuzznuc] is not available!
---------------------------------------------------
Can't call method "run" on an undefined value at searchPatterns.pl line 
102.

Can somebody help me fix this ?

best regards
Rohit

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 10:22:14 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:22:14 -0400
Subject: [Bioperl-l] PAML/Codeml parsing
Message-ID: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>

PAML4 breaks our PAML parser right now because the order of things in  
the result file has changed.  Now sequences precede the information  
about the version or the program run.  This means that  
$result->get_seqs() fails because we don't parse the sequences.

We'll see what we can do, but as usual with supporting 3rd party  
programs it is brittle when file formats change.

-jason

--
Jason Stajich
jason at bioperl.org


From jason at bioperl.org  Thu Nov  1 10:32:06 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:32:06 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>

Presumably the PATH is not getting set properly - you should play  
around printing the $ENV{PATH} variable in a perl script to see if it  
actually contains the directory where the emboss programs are  
installed.  Bioperl can only guess so much as to where to find an  
application.  It is also possible that we aren't creating the proper  
path to the executable - you can print the executable path with
  print $fuzznuc->executable;
I believe, unless it is throwing an error at the program() line.
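
A minimal sketch of the PATH check, using only core modules. The helper
name find_on_path is made up for illustration, and the fallback list of
Windows extensions is an assumption on my part; this is not how the
Factory object itself looks for programs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every location on PATH where a file named
# $program exists (also trying Windows extensions such as .exe).
sub find_on_path {
    my ($program) = @_;
    my @suffixes = ('');
    push @suffixes, map { lc $_ } split /;/, ($ENV{PATHEXT} || '.exe;.bat;.cmd')
        if $^O eq 'MSWin32';
    my @found;
    for my $dir (File::Spec->path) {
        for my $suffix (@suffixes) {
            my $candidate = File::Spec->catfile($dir, $program . $suffix);
            push @found, $candidate if -f $candidate;
        }
    }
    return @found;
}

print "PATH as seen by perl:\n";
print "  $_\n" for File::Spec->path;

my @hits = find_on_path('fuzznuc');
print @hits ? "fuzznuc found at: @hits\n" : "fuzznuc is not on perl's PATH\n";
```

If fuzznuc shows up here but Bioperl still reports it as unavailable, the
problem is more likely in how the executable path is being built.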

It looks like the code in the Factory object is a little fragile,  
assuming that the programs HAVE to be in your $PATH.  I don't know if  
windows+perl is special in any way in how it runs things, so I can't  
really tell if there are specific things you have to do here. You may  
have to run this through cygwin in case PATH and such are just not  
available properly to Windows perl.

-jason
On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:

> Dear all,
>
> I have emboss installed on a windows machine. (Embosswin). I can run
> this from the dos command line and the path is present. However,  
> when I
> try to call
> an emboss application from bioperl I get a "Application not found  
> error"
>
>
>   my $f = Bio::Factory::EMBOSS->new();
>   # get an EMBOSS application  object from the factory
>   my $fuzznuc = $f->program('fuzznuc');
>     $fuzznuc->run(
>                   { -sequence  => $infile,
>                         -pattern   => $motif,
>                        -outfile   => $outfile
>               });
>  gives the following error
>
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at searchPatterns.pl  
> line
> 102.
>
> Can somebody help me fix this ?
>
> best regards
> Rohit
>
> -- 
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Thu Nov  1 10:54:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 09:54:09 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu>

This worked for me previously when I tested with WinXP on my old  
machine using EMBOSS v5:

ftp://emboss.open-bio.org/pub/EMBOSS/windows

I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably  
better to use the latest EMBOSS version anyway so I suggest trying  
the version in the above link.  I'll test it again today and let you  
know what I find.

chris

On Nov 1, 2007, at 9:32 AM, Jason Stajich wrote:

> Presumably the PATH is not getting set properly - you should play
> around printing the $ENV{PATH} variable in a perl script to see if
> actually contains the directory where the emboss programs are
> installed.  Bioperl can only guess so much as to where to find an
> application.  It is also possible that we aren't creating the proper
> path to the executable - you can print the executable path with
> print $fuzznuc->executable
> I believe unless it is throwing an error at the program() line.
>
> It looks like the code in the Factory object is a little fragile
> assuming that the programs HAVE to be in your $PATH.  I don't know if
> windows+perl is special in any way that it run things so I can't
> really tell if there is specific things you have to do here. You may
> have to run this through cygwin in case PATH and such are just not
> available properly to windowsPerl.
>
> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Thu Nov  1 11:31:40 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 11:31:40 -0400
Subject: [Bioperl-l] PAML3 vs 4
Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org>

Small tweaks were needed to parse PAML4 results.

Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly  
now on both PAML 3 and 4.
You'll need to get the latest code from CVS in order to see the  
changes to Bio/Tools/Phylo/PAML.pm

I've added tests for PAML4 in the parser and the run code.

If you have scripts that use codeml, please give it a try.  I have not  
attempted to play with BASEML or AAML results at this point, so if you  
also have code that uses those programs, please try it out and  
provide bug reports if we need to fix things.

-jason

--
Jason Stajich
jason at bioperl.org


From Kevin.M.Brown at asu.edu  Thu Nov  1 13:25:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 1 Nov 2007 10:25:30 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	onwindows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu>

Sounds like a path issue.  Try to tell bioperl the full path to the
executable rather than just the executable name. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 2:46 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl: cannot run emboss programs 
> using bioperl onwindows
> 
> Dear all,
> 
> I have emboss installed on a windows machine. (Embosswin). I can run
> this from the dos command line and the path is present. 
> However, when I 
> try to call
> an emboss application from bioperl I get a "Application not 
> found error"
> 
> 
>   my $f = Bio::Factory::EMBOSS->new();
>   # get an EMBOSS application  object from the factory
>   my $fuzznuc = $f->program('fuzznuc');
>     $fuzznuc->run(
>                   { -sequence  => $infile,
>                         -pattern   => $motif,
>                        -outfile   => $outfile                       
>               });
>  gives the following error
> 
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at 
> searchPatterns.pl line 
> 102.
> 
> Can somebody help me fix this ?
> 
> best regards
> Rohit
> 


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 14:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>


Thanks for all the suggestions... but I unfortunately still cannot run 
emboss. I am running the latest version of embosswin (2.10.0-Win-0.8), 
and the path is set correctly. I printed $ENV{PATH} and this contains 
C:\EMBOSSwin, which is the correct location.
I also tried setting the path directly, but I'm not sure how to do this, 
so I tried this...

my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

this also did not work.

Also tried printing...
$fuzznuc->executable()

gave the following error again
-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas ?

thanks !
Rohit


here's the code...

use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

 my $f = Bio::Factory::EMBOSS->new();
 # get an EMBOSS application  object from the factory
 print Dumper ($f);
 my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
 print Dumper ($fuzznuc);
 
 #dump of fuzznuc
 #$VAR1 = bless( {
 #                '_programgroup' => {},
 #                '_programs' => {},
 #                '_groups' => {}
 #              }, 'Bio::Factory::EMBOSS' );
 
 #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
 
 my $infile = "temp.fasta";
 my $motif  = "ATGTCGATC";
 my $outfile = "test.out";

 
 $fuzznuc->run(
                  { -sequence  => $infile,
                    -pattern   => $motif,
                    -outfile   => $outfile                      
              });
    
Here's the error again....

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------


Jason Stajich wrote:
> Presumably the PATH is not getting set properly - you should play 
> around with printing the $ENV{PATH} variable in a perl script to see if 
> it actually contains the directory where the emboss programs are 
> installed.  Bioperl can only guess so much as to where to find an 
> application.  It is also possible that we aren't creating the proper 
> path to the executable - you can print the executable path with 
> print $fuzznuc->executable 
> I believe, unless it is throwing an error at the program() line.  
>
> It looks like the code in the Factory object is a little fragile, 
> assuming that the programs HAVE to be in your $PATH.  I don't know if 
> windows+perl is special in any way in how it runs things, so I can't 
> really tell if there are specific things you have to do here. You may 
> have to run this through cygwin in case PATH and such are just not 
> available properly to windowsPerl.
>
> -jason

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 14:37:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 14:37:24 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>

You could try this, though I can't test it, so I'm not sure it will work:
my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
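One caveat with the two lines above: the earlier error in this thread (Can't call method "run" on an undefined value) suggests that program() returns undef when the application is not in the factory's list, in which case calling executable() on the result would fail the same way. Here is a minimal sketch of a guard, assuming only that behaviour. The names get_program_or_explain and StubFactory are hypothetical and exist only for illustration; the stub stands in for Bio::Factory::EMBOSS so the snippet runs without EMBOSS installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): ask a factory for a program
# object and return undef with a readable message instead of dying later
# on "Can't call method ... on an undefined value".
sub get_program_or_explain {
    my ($factory, $name) = @_;
    my $app = $factory->program($name);
    return $app if defined $app;
    warn "Factory returned no object for '$name'; "
       . "skipping executable() and run()\n";
    return;
}

# Stand-in factory, used only so this sketch runs without EMBOSS.
# With BioPerl installed you would pass Bio::Factory::EMBOSS->new() instead.
package StubFactory;

sub new {
    my ($class, @names) = @_;
    return bless { known => { map { $_ => 1 } @names } }, $class;
}

sub program {
    my ($self, $name) = @_;
    return $self->{known}{$name} ? "app:$name" : undef;
}

package main;

my $factory = StubFactory->new('wossname');
my $fuzznuc = get_program_or_explain($factory, 'fuzznuc');
print defined $fuzznuc ? "got fuzznuc\n" : "fuzznuc unavailable\n";
```

With a guard like this the script stops at the real problem (the factory does not know the program) rather than at the later method call.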

-jason
On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:

>
>
> Thanks for all the suggestions... but I unfortunately still cannot run
> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
> and the path is set correctly. I printed $ENV{PATH} and this contains
> C:\EMBOSSwin which is the correct location.
> I also tried setting the path directly but I'm not sure how to do this,
> so I tried this...
>
> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>
> this also did not work.
>
> Also tried printing...
> $fuzznuc->executable()
>
> gave the following error again
> -------------------- WARNING ---------------------
> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> ---------------------------------------------------
>
> Any more ideas ?
>
> thanks !
> Rohit
>
>
> here's the code...
>
> use strict;
> use Bio::Factory::EMBOSS;
> use Data::Dumper;
>
> #
> # print "PATH=$ENV{PATH}\n";
> # path contains C:\EMBOSSwin which is the correct location
> # embossversion is 2.10.0-Win-0.8
>
>  my $f = Bio::Factory::EMBOSS->new();
>  # get an EMBOSS application  object from the factory
>  print Dumper ($f);
>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
>  print Dumper ($fuzznuc);
>
>  #dump of fuzznuc
>  #$VAR1 = bless( {
>  #                '_programgroup' => {},
>  #                '_programs' => {},
>  #                '_groups' => {}
>  #              }, 'Bio::Factory::EMBOSS' );
>
>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>
>  my $infile = "temp.fasta";
>  my $motif  = "ATGTCGATC";
>  my $outfile = "test.out";
>
>
>  $fuzznuc->run(
>                   { -sequence  => $infile,
>                     -pattern   => $motif,
>                     -outfile   => $outfile
>               });
>
> Here's the error again....
>
> #-------------------- WARNING ---------------------
> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> #---------------------------------------------------
>
>

--
Jason Stajich
jason at bioperl.org


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 14:41:41 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:41:41 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlonwindows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de>

Hi Jason

I tried this as well. This also gives the same error message.

-Rohit

Jason Stajich wrote:
> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From MEC at stowers-institute.org  Thu Nov  1 14:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs
	usingbioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>


In the code at
http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6
there is a call to `wossname` (see
http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html ).

is wossname in your path?

Maybe it needs to be wossname.exe under windows?
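A quick way to check both questions from Perl itself is sketched below. It uses only core modules and looks through the directories in $ENV{PATH} for wossname, with and without an .exe suffix. The helper name find_in_dirs is made up for this example, and the sketch only tests that a file exists, not that it can actually be launched.

```perl
use strict;
use warnings;
use File::Spec;

# Return every matching file for $name found in the given directories,
# trying the bare name and the name with each extension (e.g. ".exe").
sub find_in_dirs {
    my ($name, $dirs, $exts) = @_;
    my @found;
    for my $dir (@$dirs) {
        for my $ext ('', @$exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            push @found, $candidate if -f $candidate;
        }
    }
    return @found;
}

my @dirs = File::Spec->path();    # directories listed in $ENV{PATH}
my @hits = find_in_dirs('wossname', \@dirs, ['.exe']);

print "found: $_\n" for @hits;
print "wossname not found on PATH\n" unless @hits;
```

If this finds wossname.exe but no bare wossname, that would be consistent with the guess above, though it does not by itself show how the factory invokes the program.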


Malcolm Cook
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 1:42 PM
> To: Jason Stajich
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs 
> usingbioperlonwindows
> 
> Hi Jason
> 
> I tried this as well. This also gives the same error message.
> 
> -Rohit
> 


From arareko at campus.iztacala.unam.mx  Thu Nov  1 15:51:41 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Nov 2007 13:51:41 -0600
Subject: [Bioperl-l] bioperl: cannot run emboss
	programs	usingbioperlonwindows
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx>

Don't the EMBOSS binaries live under 'bin'? Perhaps try adding 
'C:\EMBOSSwin\bin' to $ENV{PATH}, or use this:

my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc');

Adding .exe might be worth trying as well.
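For the PATH route, here is a rough sketch of prepending a directory from inside the script, using only core Perl. The helper name prepend_to_path is invented for this example, and 'C:\EMBOSSwin\bin' is only the directory guessed above, not a verified install location. The comparison is case-sensitive, which is stricter than Windows itself.

```perl
use strict;
use warnings;
use Config;

# Prepend $dir to a PATH-style string unless it is already listed.
# $Config{path_sep} is ';' on Windows and ':' on Unix-like systems.
sub prepend_to_path {
    my ($path, $dir) = @_;
    my $sep   = $Config{path_sep};
    my @parts = grep { length } split /\Q$sep\E/, (defined $path ? $path : '');
    return join $sep, @parts if grep { $_ eq $dir } @parts;
    return join $sep, $dir, @parts;
}

# Guessed directory from this thread; adjust to the real install location.
my $new_path = prepend_to_path($ENV{PATH}, 'C:\EMBOSSwin\bin');
print "PATH would become: $new_path\n";

# To apply it to this process and to any program it launches afterwards:
# $ENV{PATH} = $new_path;
```

The assignment would need to happen before Bio::Factory::EMBOSS->new() is called, so that whatever the factory runs sees the updated PATH.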

Mauricio.

Cook, Malcolm wrote:
> in the code
> http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 
> 
> there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
> )
> 
> is wossname in your path?
> 
> Maybe it needs to be wossname.exe under windows?
> 
> 
> Malcolm Cook
>   
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
>> Sent: Thursday, November 01, 2007 1:42 PM
>> To: Jason Stajich
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs 
>> usingbioperlonwindows
>>
>> Hi Jason
>>
>> I tried this as well. This also gives the same error message.
>>
>> -Rohit
>>
>> Jason Stajich wrote:
>>> You could try this - can't test it though so not sure.
>>> my $fuzznuc = $f->program('fuzznuc');
>>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>>
>>> -jason
>>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>>>
>>>>
>>>> Thanks for all the suggestions... but I unfortunately still cannot 
>>>> run emboss. I am running the latest version of embosswin  
>>>> (2.10.0-Win-0.8), and the path is set correctly. I printed 
>>>> $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct 
>>>> location.
>>>> I also tried setting the path directly but I'm not sure how to do 
>>>> this, so I tried this...
>>>>
>>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>>>
>>>> this also did not work.
>>>>
>>>> Also tried printing...
>>>> $fuzznuc->executable()
>>>>
>>>> gave the following error again
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>>
>>>> Any more ideas ?
>>>>
>>>> thanks !
>>>> Rohit
>>>>
>>>>
>>>> here's the code...
>>>>
>>>> use strict;
>>>> use Bio::Factory::EMBOSS;
>>>> use Data::Dumper;
>>>>
>>>> #
>>>> # print "PATH=$ENV{PATH}\n";
>>>> # path contains C:\EMBOSSwin which is the correct location
>>>> # embossversion is 2.10.0-Win-0.8
>>>>
>>>>  my $f = Bio::Factory::EMBOSS->new();
>>>>  # get an EMBOSS application  object from the factory
>>>>  print Dumper ($f);
>>>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well,
>>>>  print Dump ($fuzznuc);
>>>>
>>>>  #dump of fuzznuc
>>>>  #$VAR1 = bless( {
>>>>  #                '_programgroup' => {},
>>>>  #                '_programs' => {},
>>>>  #                '_groups' => {}
>>>>  #              }, 'Bio::Factory::EMBOSS' );
>>>>
>>>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>>>  my $infile = "temp.fasta";
>>>>  my $motif  = "ATGTCGATC";
>>>>  my $outfile = "test.out";
>>>>
>>>>
>>>>  $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                     -pattern   => $motif,
>>>>                     -outfile   => $outfile                      
>>>>               });
>>>>
>>>> Here's the error again....
>>>>
>>>> #-------------------- WARNING ---------------------
>>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> #---------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Jason Stajich wrote:
>>>>> Presumably the PATH is not getting set properly - you should play
>>>>> around printing the $ENV{PATH} variable in a perl script to see if
>>>>> actually contains the directory where the emboss programs are
>>>>> installed.  Bioperl can only guess so much as to where to find an
>>>>> application.  It is also possible that we aren't creating the proper
>>>>> path to the executable - you can print the executable path with
>>>>> print $fuzznuc->executable
>>>>> I believe unless it is throwing an error at the program() line.
>>>>>
>>>>> It looks like the code in the Factory object is a little fragile
>>>>> assuming that the programs HAVE to be in your $PATH.  I don't know if
>>>>> windows+perl is special in any way that it run things so I can't
>>>>> really tell if there is specific things you have to do here. You may
>>>>> have to run this through cygwin in case PATH and such are just not
>>>>> available properly to windowsPerl.
>>>>>
>>>>> -jason
>>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have emboss installed on a windows machine. (Embosswin). I can 
>>>>>> run this from the dos command line and the path is present. 
>>>>>> However, when I try to call an emboss application from bioperl I 
>>>>>> get a "Application not found error"
>>>>>>
>>>>>>
>>>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>>>   # get an EMBOSS application  object from the factory
>>>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>>>     $fuzznuc->run(
>>>>>>                   { -sequence  => $infile,
>>>>>>                         -pattern   => $motif,
>>>>>>                        -outfile   => $outfile
>>>>>>               });
>>>>>>  gives the following error
>>>>>>
>>>>>> -------------------- WARNING ---------------------
>>>>>> MSG: Application [fuzznuc] is not available!
>>>>>> ---------------------------------------------------
>>>>>> Can't call method "run" on an undefined value at searchPatterns.pl
>>>>>> line 102.
>>>>>>
>>>>>> Can somebody help me fix this ?
>>>>>>
>>>>>> best regards
>>>>>> Rohit
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Dr. Rohit Ghai
>>>>>> Institute of Medical Microbiology
>>>>>> Faculty of Medicine
>>>>>> Justus-Liebig University
>>>>>> Frankfurter Strasse 107
>>>>>> 35392 - Giessen
>>>>>> GERMANY
>>>>>>
>>>>>> Tel  : 0049 (0)641-9946413
>>>>>> Fax  : 0049 (0)641-9946409
>>>>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org 
>> <mailto:Bioperl-l at lists.open-bio.org>
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> --
>>>>> Jason Stajich
>>>>> jason at bioperl.org <mailto:jason at bioperl.org>
>>>>>
>>>> --
>>>>
>>>> Dr. Rohit Ghai
>>>> Institute of Medical Microbiology
>>>> Faculty of Medicine
>>>> Justus-Liebig University
>>>> Frankfurter Strasse 107
>>>> 35392 - Giessen
>>>> GERMANY
>>>>
>>>> Tel  : 0049 (0)641-9946413
>>>> Fax  : 0049 (0)641-9946409
>>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> --
>>> Jason Stajich
>>> jason at bioperl.org <mailto:jason at bioperl.org>
>>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  :	0049 (0)641-9946413
>> Fax  :	0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Nov  1 16:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get  
fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a  
hoop or two but I managed to get it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.   
You need to remove EMBOSSWin and install the one I linked to  
previously (this is an actual EMBOSS beta release).  It's possible  
older EMBOSSWin can be configured, but I don't plan on checking it  
out myself.

Next, you need to ensure the binaries are in your PATH env. variable  
(test by running 'wossname' on the command line), then set  
EMBOSS_DATA to point at the EMBOSS data directory using a UNIX-like  
path (i.e. 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me  
and WinXP recognizes the UNIX'y form as a valid path.  If you don't  
know how to set env. variables go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the  
latest (greatest?) EMBOSS.
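
The EMBOSS_DATA setting described above can also be made from inside a script. A minimal sketch, assuming the data directory sits at 'C:\mEMBOSS\data' (adjust to the actual install location); the helper name unixify_path is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a Win32-style path with forward slashes.
sub unixify_path {
    my ($path) = @_;
    (my $unixy = $path) =~ tr{\\}{/};
    return $unixy;
}

# 'C:\mEMBOSS\data' is an assumed install location, not a fixed default.
$ENV{EMBOSS_DATA} = unixify_path('C:\mEMBOSS\data');
print "EMBOSS_DATA=$ENV{EMBOSS_DATA}\n";   # prints EMBOSS_DATA=C:/mEMBOSS/data
```

Setting the variable in %ENV only affects the running script and the programs it launches; a system-wide setting still has to be made through the Windows environment dialog.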

chris

On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:

> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
>> You could try this - can't test it though so not sure.
>> my $fuzznuc = $f->program('fuzznuc');
>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>
>> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From neetisomaiya at gmail.com  Fri Nov  2 00:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a perl question, not bioperl.
Can anyone point me to a perl program/code/function which can calculate the
number of days between any two given dates.
Any help will be deeply appreciated.
Thanks.

-- 
-Neeti
Even my blood says, B positive


From whs at ebi.ac.uk  Fri Nov  2 01:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0711020459530.17670@parrot.ebi.ac.uk>

Hi Neeti,

A non-bioperl answer to your perl question: Date::Calc should do the trick.
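
Date::Calc is a CPAN module whose Delta_Days function takes two year/month/day triples. A minimal sketch, assuming Date::Calc may or may not be installed: it uses Delta_Days when available and otherwise falls back to the core Time::Local module. The wrapper name days_between is made up for illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical wrapper: number of days from date 1 to date 2
# (negative if date 2 is earlier). Arguments are year, month, day twice.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;

    # Preferred route: Date::Calc, if it is installed.
    if (eval { require Date::Calc; 1 }) {
        return Date::Calc::Delta_Days($y1, $m1, $d1, $y2, $m2, $d2);
    }

    # Fallback with core modules only: compare UTC midnights.
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return int(($t2 - $t1) / 86400);
}

print days_between(2007, 10, 25, 2007, 11, 4), "\n";   # prints 10
```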

Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.
>
>


From smarkel at accelrys.com  Sat Nov  3 02:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlon	windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <OFD3D05334.F9E235EF-ON88257388.00209BED-88257388.00211BD7@accelrys.com>

I set multiple environment variables in my code.

    $ENV{EMBOSS_ROOT}    = $embossPath;
    $ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd"); 
    $ENV{EMBOSS_DB_DIR}  = File::Spec->catdir($embossPath, "test");
    $ENV{EMBOSS_DATA}    = File::Spec->catdir($embossPath, "data"); 
    $ENV{PATH}           = $embossPath; 

I found it necessary to set both PATH and EMBOSS_ROOT.
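
A self-contained variant of the snippet above, as a sketch only. The install location 'C:/mEMBOSS' and the helper name emboss_env are assumptions for illustration. Unlike the original, it prepends the EMBOSS directory to PATH instead of replacing PATH, so that other programs stay reachable.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: derive the EMBOSS_* settings from one install root.
# The subdirectory names follow the snippet above.
sub emboss_env {
    my ($root) = @_;
    return (
        EMBOSS_ROOT    => $root,
        EMBOSS_ACDROOT => File::Spec->catdir($root, 'acd'),
        EMBOSS_DB_DIR  => File::Spec->catdir($root, 'test'),
        EMBOSS_DATA    => File::Spec->catdir($root, 'data'),
    );
}

my $embossPath = 'C:/mEMBOSS';    # assumed install location
my %settings   = emboss_env($embossPath);
@ENV{ keys %settings } = values %settings;

# Prepend rather than replace, using the platform's PATH separator.
my $sep = $^O eq 'MSWin32' ? ';' : ':';
$ENV{PATH} = join $sep, $embossPath, grep { defined && length } $ENV{PATH};
```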

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com


bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24:

> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> 
> -jason


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Sat Nov  3 10:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic. I was able 
to run EMBOSS applications on Windows (Vista), but only with the following 
workaround.

Chris suggested removing EMBOSSwin and installing another version. This I 
did. Scott suggested setting all the variables within the program. This I 
also tried, but they were already available to the program, so that was 
not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using 
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't 
have any path issues. What is also curious is that $f->version returns 
the correct version of EMBOSS (no path problems here), and it looks like 
it runs the command "embossversion -auto" to get this information. If it 
can find this command, it's a bit peculiar that it cannot find the other 
programs. Or am I missing something here?


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile = "test.fasta";
my $motif  = "AGGAGG";
my $outfile = "test.out";


    my $f = Bio::Factory::EMBOSS->new();
    # get an EMBOSS application  object from the factory
    print Dumper $f;

    print "location=",$f->location,"\n";   # returns local
    # this returns the correct version 5.0 (uses embossversion -auto
    # internally, and seems to know where it is)
    print "version=", $f->version,"\n";
    print "info=", $f->program_info('fuzznuc'),"\n"; # returns nothing
    print "list=",$f->_program_list,"\n";  # returns nothing

    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\
    # or with exe suffix doesn't work
    #$fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work
    # the problem is that it does not return a
    # Bio::Tools::Run::EMBOSSApplication object.


    # however, creating a EMBOSSApplication object directly makes it
    # possible to run the program
    #
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile
               });
    print "Done\n";

    exit;


Chris Fields wrote:
> I did a little investigating using my old PC and was able to get 
> fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a 
> hoop or two but I managed to get it working.
>
> First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.  
> You need to remove EMBOSSWin and install the one I linked to 
> previously (this is an actual EMBOSS beta release).  It's possible 
> older EMBOSSWin can be configured, but I don't plan on checking it out 
> myself.
>
> Next, you need to ensure the binaries are in your PATH env. variable 
> (test by running 'wossname' on the command line), then set EMBOSS_DATA 
> to point at the EMBOSS data directory using a UNIX-like path (i.e. 
> 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP 
> recognizes the UNIX'y form as a valid path.  If you don't know how to 
> set env. variables go here:
>
> http://vlaurie.com/computers2/Articles/environment.htm
>
> Once that is set up you should be able to run the script using the 
> latest (greatest?) EMBOSS.
>
> chris


From hlapp at gmx.net  Sun Nov  4 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 4 Nov 2007 12:42:13 -0500
Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript
In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net>

Hi Stefanie,

sorry for taking so long to respond - your email got buried in a pile  
while I was away on travel. The Bio::SeqFeature::Gene::* modules were  
written mostly with the motivation to have a model that can represent  
the results of gene predictors.

GenBank AFAIK doesn't annotate introns explicitly, though they should  
be implicit from cDNA (or mRNA? or gene, as you say) features on  
genomic sequence. The Bioperl SeqIO parsers won't transform those  
into a Bio::SeqFeature::Gene-based model, but instead will yield just  
plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent  
processing to build these into more hierarchical models.

I'm not sure whether someone's done this already for GenBank-type
feature tables. There is an Unflattener
(Bio::SeqFeature::Tools::Unflattener) that at least attempts to
build a feature hierarchy from the flat array that's compliant with
the Sequence Ontology (or so I recall).

I'm copying the list in case others have additional suggestions.

	-hilmar

On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote:

>
>
> Hello Hilmar,
>
> I have a question about your bioperl module  
> Bio::SeqFeature::Gene::Transcript:
>
> I can't figure out how to generate the $gene object for use in this  
> line:
> @introns = $gene->introns();
>
> The data I'm working with is a local file in genbank format, and  
> I'm interested in extracting intron sequences (and maybe flanking  
> exons) for certain genes. I have been trying to get the introns via  
> the sequence features ('CDS' or 'gene'), but this has not been  
> working. Which approach will I have to take?
> I'd be very grateful if you could point me into the right direction!
>
> Hope things are going well in Durham! And thank you in advance!
>
> Stefanie
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From downloadondemand at gmail.com  Sun Nov  4 13:39:42 2007
From: downloadondemand at gmail.com (download on demand)
Date: Sun, 4 Nov 2007 20:39:42 +0200
Subject: [Bioperl-l] Help with Bio::SeqIO
Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>

Hi to all.

I have a problem with a very simple script:


         use strict;
         use warnings;
         use Bio::SeqIO;

         # get command-line arguments, or die with a usage statement
         # (no outfile argument: the output goes to STDOUT)
         my $usage = "x2y.pl infile infileformat outfileformat\n";
         my $infile = shift or die $usage;
         my $infileformat = shift or die $usage;
         my $outfileformat = shift or die $usage;

         # create one SeqIO object to read in, and another to write out
         my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                      '-format' => $infileformat);
         my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                                       '-format' => $outfileformat);

         # print product, coordinates and spliced sequence of each CDS
         while (my $inseq = $seq_in->next_seq) {

#            $seq_out->write_seq($inseq); # Whole sequence not needed

             for my $feat_object ($inseq->get_SeqFeatures) {
                 if ($feat_object->primary_tag eq "CDS") {
                     print $feat_object->get_tag_values('product'), "\n";
                     print $feat_object->location->start, "..",
                           $feat_object->location->end, "\n";
                     print $feat_object->spliced_seq->seq, "\n\n";
                 }
             }
         }


The result seems OK to me, but in case of first CDS of NC_005213.gbk from
here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated
sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated
sequence....................................TAA


This CDS has an unusual location string:
CDS             complement(join(490883..490885,1..879)), but shouldn't
spliced_seq handle these things?

Please help me!
Best regards, N.


From cjfields at uiuc.edu  Sun Nov  4 19:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for  
instance typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the  
split location is a 'join'?  In other words, should spliced_seq() be  
modified to take into account the split location type when returning  
sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly  
indicates the order of the sequences is important when joined  
together; the current behavior is more like that for 'order'.

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
>
> I have a problem with a simplest script:
>
>
>
>          use Bio::SeqIO;
>          # get command-line arguments, or die with a usage statement
>          my $usage = "x2y.pl infile infileformat outfile  
> outfileformat\n";
>          my $infile = shift or die $usage;
>          my $infileformat = shift or die $usage;
> #         my $outfile = shift or die $usage;
>          my $outfileformat = shift or die $usage;
>
>          # create one SeqIO object to read in,and another to write out
>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => $infileformat);
>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>                                        '-format' => $outfileformat);
>
>          # write each entry in the input file to the output file
>          while (my $inseq = $seq_in->next_seq) {
>
> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>
> for my $feat_object ($inseq->get_SeqFeatures)
>     {
>     if ($feat_object->primary_tag eq "CDS")
>         {
>         print $feat_object->get_tag_values('product'),"\n";
>         print
> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>         print $feat_object->spliced_seq->seq,"\n\n";
>         }
>     }
>
>
>
> The result seems OK to me, but in case of first CDS of  
> NC_005213.gbk from
> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/ 
> > the
> output is wrong:
>
> It is:
> hypothetical protein
> 1..490885
> TAAATGCGATTGCTATTAGAA..................................Truncated
> sequence...................................
>
> Should be:
> hypothetical protein
> 879..490883
> ATGCGATTGCTATTAGAA...................................Truncated
> sequence....................................TAA
>
>
>
> This CDS have an unnatural location string:
> CDS             complement(join(490883..490885,1..879)), but  
> spliced_seq
> should handle these things?
>
> Please help me!
> Best regards, N.
> _______________________________________________
>


From jean-luc.jany at univ-brest.fr  Mon Nov  5 03:26:52 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Mon, 05 Nov 2007 09:26:52 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to
	blastall
Message-ID: <472ED3CC.2050305@univ-brest.fr>

Dear Bioperl and Mac users,

I am a Mac user and would like to run a script I wrote using Bio::Tools::Run::StandAloneBlast. Unfortunately, I have not managed to tell Bioperl the path to blastall and the other executables.

I carefully read http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to set the path to BLAST, but I guess the procedure is slightly different on a Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?)

My blast directory is in my home directory (myname) and contains a bin and a data subdirectory. blastall and the other executables are in myname/blast/bin (e.g. myname/blast/bin/blastall).

Thank you in anticipation for your help.

Jean-Luc


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Mon Nov  5 06:36:16 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Mon, 05 Nov 2007 12:36:16 +0100
Subject: [Bioperl-l] bioperl and emboss on windows
Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic. I was able
to run emboss applications on windows (vista), but only with the following
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I
also tried, but they were already available to the program, so this was
not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't
have any path issues. What is also curious is that $f->version returns the
correct version of emboss (no path problems here), and it looks
like it runs the command "embossversion -auto" to get this information. If it
can find this command, it's a bit peculiar that it cannot find the other
programs. Or am I missing something here?
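
One way to rule the PATH in or out is to check, from inside the same
Perl interpreter, whether the executables are visible. The following is
a minimal sketch using only core modules. find_on_path is a
hypothetical helper, not part of BioPerl, and the fallback extension
list used when PATHEXT is unset is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first match for $name on PATH, or nothing.
sub find_on_path {
    my ($name) = @_;

    # On Windows, also try the usual executable extensions.
    my @exts = ('');
    if ($^O eq 'MSWin32') {
        push @exts, split /;/, ($ENV{PATHEXT} || '.exe;.bat;.cmd');
    }

    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            # On Unix-like systems also require the executable bit.
            return $candidate
                if -f $candidate && ($^O eq 'MSWin32' || -x $candidate);
        }
    }
    return;
}

for my $program (qw(embossversion fuzznuc)) {
    my $found = find_on_path($program);
    print "$program: ", (defined $found ? $found : 'not found on PATH'), "\n";
}
```

If both programs are reported as found here, the PATH seen by Perl is
probably fine and the problem lies elsewhere in the factory.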


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile = "test.fasta";
my $motif  = "AGGAGG";
my $outfile = "test.out";


     my $f = Bio::Factory::EMBOSS->new();
     # get an EMBOSS application  object from the factory
    print Dumper $f;  
   
    print "location=",$f->location,"\n";   #returns local
    print "version=", $f->version,"\n";    #  this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is)
    print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing
    print "list=",$f->_program_list,"\n";  #returns nothing
   
    #
    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work
    # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work
    # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object.
    #
    #
    #
    # however, creating a EMBOSSApplication object directly makes it possible to run the program
    #
    
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');   
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile                      
               });   
    print "Done\n";
   
    exit;


From neetisomaiya at gmail.com  Mon Nov  5 07:20:04 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 5 Nov 2007 17:50:04 +0530
Subject: [Bioperl-l] perl question
Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>

Again a perl question, and maybe a very trivial one.
How do I truncate a number like 3.1232010098 to only 3 decimal places in
perl?

-- 
-Neeti
Even my blood says, B positive


From biology0046 at hotmail.com  Mon Nov  5 07:16:13 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID: <BLU108-F34DC66B7BB1B9063DA2BC8B4880@phx.gbl>

Dear all:

i got a poplar genome gff file like this:
LG_I	src	exon	2598	3280	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	2598	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I	src	start_codon	3278	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	stop_codon	2598	2600	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	exon	3544	3918	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	3544	3918	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I	src	exon	4258	4740	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	4258	4740	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I	src	exon	5344	6388	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	5344	6388	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I	src	exon	8259	8528	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8259	8528	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I	src	stop_codon	8259	8261	.	-	0	name "fgenesh1_pg.C_LG_I000002"
LG_I	src	exon	8897	8987	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8897	8987	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I	src	exon	9831	9892	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	9831	9892	.	-	1	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I	src	start_codon	9890	9892	.	-	0	name "fgenesh1_pg.C_LG_I000002"

I tried to use Bio::DB::GFF, but this module only works with the feature
types (methods) given in the gff file.
What I want to get is "intron, 5utr, 3utr", but this information is not
contained in this gff file.

How can I get this information through bioperl? This file does not contain
intron information.
If I consider gaps between exons as introns, and the non-CDS parts of the
first and last exon as UTRs, how can I extract them from this gff file?
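
The "gaps between exons" idea can be sketched in plain Perl, without
any BioPerl module. This is a minimal sketch under stated assumptions:
introns_from_gff is a hypothetical helper, the columns are
tab-separated, and exons are grouped by the name "..." attribute as in
the lines above. UTRs could be derived along the same lines by
comparing exon and CDS coordinates.

```perl
use strict;
use warnings;

# Given GFF lines, return a hash ref: gene name => list of [start, end]
# intron coordinates, taken as the gaps between consecutive exons.
sub introns_from_gff {
    my (@lines) = @_;

    # Collect exon coordinates per gene name.
    my %exons;
    for my $line (@lines) {
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 9 && $f[2] eq 'exon';
        my ($name) = $f[8] =~ /name "([^"]+)"/ or next;
        push @{ $exons{$name} }, [ $f[3], $f[4] ];
    }

    # Sort exons by start; each gap between neighbours is an intron.
    my %introns;
    for my $name (keys %exons) {
        my @sorted = sort { $a->[0] <=> $b->[0] } @{ $exons{$name} };
        for my $i (1 .. $#sorted) {
            my $start = $sorted[ $i - 1 ][1] + 1;
            my $end   = $sorted[$i][0] - 1;
            push @{ $introns{$name} }, [ $start, $end ] if $start <= $end;
        }
    }
    return \%introns;
}

# The four exons of the first gene above, built with explicit tabs.
my $attr = 'name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649';
my @gff  = map { join "\t", 'LG_I', 'src', 'exon', @$_, '.', '-', '.', $attr }
           ([2598, 3280], [3544, 3918], [4258, 4740], [5344, 6388]);

my $introns = introns_from_gff(@gff);
print join(',', map { "$_->[0]..$_->[1]" }
                @{ $introns->{'fgenesh1_pg.C_LG_I000001'} }), "\n";
# 3281..3543,3919..4257,4741..5343
```

The coordinates are reported on the forward strand in ascending order.
For a minus-strand gene like this one, the first intron in
transcription order is the last one in the list.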

Thanks~~

Wenkai



From spiros at lokku.com  Mon Nov  5 07:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <bba689ec0711050436r6016ae57le78db531f9eab55b@mail.gmail.com>

Hey,

Use the `sprintf` function. More information can be found at
http://perldoc.perl.org/functions/sprintf.html.

For more control over rounding, you could use the Math::Round module,
http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya <neetisomaiya at gmail.com> wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From ak at ebi.ac.uk  Mon Nov  5 07:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

  printf( "The number is %.3f\n", $number );

When making a string:

  my $string = sprintf( "%.3f", $number );


BTW, this is cutting, not rounding.


Cheers,
Andreas


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------


From t.nugent at cs.ucl.ac.uk  Mon Nov  5 07:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>   

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From bix at sendu.me.uk  Mon Nov  5 07:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:

http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/


$rounded = sprintf("%.3f", $number);


From Marc.Logghe at DEVGEN.com  Mon Nov  5 07:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,
Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:
my $f = 3.1232010098;
printf "%0.3f", $f;


HTH,
Marc
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
> 
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 
> decimal places in perl?
> 
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From bix at sendu.me.uk  Mon Nov  5 08:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> 
> When displaying:
> 
>   printf( "The number is %.3f\n", $number );
> 
> When making a string:
> 
>   my $string = sprintf( "%.3f", $number );
> 
> 
> BTW, this is cutting, not rounding.

(s)printf rounds (ie. doesn't simply truncate), though for critical 
applications you should use your own rounding algorithm.
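
A small sketch makes the difference between rounding and truncating
visible. round_to and truncate_to are hypothetical helper names, and
the second test value is made up so that the two results differ.

```perl
use strict;
use warnings;

# Round to a given number of decimal places (returns a string).
sub round_to {
    my ($number, $places) = @_;
    return sprintf("%.${places}f", $number);
}

# Truncate toward zero instead of rounding.
sub truncate_to {
    my ($number, $places) = @_;
    my $factor = 10 ** $places;
    return int($number * $factor) / $factor;
}

print round_to(3.1232010098, 3), "\n";      # 3.123
print round_to(3.1236010098, 3), "\n";      # 3.124
print truncate_to(3.1236010098, 3), "\n";   # 3.123
```

Both helpers work on binary floating-point values, so numbers that sit
exactly on a decimal boundary may not behave as decimal arithmetic
would suggest. That is the reason for the caveat about critical
applications.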


From ak at ebi.ac.uk  Mon Nov  5 08:56:24 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 13:56:24 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <472F1989.90105@sendu.me.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk>
Message-ID: <20071105135624.GD4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote:
> Andreas Kahari wrote:
> > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> >> Again a perl question, and maybe a very trivial one.
> >> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> >> perl?
> > 
> > When displaying:
> > 
> >   printf( "The number is %.3f\n", $number );
> > 
> > When making a string:
> > 
> >   my $string = sprintf( "%.3f", $number );
> > 
> > 
> > BTW, this is cutting, not rounding.
> 
> (s)printf rounds (ie. doesn't simply truncate), though for critical 
> applications you should use your own rounding algorithm.

They do indeed.  Mea culpa.


Andreas

-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------


From jay at jays.net  Mon Nov  5 10:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value to the sorting algorithm. All tests invoke
-nosort => 1 (except a phase test where nosort doesn't matter anyway).
In my limited experience the sorting only serves to break real-world
splicing.

If there is no valid use then we can remove ~20 lines from  
SeqFeatureI.pm circa line 505. If there is a valid use and someone  
would be so kind as to educate me I'd be happy to add tests which  
demonstrate them.  :)

P.S.  CSHL is neato. I plan on understanding some of this stuff some  
day.  :)

j
http://www.bioperl.org/wiki/User:Jhannah


From hlapp at duke.edu  Mon Nov  5 11:03:16 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 11:03:16 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>

I agree that there should be a meaningful default that results in
"doing the right thing" in most cases if the user doesn't intervene.
I'm not sure I understand all the details, but it sounds like sorting or
not sorting should depend on the split location type unless the user
overrides it by argument. That's what you're suggesting, right?

	-hilmar

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:

> Pass in (-nosort => 1) to spliced_seq:
>
> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
>
> This ensures no sorting of sublocations occurs, if you want for  
> instance typical GenBank/EMBL 'join' behavior.
>
> To the other devs: shouldn't -nosort be the default behavior when  
> the split location is a 'join'?  In other words, should spliced_seq 
> () be modified to take into account the split location type when  
> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'  
> explicitly indicates the order of the sequences is important when  
> joined together; the current behavior is more like that for 'order'.
>
> chris
>
> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>
>> Hi to all.
>>
>> I have a problem with a simplest script:
>>
>>
>>
>>          use Bio::SeqIO;
>>          # get command-line arguments, or die with a usage statement
>>          my $usage = "x2y.pl infile infileformat outfile  
>> outfileformat\n";
>>          my $infile = shift or die $usage;
>>          my $infileformat = shift or die $usage;
>> #         my $outfile = shift or die $usage;
>>          my $outfileformat = shift or die $usage;
>>
>>          # create one SeqIO object to read in,and another to write  
>> out
>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>                                       '-format' => $infileformat);
>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>                                        '-format' => $outfileformat);
>>
>>          # write each entry in the input file to the output file
>>          while (my $inseq = $seq_in->next_seq) {
>>
>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>
>> for my $feat_object ($inseq->get_SeqFeatures)
>>     {
>>     if ($feat_object->primary_tag eq "CDS")
>>         {
>>         print $feat_object->get_tag_values('product'),"\n";
>>         print
>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>         print $feat_object->spliced_seq->seq,"\n\n";
>>         }
>>     }
>>
>>
>>
>> The result seems OK to me, but in case of first CDS of  
>> NC_005213.gbk from
>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/ 
>> Nanoarchaeum_equitans/> the
>> output is wrong:
>>
>> It is:
>> hypothetical protein
>> 1..490885
>> TAAATGCGATTGCTATTAGAA..................................Truncated
>> sequence...................................
>>
>> Should be:
>> hypothetical protein
>> 879..490883
>> ATGCGATTGCTATTAGAA...................................Truncated
>> sequence....................................TAA
>>
>>
>>
>> This CDS have an unnatural location string:
>> CDS             complement(join(490883..490885,1..879)), but  
>> spliced_seq
>> should handle these things?
>>
>> Please help me!
>> Best regards, N.
>> _______________________________________________
>>
>
>
>


From bernd.web at gmail.com  Mon Nov  5 11:53:01 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 5 Nov 2007 17:53:01 +0100
Subject: [Bioperl-l] PSI-BLAST
Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com>

Hi,

Is it possible with SearchIO to select a specific iteration (the
"Results from round i" part) of a PSI-BLAST report when parsing it with
SearchIO::blast?
It seems the parser parses the complete report. If this is not
implemented I could of course extract the specific part of the PSI-BLAST
report and then give it to SearchIO (e.g. with IO::String), but maybe I
am missing a built-in option?


Regards,
Bernd


From jay at jays.net  Mon Nov  5 11:54:13 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 11:54:13 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>

On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?

If someone knows why spliced_seq() should ever sort then I'm  
suggesting we add a test demonstrating a useful example of that.

If no one has a useful example of when you would want spliced_seq()  
to sort then I'm suggesting we remove the sorting altogether and  
nosort goes away.

I can provide/add many examples where sorting is bad. I do not know  
of a case where sorting is good.

j
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Mon Nov  5 12:07:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Nov 2007 12:07:10 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>


At one point the location order was not respected/saved, I believe. I
guess we will just assume the user will build up a SplitLocation in
order (i.e. with add_sub_Location).  I'll try and remember if there were
any other particular reasons.


-jason
On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar
>
> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>
>> Pass in (-nosort => 1) to spliced_seq:
>>
>> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
>>
>> This ensures no sorting of sublocations occurs, if you want for
>> instance typical GenBank/EMBL 'join' behavior.
>>
>> To the other devs: shouldn't -nosort be the default behavior when
>> the split location is a 'join'?  In other words, should spliced_seq
>> () be modified to take into account the split location type when
>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>> explicitly indicates the order of the sequences is important when
>> joined together; the current behavior is more like that for 'order'.
>>
>> chris
>>
>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>
>>> Hi to all.
>>>
>>> I have a problem with a simplest script:
>>>
>>>
>>>
>>>          use Bio::SeqIO;
>>>          # get command-line arguments, or die with a usage statement
>>>          my $usage = "x2y.pl infile infileformat outfile
>>> outfileformat\n";
>>>          my $infile = shift or die $usage;
>>>          my $infileformat = shift or die $usage;
>>> #         my $outfile = shift or die $usage;
>>>          my $outfileformat = shift or die $usage;
>>>
>>>          # create one SeqIO object to read in,and another to write
>>> out
>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>                                       '-format' => $infileformat);
>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>                                        '-format' => $outfileformat);
>>>
>>>          # write each entry in the input file to the output file
>>>          while (my $inseq = $seq_in->next_seq) {
>>>
>>> #            $seq_out->write_seq($inseq); # Whole sequence not  
>>> needed
>>>
>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>     {
>>>     if ($feat_object->primary_tag eq "CDS")
>>>         {
>>>         print $feat_object->get_tag_values('product'),"\n";
>>>         print
>>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>         }
>>>     }
>>>
>>>
>>>
>>> The result seems OK to me, but in case of first CDS of
>>> NC_005213.gbk from
>>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>>> Nanoarchaeum_equitans/> the
>>> output is wrong:
>>>
>>> It is:
>>> hypothetical protein
>>> 1..490885
>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>> sequence...................................
>>>
>>> Should be:
>>> hypothetical protein
>>> 879..490883
>>> ATGCGATTGCTATTAGAA...................................Truncated
>>> sequence....................................TAA
>>>
>>>
>>>
>>> This CDS have an unnatural location string:
>>> CDS             complement(join(490883..490885,1..879)), but
>>> spliced_seq
>>> should handle these things?
>>>
>>> Please help me!
>>> Best regards, N.
>>> _______________________________________________
>>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Mon Nov  5 12:16:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:16:10 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>

Yes, we would sort based on the splittype() and default to a  
particular behavior ('join') if one isn't designated, maybe with a  
warning indicating the splittype() isn't defined.  An 'order' or  
other defined type could likewise determine a default sort/nosort  
behavior (probably sort, as that would replicate the prior behavior).

chris

On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds like sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar


From cjfields at uiuc.edu  Mon Nov  5 12:20:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:20:35 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu>


On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote:

> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds like sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>
> If someone knows why spliced_seq() should ever sort then I'm
> suggesting we add a test demonstrating a useful example of that.
>
> If no one has a useful example of when you would want spliced_seq()
> to sort then I'm suggesting we remove the sorting altogether and
> nosort goes away.
>
> I can provide/add many examples where sorting is bad. I do not know
> of a case where sorting is good.
>
> j
> http://www.bioperl.org/wiki/User:Jhannah

The behavior would be based on the current use of 'join', 'order',  
and 'bond' (the latter in GenPept records).  I documented some cases  
here a while back:

http://www.bioperl.org/wiki/BioPerl_Locations#Split

chris
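
A minimal sketch of the type-dependent default discussed above, written in plain Perl without BioPerl so it can be run anywhere. Everything here is hypothetical: `splice_sublocations`, its `type`, `strand` and `sort` options, and the 20-base toy genome are illustrations only and are not the spliced_seq() API. It assumes simple [start, end] sublocations that all share one strand, and it shows why sorting a 'join' that wraps around the origin moves the stop codon to the front.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Concatenate the sublocations of a split location.
#   $seq     : full sequence, addressed with 1-based coordinates
#   $sublocs : array ref of [start, end] pairs, in the order written in the record
#   %opt     : type   => 'join' | 'order' (assumed default: 'join')
#              strand => 1 | -1
#              sort   => 0 | 1 (explicit override of the type-based default)
sub splice_sublocations {
    my ($seq, $sublocs, %opt) = @_;
    my $type   = $opt{type}   // 'join';
    my $strand = $opt{strand} // 1;

    # Assumed policy: keep the written order for 'join', sort everything else.
    my $sort = exists $opt{sort} ? $opt{sort} : ($type eq 'join' ? 0 : 1);

    my @parts = @$sublocs;
    @parts = sort { $a->[0] <=> $b->[0] } @parts if $sort;

    my $spliced = join '',
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @parts;
    return $strand < 0 ? revcom($spliced) : $spliced;
}

# Toy circular genome; the feature is complement(join(18..20,1..9)).
my $genome  = 'GGTTTCCATCCCCCCCCTTA';
my $sublocs = [ [18, 20], [1, 9] ];

print splice_sublocations($genome, $sublocs, type => 'join', strand => -1), "\n";
# prints ATGGAAACCTAA (start codon first, stop codon last)

print splice_sublocations($genome, $sublocs, type => 'join', strand => -1, sort => 1), "\n";
# prints TAAATGGAAACC (stop codon moved to the front, as in the report)
```

The explicit `sort` option plays the role of the user override mentioned in the thread; when it is absent, the split type alone decides.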


From hlapp at duke.edu  Mon Nov  5 12:32:24 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 12:32:24 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu>

Sounds good to me. -hilmar

On Nov 5, 2007, at 12:16 PM, Chris Fields wrote:

> Yes, we would sort based on the splittype() and default to a  
> particular behavior ('join') if one isn't designated, maybe with a  
> warning indicating the splittype() isn't defined.  Using an 'order'  
> or other defined types could also delineate a default sort/nosort  
> behavior (probably the previous as it would replicate prior behavior).
>
> chris
>
> On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds like sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 12:41:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:41:27 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
Message-ID: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>

It may have something to do with remote locations or setting strand()  
in sublocations.  This may have popped up in relation to a LocationI  
code audit I proposed a while back on the list which I never got  
around to.  Oh well...

I at least managed to get a wiki page started in case we decide to  
make changes, with the intention of making it a HOWTO at some point:

http://www.bioperl.org/wiki/BioPerl_Locations

If we go through with the changes to spliced_seq(), should it be  
implemented for inclusion in v1.6 or wait until v1.7?

chris

On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote:

>
> At one point the location order was not respected/saved I believe.  
> I guess we will just assume the user will build up a SplitLocation  
> in order (i.e. add_SubLocation).  I'll try and remember if there  
> were any other particular reasons.
>
>
> -jason
> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds like sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>>
>> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>>
>>> Pass in (-nosort => 1) to spliced_seq:
>>>
>>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>>
>>> This ensures no sorting of sublocations occurs, if you want for
>>> instance typical GenBank/EMBL 'join' behavior.
>>>
>>> To the other devs: shouldn't -nosort be the default behavior when
>>> the split location is a 'join'?  In other words, should spliced_seq
>>> () be modified to take into account the split location type when
>>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>>> explicitly indicates the order of the sequences is important when
>>> joined together; the current behavior is more like that for 'order'.
>>>
>>> chris
>>>
>>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>>
>>>> Hi to all.
>>>>
>>>> I have a problem with a simplest script:
>>>>
>>>>
>>>>
>>>>          use strict;
>>>>          use warnings;
>>>>          use Bio::SeqIO;
>>>>
>>>>          # get command-line arguments, or die with a usage statement
>>>>          my $usage = "x2y.pl infile infileformat outfileformat\n";
>>>>          my $infile = shift or die $usage;
>>>>          my $infileformat = shift or die $usage;
>>>> #         my $outfile = shift or die $usage;
>>>>          my $outfileformat = shift or die $usage;
>>>>
>>>>          # create one SeqIO object to read in, and another to write out
>>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>>                                       '-format' => $infileformat);
>>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>>                                        '-format' => $outfileformat);
>>>>
>>>>          # print product, coordinates and spliced sequence of every CDS
>>>>          while (my $inseq = $seq_in->next_seq) {
>>>>
>>>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>>>
>>>>              for my $feat_object ($inseq->get_SeqFeatures) {
>>>>                  if ($feat_object->primary_tag eq "CDS") {
>>>>                      print $feat_object->get_tag_values('product'), "\n";
>>>>                      print $feat_object->location->start, "..",
>>>>                            $feat_object->location->end, "\n";
>>>>                      print $feat_object->spliced_seq->seq, "\n\n";
>>>>                  }
>>>>              }
>>>>          }
>>>>
>>>>
>>>>
>>>> The result seems OK to me, but in case of first CDS of
>>>> NC_005213.gbk from
>>>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>>>> Nanoarchaeum_equitans/> the
>>>> output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>>
>>>>
>>>> This CDS has an unusual location string:
>>>> CDS             complement(join(490883..490885,1..879)), but
>>>> spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>>>> _______________________________________________
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Mon Nov  5 11:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID: <C354B795.10231%bosborne11@verizon.net>

Jean-luc,

From what you've written it sounds like you're using bash and not some other
shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
in your home directory, as well as a .ncbirc file. This should work.

I'm no Unix expert but I've always configured tcsh on the Mac in the same
ways I'd configure it on Linux machines. Similarly, if you're using bash
then it will read its .bashrc file, regardless of what flavor of Unix you
use (and the same thing holds true for zsh or csh or ...).

Brian O.


On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:

> Dear Bioperl and Mac users,
> 
> I am a Mac user and would like to run a script I made using
> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
> to Bioperl the pathway to Blastall and other executables.
> 
> I read carefully the following link
> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
> path to Blast, but I guess the way to proceed is slightly different in Mac and
> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
> .profile file instead of .bashrc?)
> 
> Actually, my blast file is in myname directory and comprises a /bin and  a
> /data file. I have got my blastall and other executables in
> myname/blast/bin/blastall.
> 
> Thank you in anticipation for your help.
> 
> Jean-Luc
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Mon Nov  5 13:35:56 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 05 Nov 2007 12:35:56 -0600
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <C354B795.10231%bosborne11@verizon.net>
References: <C354B795.10231%bosborne11@verizon.net>
Message-ID: <472F628C.2000506@campus.iztacala.unam.mx>

If the ~/.bashrc file doesn't work for you, try renaming it to 
~/.bash_profile and re-login, that might work best.

~/.bashrc works as an individual per-interactive-shell startup file, 
whereas ~/.bash_profile is a personal initialization file, executed for 
login shells.

Hope this helps.

Regards,
Mauricio.


Brian Osborne wrote:
> Jean-luc,
> 
> From what you've written it sounds like you're using bash and not some other
> shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
> in your home directory, as well as a .ncbirc file. This should work.
> 
> I'm no Unix expert but I've always configured tcsh on the Mac in the same
> ways I'd configure it on Linux machines. Similarly, if you're using bash
> then it will read its .bashrc file, regardless of what flavor of Unix you
> use (and the same thing holds true for zsh or csh or ...).
> 
> Brian O.
> 
> 
> On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:
> 
>> Dear Bioperl and Mac users,
>>
>> I am a Mac user and would like to run a script I made using
>> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
>> to Bioperl the pathway to Blastall and other executables.
>>
>> I read carefully the following link
>> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
>> path to Blast, but I guess the way to proceed is slightly different in Mac and
>> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
>> .profile file instead of .bashrc?)
>>
>> Actually, my blast file is in myname directory and comprises a /bin and  a
>> /data file. I have got my blastall and other executables in
>> myname/blast/bin/blastall.
>>
>> Thank you in anticipation for your help.
>>
>> Jean-Luc
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM
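
One way to confirm that a shell startup file actually took effect is to ask Perl what it sees. Terminal on the Mac opens login shells by default, which is why ~/.bash_profile may be read when ~/.bashrc is not. The sketch below is plain Perl and does not load BioPerl; `find_executable` is a hypothetical helper, and treating BLASTDIR as the variable that Bio::Tools::Run::StandAloneBlast consults is an assumption taken from its documentation rather than something verified here.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in @dirs that contains an executable file
# called $name, or undef if none does. Undefined and empty entries are skipped.
sub find_executable {
    my ($name, @dirs) = @_;
    for my $dir (grep { defined $_ && length $_ } @dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $dir if -f $candidate && -x _;
    }
    return undef;
}

# Search BLASTDIR first (if it is set), then every directory on PATH.
my @search = ($ENV{BLASTDIR}, File::Spec->path);
my $found  = find_executable('blastall', @search);

if (defined $found) {
    print "blastall found in $found\n";
}
else {
    print "blastall not found; check BLASTDIR and PATH in your shell startup file\n";
}
```

Run it from a freshly opened terminal window so that the startup files have been re-read.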


From hlapp at duke.edu  Mon Nov  5 16:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
Message-ID: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>


On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be  
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should  
not change behavior for those for which the current default behavior  
was already broken (and who therefore pass in -nosort), and 2) fix  
the behavior for those who erroneously assumed that the code was  
going to do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 17:12:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 16:12:23 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
	<EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu>


On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote:

>
> On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:
>
>> If we go through with the changes to spliced_seq(), should it be  
>> implemented for inclusion in v1.6 or wait until v1.7?
>
> I would say they should be implemented ASAP because they 1) should  
> not change behavior for those for which the current default  
> behavior was already broken (and who therefore pass in -nosort),  
> and 2) fix the behavior for those who erroneously assumed that the  
> code was going to do the right thing by default.
>
> I.e., it sounds mostly like a bugfix to me. Am I overlooking  
> something?
>
> 	-hilmar
> -- 

Okay; I'll try to get this in soon.

chris


From jean-luc.jany at univ-brest.fr  Tue Nov  6 04:00:07 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Tue, 06 Nov 2007 10:00:07 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
Message-ID: <47302D17.2030500@univ-brest.fr>

Thanks Brian. Yes I use bash. I am going to follow your advice as soon 
as possible (for some reasons I am unable to run bioperl) and come back 
to you to tell you if it runs.
Jean-Luc


From jason at bioperl.org  Tue Nov  6 16:18:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 16:18:35 -0500
Subject: [Bioperl-l] lightweight sequence features
Message-ID: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>

I started a branch for implementing and playing with a lightweight  
feature object. The branch is called 'lightweight_feature_branch'.

Right now it is about 70% faster just in object creation based on  
parsing features using Bio::Tools::GFF and swapping the types of  
features that are created.  It uses arrays instead of hashes under  
the hood.

So the objects don't have locations under the hood.  My hope is if  
this works okay we could use it for creating objects where we KNOW  
the underlying features have simple locations, such as when parsing  
GFF data.

-jason
--
Jason Stajich
jason at bioperl.org
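
For readers unfamiliar with the idea, here is a minimal sketch of an array-based feature next to a hash-based one. The class names `HashFeature` and `ArrayFeature` and the slot constants are hypothetical; this does not reproduce the code on the branch. It only illustrates storing start, end and strand directly in a blessed array, with no separate location object.

```perl
use strict;
use warnings;

package HashFeature;        # hypothetical hash-based feature
sub new {
    my ($class, %args) = @_;
    return bless {
        start  => $args{start},
        end    => $args{end},
        strand => $args{strand},
    }, $class;
}
sub start  { $_[0]{start} }
sub end    { $_[0]{end} }
sub strand { $_[0]{strand} }

package ArrayFeature;       # hypothetical array-based feature
use constant { F_START => 0, F_END => 1, F_STRAND => 2 };   # slot indices
sub new {
    my ($class, %args) = @_;
    # Coordinates live directly in the array; no location object is built.
    return bless [ @args{qw(start end strand)} ], $class;
}
sub start  { $_[0][F_START] }
sub end    { $_[0][F_END] }
sub strand { $_[0][F_STRAND] }

package main;

my $feat = ArrayFeature->new(start => 10, end => 42, strand => -1);
printf "%d..%d (strand %d)\n", $feat->start, $feat->end, $feat->strand;
# prints 10..42 (strand -1)
```

Both classes expose the same accessors, so calling code does not need to know which storage is used.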


From cjfields at uiuc.edu  Tue Nov  6 16:57:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Nov 2007 15:57:17 -0600
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>

Bravo!  I once benchmarked Location instance creation and found  
it contributed quite a bit of overhead so the speedup with that and  
the use of arrays makes quite a bit of sense to me.

You mention only simple locations; I'm guessing this doesn't handle  
'fuzzy' ends?  If it did I could see layering the feature data from  
the get-go, so it could be used just about anywhere in the place of  
SF::Generic.  Maybe something to test out in 1.7?

chris

On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:

> I started a branch for implementing and playing with lightweight
> feature object. The branch is called 'lightweight_feature_branch'.
>
> Right now it is about 70% faster just in object creation based on
> parsing features using Bio::Tools::GFF and swapping the types of
> features that are created.  It uses arrays instead of hashes under
> the hood.
>
> So the objects don't have locations under the hood.  My hope is if
> this works okay we could use it for creating objects where we KNOW
> the underlying features have simple locations, such as when parsing
> GFF data.
>
> -jason
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Tue Nov  6 23:14:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 23:14:55 -0500
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
	<5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
Message-ID: <A021EE94-8DF8-467E-8303-E80127E3AEE2@bioperl.org>

Right - only for simple locations.  I've got a bunch more tests and  
fixes to put in.

I am hoping this can be a fast replacement in the case where we're  
dealing with this "unflattened" data (i.e. GFF in FeatureIO &  
GBrowse).  This is sort of a playground until I feel I can really  
get it tested a bit more.  I'll give an all clear when the dust  
settles in terms of the design, if anyone wants to play/help.

-jason
On Nov 6, 2007, at 4:57 PM, Chris Fields wrote:

> Bravo!  I once benchmarked Location instance creation and  
> found it contributed quite a bit of overhead so the speedup with  
> that and the use of arrays makes quite a bit of sense to me.
>
> You mention only simple locations; I'm guessing this doesn't handle  
> 'fuzzy' ends?  If it did I could see layering the feature data from  
> the get-go, so it could be used just about anywhere in the place of  
> SF::Generic.  Maybe something to test out in 1.7?
>
> chris
>
> On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:
>
>> I started a branch for implementing and playing with lightweight
>> feature object. The branch is called 'lightweight_feature_branch'.
>>
>> Right now it is about 70% faster just in object creation based on
>> parsing features using Bio::Tools::GFF and swapping the types of
>> features that are created.  It uses arrays instead of hashes under
>> the hood.
>>
>> So the objects don't have locations under the hood.  My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations, such as when parsing
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From heikki at sanbi.ac.za  Wed Nov  7 05:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise
* Sequences are modified in place. That is really bad, because that 
  means that the user has to know to create a copy before 
  running Mdust on it.
* The docs say that you have to set MDUSTDIR envvar to tell the program 
  where to find the binary. That is actually optional if the 
  binary is on your path.
* The tests do not cover any of the options to the program


As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects:
  sequence string is not masked but seqfeatures to that effect are added
* Modify run() to return the new masked sequence object when 
  the target is a Bio::PrimarySeqI.
* fix the documentation


After that it will be possible to simply write:

use Bio::Tools::Run::Mdust;
my $mdust = Bio::Tools::Run::Mdust->new();
my $seq_dusted = $mdust->run($seq); # $seq->isa('Bio::PrimarySeqI')


Are you happy for me to do this or do you want to do it yourself?


Yours,
	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    
    _/_/_/_/_/  heikki at_sanbi _ac _za    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
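
The "return a new masked sequence rather than modify in place" behavior proposed above can be sketched with plain strings. This is only an illustration under stated assumptions: `masked_copy` is a hypothetical helper, the region coordinates are made up rather than real mdust output, and a real implementation would work on Bio::PrimarySeqI objects.

```perl
use strict;
use warnings;

# Return a masked copy of $seq; the caller's string is left untouched.
#   $regions   : array ref of [start, end] pairs (1-based, inclusive)
#   $mask_char : replacement character, 'N' if not given
sub masked_copy {
    my ($seq, $regions, $mask_char) = @_;
    $mask_char = 'N' unless defined $mask_char;

    my $copy = $seq;                        # work on a copy, not the original
    for my $region (@$regions) {
        my ($start, $end) = @$region;
        my $len = $end - $start + 1;
        substr($copy, $start - 1, $len) = $mask_char x $len;
    }
    return $copy;
}

my $original = 'ACGTAAAAAAAAGTCA';
my $dusted   = masked_copy($original, [ [5, 12] ]);

print "$original\n";   # prints ACGTAAAAAAAAGTCA (unchanged)
print "$dusted\n";     # prints ACGTNNNNNNNNGTCA
```

With this shape the caller never has to remember to clone the sequence before masking it.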


From Kevin.M.Brown at asu.edu  Wed Nov  7 13:04:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 7 Nov 2007 11:04:50 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>

I installed bioperl-ext from CVS, but can't figure out what else is
missing to utilize Bio::Tools::pSW.  The error I get from the example
script in the wiki is:

The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align)
has not been installed.
 Please read the install the bioperl-ext package

BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
Compilation failed in require at ./align_test.pl line 3.
BEGIN failed--compilation aborted at ./align_test.pl line 3.

In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
Align, but no Align.pm file.

I followed the directions in the wiki to install 1.5.2_102 (think I had
_100 installed previously).  Any thoughts on what I'm missing?
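
A quick check that may help narrow this down: list where, if anywhere, Perl can see the module file. The sketch below is plain Perl; `locate_module` is a hypothetical helper, not part of BioPerl. For a compiled extension the .pm file usually sits in an architecture-specific directory under site_perl, so an Align folder without Align.pm in the plain site_perl tree is not by itself conclusive.

```perl
use strict;
use warnings;
use File::Spec;

# Return every path under @INC where the .pm file for $module exists.
sub locate_module {
    my ($module) = @_;
    my $pm = File::Spec->catfile(split /::/, $module) . '.pm';
    my @hits;
    for my $dir (grep { !ref $_ } @INC) {      # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $pm);
        push @hits, $path if -f $path;
    }
    return @hits;
}

my $module = 'Bio::Ext::Align';
my @found  = locate_module($module);

if (@found) {
    print "$module found at:\n";
    print "  $_\n" for @found;
}
else {
    print "$module is not in any \@INC directory:\n";
    print "  $_\n" for grep { !ref $_ } @INC;
}
```

If nothing is found, the build of bioperl-ext probably did not complete its install step, or it installed into a directory that this perl does not search.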


From jason at bioperl.org  Wed Nov  7 14:52:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 14:52:16 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>

The array-based Bio::SeqFeature::Slim is only about 7% faster than  
Bio::Graphics::Feature so I suspect most of the speedup comes from  
removing location objects.

Generic     6.75        --      -37%      -41%
GraphicsF   4.26       58%        --       -7%
Slim        3.98       70%        7%        --

This uses code on the lightweight_feature_branch, which can be checked out with:
cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r lightweight_feature_branch -d core_lwf bioperl-live

The benchmark script:
http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse:
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason
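
For anyone reading the table: the percentages follow from the first column if that column is seconds per iteration, which is an assumption here since the header row was not posted. Under that assumption each cell is (time of the column entry / time of the row entry - 1), which is how the quoted 70% and 7% figures come out. A small sketch that recomputes the table from the three timings:

```perl
use strict;
use warnings;

# Timings copied from the table above; assumed to be seconds per iteration.
my %s_per_iter = (Generic => 6.75, GraphicsF => 4.26, Slim => 3.98);

# Slowest first, matching the row order of the table.
my @names = sort { $s_per_iter{$b} <=> $s_per_iter{$a} } keys %s_per_iter;

# Percentage by which $row is faster (+) or slower (-) than $col.
sub percent_faster {
    my ($row, $col) = @_;
    return sprintf '%.0f', 100 * ($s_per_iter{$col} / $s_per_iter{$row} - 1);
}

for my $row (@names) {
    my @cells = map { $_ eq $row ? '--' : percent_faster($row, $_) . '%' } @names;
    printf "%-10s %5.2f %8s %8s %8s\n", $row, $s_per_iter{$row}, @cells;
}
```

The recomputed cells agree with the posted ones (58%, 70%, 7% and their negative counterparts), which supports reading the first column as time rather than rate.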


From lstein at cshl.edu  Wed Nov  7 15:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally,
then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic     6.75        --      -37%      -41%
> GraphicsF   4.26       58%        --       -7%
> Slim        3.98       70%        7%        --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason at bioperl.org  Wed Nov  7 15:09:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 15:09:35 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>

It uses hashes there so technically it is not entirely array based.

-jason
On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:

> I wonder if it is worth moving to the array-based version more  
> generally,
> then.
>
> How does the array based feature object deal with tags?
>
> Lincoln
>
> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>
>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>> Bio::Graphics::Feature so I suspect most of the speedup comes from  
>> removing
>> location objects.
>>
>> Generic     6.75        --      -37%      -41%
>> GraphicsF   4.26       58%        --       -7%
>> Slim        3.98       70%        7%        --
>>
>> this is using code on the lightweight_feature_branch so
>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>> lightweight_feature_branch -d core_lwf bioperl-live
>>
>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>> and the GFF3 file I used to parse
>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>
>> -jason
>>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Nov  7 16:12:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 15:12:35 -0600
Subject: [Bioperl-l] (no subject)
In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>

I can see preferring a lightweight simple SF over SF::Generic in the  
next BioPerl dev cycle.  I guess we would just layer split locations  
as simple sub-features/segments, typing when necessary?  That  
shouldn't be much more overhead than creating a layered Location::Split.

chris

On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:

> It uses hashes there so technically it is not entirely array based.
>
> -jason
> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>
>> I wonder if it is worth moving to the array-based version more
>> generally,
>> then.
>>
>> How does the array based feature object deal with tags?
>>
>> Lincoln
>>
>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>
>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>> removing
>>> location objects.
>>>
>>> Generic     6.75        --      -37%      -41%
>>> GraphicsF   4.26       58%        --       -7%
>>> Slim        3.98       70%        7%        --
>>>
>>> this is using code on the lightweight_feature_branch so
>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>
>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>> and the GFF3 file I used to parse
>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>
>>> -jason
>>>
>>
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Wed Nov  7 18:19:15 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Nov 2007 18:19:15 -0500
Subject: [Bioperl-l] lightweight features
In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
Message-ID: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>

It seems to me that there are applications where you're dealing with  
a huge number of features (such as GFF) and where therefore a  
lightweight object makes tremendous sense. But when you parse a  
genbank file, I'm not sure that's the bottleneck, unless maybe it's a  
large contig with lots of feature annotations.

I guess we'll ultimately want a way to control the type of feature  
being instantiated by a parser, e.g. using a factory.
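
As a rough illustration of the factory idea, here is a minimal sketch in plain Perl. The package names (Sketch::Feature::Light, Sketch::Feature::Full, Sketch::FeatureFactory) are made up for this example and are not BioPerl classes; the point is only that a parser could hold one factory object and stay ignorant of which feature class it builds.

```perl
use strict;
use warnings;

# Hypothetical lightweight feature: a blessed array, no location object.
package Sketch::Feature::Light;
sub new {
    my ($class, %args) = @_;
    return bless [ @args{qw(start end strand type)} ], $class;
}
sub start { $_[0][0] }
sub end   { $_[0][1] }

# Hypothetical full-weight feature: a blessed hash with room for tags.
package Sketch::Feature::Full;
sub new {
    my ($class, %args) = @_;
    return bless { %args, tags => {} }, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# The factory only remembers which class to instantiate.
package Sketch::FeatureFactory;
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type} || 'Sketch::Feature::Full' }, $class;
}
sub create {
    my ($self, %args) = @_;
    return $self->{type}->new(%args);
}

package main;

# A parser would keep one factory and call create() once per feature.
my $factory = Sketch::FeatureFactory->new(type => 'Sketch::Feature::Light');
my $feat    = $factory->create(start => 1050, end => 9000,
                               strand => 1, type => 'gene');
printf "%s %d..%d\n", ref $feat, $feat->start, $feat->end;
```

Switching a GFF parser to the heavier class would then be a one-argument change where the factory is constructed, with the parsing code left untouched.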

	-hilmar

On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:

> I can see preferring a lightweight simple SF over SF::Generic in the
> next BioPerl dev cycle.  I guess we would just layer split locations
> as simple sub-features/segments, typing when necessary?  That
> shouldn't be much more overhead than creating a layered  
> Location::Split.
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Nov  7 20:04:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 19:04:05 -0600
Subject: [Bioperl-l] lightweight features
In-Reply-To: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
	<D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
Message-ID: <E541C60D-6741-4923-A71D-E14CE6FE176D@uiuc.edu>

I'm also thinking a factory is a good possibility; maybe something to  
take the place of FTHelper.

chris

On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote:

> It seems to me that there are applications where you're dealing with
> a huge number of features (such as GFF) and where therefore a
> lightweight object makes tremendous sense. But when you parse a
> genbank file, I'm not sure that's the bottleneck, unless maybe it's a
> large contig with lots of feature annotations.
>
> I guess we'll ultimately want a way to control the type of feature
> being instantiated by a parser, e.g. using a factory.
>
> 	-hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov  7 23:45:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 22:45:26 -0600
Subject: [Bioperl-l] test please ignore
Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>


From cjfields at uiuc.edu  Thu Nov  8 10:50:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Nov 2007 09:50:02 -0600
Subject: [Bioperl-l] test please ignore
In-Reply-To: <47332534.5090205@bms.com>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
	<47332534.5090205@bms.com>
Message-ID: <D0ADF51D-92BE-4645-BB1C-564536732368@uiuc.edu>

And respond back!  Just checking the mail list; the open-bio wiki  
pages were down last night.

chris

On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote:

> Chris Fields wrote:
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> This is the best way to make everyone open this e-mail ;-)
> Stefan


From stefan.kirov at bms.com  Thu Nov  8 10:03:16 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 08 Nov 2007 10:03:16 -0500
Subject: [Bioperl-l] test please ignore
In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
Message-ID: <47332534.5090205@bms.com>

Chris Fields wrote:
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
This is the best way to make everyone open this e-mail ;-)
Stefan


From Kevin.M.Brown at asu.edu  Thu Nov  8 17:30:24 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Nov 2007 15:30:24 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org>
References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>
	<20071108003638.GA5892@eniac.jgi-psf.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu>

OK, found the issue.  For whatever reason the Align.pm file is inside
the Align folder and so the package name and path don't match up once it
is installed.  This would cause it to have a name of
"Bio::Ext::Align::Align" instead of "Bio::Ext::Align".  Not sure why
this wasn't caught when I did "perl Makefile.PL && make && make test &&
make install".
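
For anyone hitting the same thing, the mismatch comes from how Perl turns a package name into a file path when it runs "use" or "require". The sketch below uses only core Perl; module_path is a helper name invented for this example.

```perl
use strict;
use warnings;

# Convert a package name into the relative path that `require`
# looks for under each directory in @INC.
sub module_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

for my $pkg ('Bio::Ext::Align', 'Bio::Ext::Align::Align') {
    printf "%-24s => %s\n", $pkg, module_path($pkg);
}
```

So a file installed at Bio/Ext/Align/Align.pm is only found when asking for Bio::Ext::Align::Align, while "use Bio::Ext::Align" looks for Bio/Ext/Align.pm and fails.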

> -----Original Message-----
> From: Joel Martin [mailto:j_martin at lbl.gov] 
> Sent: Wednesday, November 07, 2007 5:37 PM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Bio::Ext::Align?
> 
> Hello,
>     Might be a side effect of fixing the other bioperl-ext package, 
> what steps exactly did this entail:
> 
> > I installed bioperl-ext from CVS, 
> 
> ?
> 
> you can probably bypass it at the moment by doing this after 
> unpacking the
> bioperl-ext package 
> 
> cd Bio/Ext/Align
> perl Makefile.PL
> make
> make test
> make install
> 
> and
> 
> cd Bio/Ext/HMM
> perl Makefile.PL
> make 
> make test
> make install
> 
> Joel
> 
> but can't figure out what else is
> > missing to utilize Bio::Tools::pSW.  The error I get from 
> the example
> > script in the wiki is:
> > 
> > The C-compiled engine for Smith Waterman alignments 
> (Bio::Ext::Align)
> > has not been installed.
> >  Please read the install the bioperl-ext package
> > 
> > BEGIN failed--compilation aborted at
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
> > Compilation failed in require at ./align_test.pl line 3.
> > BEGIN failed--compilation aborted at ./align_test.pl line 3.
> > 
> > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
> > Align, but no Align.pm file.
> > 
> > I followed the directions in the wiki to install 1.5.2_102 
> (think I had
> > _100 installed previously).  Any thoughts on what I'm missing?
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From akarger at CGR.Harvard.edu  Fri Nov  9 09:53:02 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 9 Nov 2007 09:53:02 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
Message-ID: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>

When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:


 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885


 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575


As you can see from Sbjct lines, these two hits are basically
contiguous.
I was surprised to see that the bit scores and identities and alignment
lengths here are totally different but the expectation values are
identical. 

After a bit of grepping in the BLAST source, I found reference to "sum
segments" and "a collection [of] multiple distinct alignments with
asymmetric gaps between the alignments" and decided it was time to cry
for help. When does BLAST decide that two or more alignments belong
"together" and how does that affect the evalue? Is the evalue really
showing how good those two alignments combined are, despite the frame
shift? (It so happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
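
As a stopgap while n() comes back empty, the N can be read straight from the raw report text. This is only a sketch in core Perl, not part of the BioPerl parser; parse_expect is a name made up here, and the third sample line is invented purely to show the plain "Expect =" case.

```perl
use strict;
use warnings;

# Pull the sum-group size N and the E-value out of a BLAST "Score" line.
# Returns (n, evalue); n defaults to 1 for a plain "Expect = ..." line.
sub parse_expect {
    my ($line) = @_;
    return unless $line =~ /Expect(?:\((\d+)\))?\s*=\s*(\S+)/;
    my ($n, $evalue) = (defined $1 ? $1 : 1, $2);
    $evalue =~ s/,$//;    # drop a trailing comma, if any
    return ($n, $evalue);
}

my @lines = (
    ' Score =  192 bits (487), Expect(2) = 5e-64',
    ' Score = 75.1 bits (183), Expect(2) = 5e-64',
    ' Score = 40.0 bits (92), Expect = 0.003',    # invented line for contrast
);
for my $line (@lines) {
    my ($n, $evalue) = parse_expect($line);
    print "n=$n evalue=$evalue\n";
}
```

HSPs from the same hit that share both N > 1 and the E-value would then be candidates for one sum-statistics group, though that grouping rule is an assumption rather than something the report states outright.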

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University


From cjfields at uiuc.edu  Fri Nov  9 12:58:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Nov 2007 11:58:16 -0600
Subject: [Bioperl-l] GFF3loader and indexing
Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu>

Quick question: shouldn't the new Index attribute be passed on to  
seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping  
purposes (for instance, properly reloading dumped gff3 data)?  I'm  
testing out a feature editor using volvox.gff3 data in GBrowse and  
the mRNA features appear to drop this attribute once loaded:

Original data:

ctgA	example	gene	1050	9000	.	+	.	ID=EDEN;Name=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=EDEN.1

partial gff3_string(1) output:

ctgA	example	gene	1050	9000	.	+	.	Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=51;ID=52
...
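
One quick way to see which attributes go missing on the round trip is to diff the column 9 keys before and after loading. The sketch below is core Perl only and deliberately simplistic: parse_attributes is a name invented here, and it ignores multi-valued attributes and URL escaping.

```perl
use strict;
use warnings;

# Parse a GFF3 column 9 string into a hash of key => value.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value;
    }
    return \%attr;
}

# Column 9 of the mRNA line, as loaded and as dumped by gff3_string(1).
my $before = parse_attributes(
    'ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1');
my $after = parse_attributes(
    'Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1');

# Keys present in the original but absent after the reload.
my @dropped = grep { !exists $after->{$_} } sort keys %$before;
print "Dropped on reload: @dropped\n";
```

For the mRNA line above this prints "Dropped on reload: Index"; ID and Parent survive as keys even though their values are rewritten.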

chris


From David.Messina at sbc.su.se  Sat Nov 10 06:04:25 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 10 Nov 2007 12:04:25 +0100
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>

Hi Amir,

I don't have my BLAST book handy, and my memory is a little fuzzy, but I
think the Expect(2) you're seeing is the E-value based on both HSPs
combined. And I think this is why you see the same Expect value for both --
because it is shared between them (which sounds like what you wanted).

Again, this is just from memory, but I think this is an option that has to
be turned on rather than something which Blast decides to do on its own.


I don't know whether BioPerl reports this or not. Would you mind e-mailing
me an entire BLAST report as a sample? When I have some time I'd like to play
around with this a bit.

Thanks,
Dave


From sac at bioperl.org  Sat Nov 10 17:59:28 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Nov 2007 14:59:28 -0800
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>

The Bioperl blast parser should extract that value and you can obtain
it from an HSP object, via the HSPI::n() method, documented here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

Dave's basically correct in his explanation. It's a result of the
application of sum statistics by the blast algorithm. You can read all
about it in Korf et al's BLAST book. Here's the relevant section:

http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1

Steve



From bernd.web at gmail.com  Tue Nov 13 06:57:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 13 Nov 2007 12:57:04 +0100
Subject: [Bioperl-l] Panel link
Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com>

Hi,

Is it possible with Panel to provide javascript event handlers?
With -link we can provide hrefs as:
  -link => 'http://www.google.com/search?q=$description'
or use a coderef that returns a href.

However, I'd like to set-up links as:
<area .... href="#id" onmouseover="function()" onmouseout="function()">

Is this possible by default with Panel?

Regards,
Bernd


From akarger at CGR.Harvard.edu  Tue Nov 13 12:12:32 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:12:32 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A0165@huls5.nucleus.harvard.edu>

Thanks for the reply. I'm curious as to how BLAST decides to do this,
but not curious enough to buy the BLAST book.

If you want to see this, you could just tblastn the ENSP00000349467
sequence:

MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG
NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE
EVDEMIREADIDGDGQVNYEEFVQMMTAK

against the human genome at NCBI or locally.
 
I've attached the tblastn report for that protein, which includes the
results I quoted. (It was done as part of a blast of 150 proteins vs.
the genome.)
 
-Amir




-------------- next part --------------
A non-text attachment was scrubbed...
Name: ENSP00000349467_tblastn.txt.gz
Type: application/x-gzip
Size: 9755 bytes
Desc: ENSP00000349467_tblastn.txt.gz
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071113/f8853e76/attachment-0002.gz>

From akarger at CGR.Harvard.edu  Tue Nov 13 12:30:52 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:30:52 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>

> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf 
> Of Steve Chervitz
> 
> The Bioperl blast parser should extract that value and you can obtain
> it from an HSP object, via the HSPI::n() method, documented here:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

As I mentioned in my email:

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

# Path to the BLAST report, given as the first command-line argument
my $blast_out = $ARGV[0];
my $in = Bio::SearchIO->new(-format      => 'blast',
                            -file        => $blast_out,
                            -report_type => 'tblastn');

# Header row for the tab-delimited output
print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
Evalue)), "\n";

# One output row per HSP: result (query) -> hit (subject) -> HSP
while(my $query = $in->next_result) {
    while(my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,        # expected to hold the N from Expect(N)
                $hsp->evalue,
            ),"\n";
        }
    }
}

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir


From cjfields at uiuc.edu  Tue Nov 13 12:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
	<B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug?  Dave mentioned he would look into it but  
I think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report.   
BTW, I wonder how this appears in XML output?

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lskatz at gatech.edu  Tue Nov 13 20:27:45 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Tue, 13 Nov 2007 20:27:45 -0500
Subject: [Bioperl-l] chromatogram
Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>

Hi,
I would like to know how to draw a chromatogram file.  Does anyone
have any sample code where you read in an scf file and create a jpeg
or other image file?
For that matter, I want to be able to customize these images with base
calls if possible.  I really appreciate the help, so thanks!

-- 
Lee Katz


From mvrmakam at yahoo.com  Wed Nov 14 04:52:13 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on Windows XP
Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com>

Hi,

I am encountering a problem while installing Bioperl on Windows XP.  I have installed ActivePerl version 5.8.8.822.  I am using Perl Package Manager GUI.  Also, I am following the instructions outlined for installing Bioperl on Windows.  I am getting an error.  The error is as follows:

 Downloading ActiveState Package Repository packlist ... failed 500 Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com')

I do not know how to overcome this problem.  The other issue is that when I type bioperl in the search box I do not see any bioperl packages.  I do not know what the problem is.  If any of you could guide me through the installation process I would appreciate it.

Thanks,

Roshan




From cjfields at uiuc.edu  Wed Nov 14 09:02:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Nov 2007 08:02:05 -0600
Subject: [Bioperl-l] Installing Bioperl on Windows XP
In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com>
References: <235423.72586.qm@web33703.mail.mud.yahoo.com>
Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu>

The instructions are pretty specific:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Note the section on adding new repositories.  As for the PPM  
connection error, it's more than likely an error with the default  
address but it isn't bioperl-related; maybe answers lie here:

http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl-faq2.html#ppm_repositories

chris

On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote:

> Hi,
>
> I am encountering a problem while installing Bioperl on Windows  
> XP.  I have installed ActivePerl version 5.8.8.822.  I am using  
> Perl Package Manager GUI.  Also, I am following the instructions  
> outlined for installing Bioperl on Windows.  I am getting an  
> error.  The error is as follows:
>
>  Downloading ActiveState Package Repository packlist ... failed 500  
> Can't connect to ppm4.activestate.com:80 (Bad hostname  
> 'ppm4.activestate.com')
>
> I do not know how to overcome this problem.  The other issue is  
> when I type bioperl in the search box I do not see any packages of  
> bioperl.  I do not know what the problem is.  If anyone of you  
> could guide me through the installation process I would appreciate it.
>
> Thanks,
>
> Roshan


From reshetovdenis at gmail.com  Wed Nov 14 12:28:40 2007
From: reshetovdenis at gmail.com (Denis Reshetov)
Date: Wed, 14 Nov 2007 20:28:40 +0300
Subject: [Bioperl-l] how to load all genomes
Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>

Dear BioPerl-db Creators,

I'm trying to load all genomes from the NCBI FTP site
into my BioSQL database using the load_seqdatabase.pl script.

But it seems very slow. Is there a better way to do it?

Thank you very much,

Denis.


From barry.moore at genetics.utah.edu  Wed Nov 14 14:18:29 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 14 Nov 2007 12:18:29 -0700
Subject: [Bioperl-l] how to load all genomes
In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu>

Denis,

You might be interested in this thread from a couple of years ago.  I
had a similar problem, which I eventually resolved.  Unfortunately,
neither the cause nor the fix was ever entirely clear, but you may be
able to glean some ideas from it.  You may have already done this, but
I also suggest searching the list archives: this question comes up
every now and then, so there may be other postings like the one below
that could help.

http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html

Finally, if you are still having problems, please include a few more
details about your situation.  Which database are you using?  Have you
preloaded the taxonomy data?  How fast (or slow) are your sequences
loading?

Barry

On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote:

> Dear BioPerl-db Creators,
>
> I`m trying to load all genomes from NCBI ftp site
> to my BioSql database using common script load_seqdatabase.pl
>
> But it seems very slow. Let me know what is the better way to do it?
>
> Thank you very much,
>
> Denis.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 14:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development  :-)


Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz


------------------------------------------------------------------------------------------

#!perl -w
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;


use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s' => \$FILE,
            'o|out=s' => \$OUTFILE,
            'l|left=s' => \$LEFT_SEQ,
            'r|right=s' => \$RIGHT_SEQ,
            's|size=i' => \$SIZE,
            ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

Options:
--height <pixels> Set height of image (${\HEIGHT} pixels default)
--file <trace file-name> Filename for the ABI trace file
--out <output file-name> Filename for the generated .png image
--left <left end sequence>
--right <right end sequence>
--size <size of clipped fasta sequence>

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
    or
    http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height = $HEIGHT || HEIGHT;
my $file = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);

my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence =$abi->get_sequence();
@bp = split(//, $sequence);


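# label each trace position: the called base where a base call falls,
# a space everywhere else (a hash lookup avoids one grep per position)
my %is_base_call = map { $_ => 1 } @base_calls;
$size = $abi->get_trace_length();
for ($i=0,$count = 0; $i<$size; $i++) {
     if($is_base_call{$i}){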
       $bases[$i] = $bp[$count];
       $count++;
     }else{
       $bases[$i] = ' ';
     }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
   $graph->set(
   title => $abi->get_sample_name(),
#	y_max_value => $abi->get_max_trace() + 50,
	x_max_value => $abi->get_trace_length(),
	t_margin => 5,
    b_margin => 5,
    l_margin => 5,
    r_margin => 5,
    x_ticks => 0,
    text_space => 0,
	line_width 	=> 1,
	transparent	=> 0,
	b_margin => 30,
	t_margin => 35,
	x_plot_values => 0,
	interlaced => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black = $gd->colorAllocate(0,0,0);       # G
$blue = $gd->colorAllocate(0,0,255);      # C
$red = $gd->colorAllocate(255,0,0);       # T
$green = $gd->colorAllocate(0,255,0);     # A
$magenta =$gd->colorAllocate(255,0,255);  # N
$white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
$gray = $gd->colorAllocate(210,210,210);
%colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N",
$magenta, " ",$white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base = find_match($sequence,$LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
$end_base = find_match($sequence,$RIGHT_SEQ, 1);
if($end_base > 0){   # find_match returns -1 (which is true) when nothing matched
 $end_base += length($RIGHT_SEQ);
}


# get the coords of the features on the image
@coords = $graph->get_hotspot(1);
$size = @coords;
$printed_num = 1;
$basecount = 0;
$numstoprint = $basecount - $start_base;

# draw the colored bases and scale at top and bottom of image
for ($i=0,$count = 0; $i<$size; $i++) {
  $c = $coords[$i];
  (undef, $xs, undef, undef, undef, undef) = @$c;
  $base = $bases[$i];
  if($base =~ /[ACGTN]/){
   if($start_base - 1 == $basecount){$start_base_coord = $xs;}
   if($end_base - 1 == $basecount){$end_base_coord = $xs;}
   if(defined($SIZE) && $start_base+$SIZE -2 ==
$basecount){$end_base_coord_by_size = $xs;}
   $basecount++;
   $numstoprint++;
   $printed_num = 0;
  }
  # print the bases top and bottom
  $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
  $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});

  # print scale
  if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
    # both branches of the old if($LEFT_SEQ) test were identical
    $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
    $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
    $printed_num = 1;
  }
  $top_right_corner = $xs;
}


# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
# draw the clipped regions as gray
  #if LEFT_SEQ supplied and a match found
  if($LEFT_SEQ && $start_base > 0){
     $gd->filledRectangle(38,35,$start_base_coord - 1,$height -
33,$red);
     $clipped = 1;
  }
 #if RIGHT_SEQ supplied and a match found
 if($RIGHT_SEQ && $end_base > 0){
   print join("\t", ($end_base)),"\n";
   $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height -
33,$gray);
   $clipped = 1;
 }
 #if no RIGHT_SEQ supplied or no match found, use left match + seq length
 if(!$RIGHT_SEQ || $end_base < 0){
  $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
  $clipped = 1;
 }
 

# set height based on max trace within clipped region
   $graph->set(	y_max_value => 3000);#$abi->get_max_trace() + 50);

  # need to re-plot the data over the grayed out area
  $graph->plot(\@data) if $clipped;
  $gd->filledRectangle(0,0,$top_right_corner,33,$white);

#}

#print the graph
open(my $out, '>', $outfile) or die "can't open output file $outfile: $!\n";
binmode $out;
print $out $gd->png;
close $out;


sub find_match{
  my ($sequence,$query,$last) = @_;
  return -1 if length($query) < 6;
  my($odds, $evens, $ones, $twos, $threes, $match_pos);
    # try exact match
    $match_pos = do_regex($query, $sequence,$last); return $match_pos if
$match_pos > 0;

    # try matching every second base, starting from the first base and
    # then from the second, e.g. C.T.C.G. and then .C.T.C.G
    map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"}
($query=~m/(\w\w)/g);
    $match_pos = do_regex($odds, $sequence,$last);   return $match_pos
if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence,$last);  return $match_pos
if $match_pos > 0;

    # try matching every third base, starting from the first, second and
    # third base in turn, e.g. C..T..G..T etc
    map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2.";
$threes.="..$3"} ($query =~m/(\w\w\w)/g);
    $match_pos = do_regex($ones, $sequence,$last);   return $match_pos
if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence,$last);   return $match_pos
if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence,$last); return $match_pos
if $match_pos > 0;

     # not found
     return -1;
}

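sub do_regex{
	my ($query,$sequence,$last)= @_;
    #print "trying $query \n";
    my $result = -1;
    # greedy .* finds the last occurrence, non-greedy .*? the first;
    # pos() is the offset just past the match, so the result is the
    # 1-based start of the match
    if($last){
      $result = pos($sequence)-length($query)+1 if($sequence =~
m/.*($query)/ig);
    }else{
      $result = pos($sequence)-length($query)+1 if($sequence =~
m/.*?($query)/ig);
    }
    return $result;
}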

------------------------------------------------------------------------------------------
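
The fallback matching in find_match above can be sketched on its own.
This is only an illustration, not the script's exact code: the names
spaced_patterns and find_match_sketch are made up for the example, and
unlike the original it keeps the full query length rather than
truncating to whole pairs or triples.  It tries the exact query first,
then patterns that keep every second base, then every third, with the
skipped positions replaced by '.' so that isolated miscalls in the
read are tolerated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one pattern per starting offset: keep every $step-th base of
# $query and replace all other positions with '.' (any character).
sub spaced_patterns {
    my ($query, $step) = @_;
    my @bases = split //, $query;
    my @patterns;
    for my $offset (0 .. $step - 1) {
        push @patterns, join '',
            map { ($_ % $step == $offset) ? quotemeta($bases[$_]) : '.' }
            0 .. $#bases;
    }
    return @patterns;
}

# Return the 1-based start of the first match (or of the last match when
# $want_last is true), or -1 if no pattern matches.  Queries shorter
# than 6 bases are rejected, as in the original script.
sub find_match_sketch {
    my ($sequence, $query, $want_last) = @_;
    return -1 if !defined $query || length($query) < 6;

    my @patterns = (quotemeta($query),
                    spaced_patterns($query, 2),
                    spaced_patterns($query, 3));

    for my $pattern (@patterns) {
        my $start = -1;
        pos($sequence) = 0;
        # The lookahead matches without consuming anything, so pos()
        # stays at the start of each (possibly overlapping) match.
        while ($sequence =~ /(?=$pattern)/gi) {
            $start = pos($sequence) + 1;
            last unless $want_last;
        }
        return $start if $start > 0;
    }
    return -1;
}

# One base in the read differs from the query, so the exact match fails
# and a spaced pattern is used instead.
my $read = 'TTTACGAACGTAGGG';
print find_match_sketch($read, 'ACGTACGTA'), "\n";
```

Using a zero-width lookahead keeps the position arithmetic simple:
pos() already points at the start of the match, so there is no need to
subtract the query length afterwards.
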

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From mbasu at mail.nih.gov  Wed Nov 14 15:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need a chromatogram from an SCF file; I can't help with
that.  ABI.pm is not part of the BioPerl distribution.  But to set the
record straight, you can draw a chromatogram as SVG from an ABI file in
one step using my BioSVG module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay


Smithies, Russell wrote:
> Here's my trace viewer.
> Please excuse my dodgy Perl and debugging code as it's still under
> development  :-)


-- 
Malay K Basu
www.malaybasu.net


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 15:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>


We try to avoid SVG at all costs: installing plugins and viewers in a
locked-down corporate environment can be more trouble than it's worth,
whereas .png images work in any browser with no extras required.
We actually call this trace-drawing code from Python, which then
generates web pages with the embedded image.
It also means we don't need to license, install and maintain a trace
viewer like Chromas.
:-)

Russell

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
> 
> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
> not in Bioperl distribution. But to make the record straight, you can
> use one step chromatogram drawing in SVG from ABI file using my BioSVG
> module, available at:
> 
> http://www.bioinformatics.org/~malay/biosvg/
> 
> Malay
> 
> 
> 
> 
> Smithies, Russell wrote:
> > Here's my trace viewer.
> > Please excuse my dodgy Perl and debugging code as it's still under
> > development  :-)
> >
> >
> > Russell Smithies
> >
> > Bioinformatics Software Developer
> > T +64 3 489 9085
> > E  russell.smithies at agresearch.co.nz
> >
> > Invermay  Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T  +64 3 489 3809
> > F  +64 3 489 9174
> > www.agresearch.co.nz
> >
> >
> >
------------------------------------------------------------------------
> > ------------------
> >
> > #!perl -w
> > use ABI;
> >
> > use GD::Graph::lines;
> > use GD::Graph::colour;
> > use GD::Graph::Data;
> >
> > use Data::Dumper;
> >
> >
> > use Getopt::Long;
> >
> > use constant HEIGHT => 300;
> >
> > GetOptions ('h|height=i' => \$HEIGHT,
> >             'f|file=s' => \$FILE,
> >             'o|out=s' => \$OUTFILE,
> >             'l|left=s' => \$LEFT_SEQ,
> >             'r|right=s' => \$RIGHT_SEQ,
> >             's|size=i' => \$SIZE,
> >             ) || die <<USAGE;
> > Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o
> > test2.png -l actacgtacgta -r atgatcgtacgtac
> > or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1
> > --out test2.png --left actacgtacgta --right atgatcgtacgtac
> >
> > Options:
> > --height <pixels> Set height of image (${\HEIGHT} pixels default)
> > --file <trace file-name> Filename for the ABI trace file
> > --out <output file-name> Filename for the generated .png image
> > --left <left end sequence>
> > --right <right end sequence>
> > --size <size of clipped fasta sequence>
> >
> > Parse an ABI trace file and render a PNG image.
> > See http://search.cpan.org/dist/ABI/ABI.pm
> >     or
> >     http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
> > USAGE
> >
> > my $height = $HEIGHT || HEIGHT;
> > my $file = $FILE;
> > my $outfile = $OUTFILE;
> >
> > my $abi = ABI->new(-file=> $file);
> >
> > my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
> > my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
> > my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
> > my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
> >
> > my @base_calls = $abi->get_base_calls(); # Get the base calls
> > my $sequence =$abi->get_sequence();
> > @bp = split(//, $sequence);
> >
> >
> >
> > # iterate over array
> > $size = $abi->get_trace_length();
> > for ($i=0,$count = 0; $i<$size; $i++) {
> >      if(grep(/\b$i\b/, @base_calls)){
> >        $bases[$i] = $bp[$count];
> >        $count++;
> >      }else{
> >        $bases[$i] = ' ';
> >      }
> > }
> >
> > # create the data. see GD::Graph::Data for details of the format
> > my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );
> >
> > my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
> >    $graph->set(
> >    title => $abi->get_sample_name(),
> > #	y_max_value => $abi->get_max_trace() + 50,
> > 	x_max_value => $abi->get_trace_length(),
> > 	t_margin => 5,
> >     b_margin => 5,
> >     l_margin => 5,
> >     r_margin => 5,
> >     x_ticks => 0,
> >     text_space => 0,
> > 	line_width 	=> 1,
> > 	transparent	=> 0,
> > 	b_margin => 30,
> > 	t_margin => 35,
> > 	x_plot_values => 0,
> > 	interlaced => 1,
> > );
> >
> > # allocate some colors for drawing the bases
> > #use colors same as Chromas
> > $graph->set( dclrs => [ qw( green blue black red pink) ] );
> >
> > #plot the data
> > my $gd = $graph->plot(\@data);
> >
> > $black = $gd->colorAllocate(0,0,0);       # A
> > $blue = $gd->colorAllocate(0,0,255);      # C
> > $red = $gd->colorAllocate(255,0,0);       # G
> > $green = $gd->colorAllocate(0,255,0);     # T
> > $magenta =$gd->colorAllocate(255,0,255);  # N
> > $white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
> > $gray = $gd->colorAllocate(210,210,210);
> > %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N",
> > $magenta, " ",$white);
> >
> > #$start_base = index(lc($sequence),lc($LEFT_SEQ));
> > $start_base = find_match($sequence,$LEFT_SEQ);
> >
> > #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
> > $end_base = find_match($sequence,$RIGHT_SEQ, 1);
> > if($end_base){
> >  $end_base += length($RIGHT_SEQ);
> > }
> >
> >
> > # get the coords of the features on the image
> > @coords = $graph->get_hotspot(1);
> > $size = @coords;
> > $printed_num = 1;
> > $basecount = 0;
> > $numstoprint = $basecount - $start_base;
> >
> > # draw the colored bases and scale at top and bottom of image
> > for ($i=0,$count = 0; $i<$size; $i++) {
> >   $c = $coords[$i];
> >   (undef, $xs, undef, undef, undef, undef) = @$c;
> >   $base = $bases[$i];
> >   if($base =~ /[ACGTN]/){
> >    if($start_base - 1 == $basecount){$start_base_coord = $xs;}
> >    if($end_base - 1 == $basecount){$end_base_coord = $xs;}
> >    if(defined($SIZE) && $start_base+$SIZE -2 ==
> > $basecount){$end_base_coord_by_size = $xs;}
> >    $basecount++;
> >    $numstoprint++;
> >    $printed_num = 0;
> >   }
> >   # print the bases top and bottom
> >   $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
> >   $gd->string(GD::Font->Small(),$xs,$height -
30,$base,$colors{$base});
> >
> >   # print scale
> >   if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
> >     if($LEFT_SEQ){
> >       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
> >       $gd->string(GD::Font->Small(),$xs,$height -
> > 15,$numstoprint,$black);
> >       $printed_num = 1;
> >     }else{
> >       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
> >       $gd->string(GD::Font->Small(),$xs,$height -
> > 15,$numstoprint,$black);
> >       $printed_num = 1;
> >     }
> >   }
> >   $top_right_corner = $xs;
> > }
> >
> >
> >
> > # only draw the clipped region if the calculated size is + or - 6bp
> > #if(($end_base - $start_base) - $SIZE <= 6 && ($end_base -
$start_base)
> > - $SIZE >= -6 ){
> > # draw the clipped regions as gray
> >   #if LEFT_SEQ supplied and a match found
> >   if($LEFT_SEQ && $start_base > 0){
> >      $gd->filledRectangle(38,35,$start_base_coord - 1,$height -
> > 33,$red);
> >      $clipped = 1;
> >   }
> >  #if RIGHT_SEQ supplied and a match found
> >  if($RIGHT_SEQ && $end_base > 0){
> >    print join("\t", ($end_base)),"\n";
> >    $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height
-
> > 33,$gray);
> >    $clipped = 1;
> >  }
> >  #if no RIGHT_SEQ supplied or no match found, use left match + seq
> > length
> >  if(!$RIGHT_SEQ || $end_base < 0){
> >
> >
$gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$heigh
> > t - 33,$blue);
> >   $clipped = 1;
> >  }
> >
> >
> >
> > # set height based on max trace within clipped region
> >    $graph->set(	y_max_value => 3000);#$abi->get_max_trace() +
50);
> >
> >   # need to re-plot the data over the grayed out area
> >   $graph->plot(\@data) if $clipped;
> >   $gd->filledRectangle(0,0,$top_right_corner,33,$white);
> >
> > #}
> >
> > #print the graph
> > open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
> > binmode OUT;
> > print OUT $gd->png;
> > close OUT;
> >
> >
> > sub find_match{
> >   my ($sequence,$query,$last) = @_;
> >   return -1 if length($query) < 6;
> >   my($odds, $evens, $ones, $twos, $threes, $match_pos);
> >     # try exact match
> >     $match_pos = do_regex($query, $sequence,$last); return
$match_pos if
> > $match_pos > 0;
> >
> >     # try matching every second base starting from the second base
e.g.
> > it will be .C.T.C.G.etc
> >     map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"}
> > ($query=~m/(\w\w)/g);
> >     $match_pos = do_regex($odds, $sequence,$last);   return
$match_pos
> > if $match_pos > 0;
> >     $match_pos = do_regex($evens, $sequence,$last);  return
$match_pos
> > if $match_pos > 0;
> >
> >     # try matching every third base starting from the first base
e.g. it
> > will be C..T..G..T etc
> >     map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2.";
> > $threes.="..$3"} ($query =~m/(\w\w\w)/g);
> >     $match_pos = do_regex($ones, $sequence,$last);   return
$match_pos
> > if $match_pos > 0;
> >     $match_pos = do_regex($twos, $sequence,$last);   return
$match_pos
> > if $match_pos > 0;
> >     $match_pos = do_regex($threes, $sequence,$last); return
$match_pos
> > if $match_pos > 0;
> >
> >      # not found
> >      return -1;
> > }
> >
> > sub do_regex(){
> > 	my ($query,$sequence,$last)= @_;
> >     #print "trying $query \n";
> >     my $result = -1;
> >       $result = pos($sequence)-length($query)+1 if $last &&
($sequence
> > =~ m/.*($query)/ig);
> >       $result = pos($sequence)-length($query)+1 if($sequence =~
> > m/.*?($query)/ig);
> >     return $result;
> > }
> >
> >
------------------------------------------------------------------------
> > ------------------
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-
> >> bio.org] On Behalf Of Lee Katz
> >> Sent: Wednesday, 14 November 2007 2:28 p.m.
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] chromatogram
> >>
> >> Hi,
> >> I would like to know how to draw a chromatogram file.  Does anyone
> >> have any sample code where you read in an scf file and create a jpeg
> >> or other image file?
> >> For that matter, I want to be able to customize these images with base
> >> calls if possible.  I really appreciate the help, so thanks!
> >>
> >> --
> >> Lee Katz
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> --
> Malay K Basu
> www.malaybasu.net

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From mbasu at mail.nih.gov  Wed Nov 14 16:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need any plugin. Firefox can natively display most SVG files.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image. 
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
> 
> Russell
> 
>> -----Original Message-----
>> From: Malay [mailto:mbasu at mail.nih.gov]
>> Sent: Thursday, 15 November 2007 9:47 a.m.
>> To: Smithies, Russell
>> Cc: Lee Katz; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] chromatogram
>>
>> I guess you need a chromatogram from SCF. I can't help with that. ABI.pm
>> is not in the Bioperl distribution. But to set the record straight, you
>> can do one-step chromatogram drawing in SVG from an ABI file using my
>> BioSVG module, available at:
>>
>> http://www.bioinformatics.org/~malay/biosvg/
>>
>> Malay
>>
>>
>>
>>
>> Smithies, Russell wrote:
>>> Here's my trace viewer.
>>> Please excuse my dodgy Perl and debugging code as it's still under
>>> development  :-)
>>>
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E  russell.smithies at agresearch.co.nz
>>>
>>> Invermay  Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T  +64 3 489 3809
>>> F  +64 3 489 9174
>>> www.agresearch.co.nz
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> #!perl -w
>>> use ABI;
>>>
>>> use GD::Graph::lines;
>>> use GD::Graph::colour;
>>> use GD::Graph::Data;
>>>
>>> use Data::Dumper;
>>>
>>>
>>> use Getopt::Long;
>>>
>>> use constant HEIGHT => 300;
>>>
>>> GetOptions ('h|height=i' => \$HEIGHT,
>>>             'f|file=s' => \$FILE,
>>>             'o|out=s' => \$OUTFILE,
>>>             'l|left=s' => \$LEFT_SEQ,
>>>             'r|right=s' => \$RIGHT_SEQ,
>>>             's|size=i' => \$SIZE,
>>>             ) || die <<USAGE;
>>> Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o
>>> test2.png -l actacgtacgta -r atgatcgtacgtac
>>> or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1
>>> --out test2.png --left actacgtacgta --right atgatcgtacgtac
>>>
>>> Options:
>>> --height <pixels> Set height of image (${\HEIGHT} pixels default)
>>> --file <trace file-name> Filename for the ABI trace file
>>> --out <output file-name> Filename for the generated .png image
>>> --left <left end sequence>
>>> --right <right end sequence>
>>> --size <size of clipped fasta sequence>
>>>
>>> Parse an ABI trace file and render a PNG image.
>>> See http://search.cpan.org/dist/ABI/ABI.pm
>>>     or
>>>     http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
>>> USAGE
>>>
>>> my $height = $HEIGHT || HEIGHT;
>>> my $file = $FILE;
>>> my $outfile = $OUTFILE;
>>>
>>> my $abi = ABI->new(-file=> $file);
>>>
>>> my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
>>> my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
>>> my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
>>> my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
>>>
>>> my @base_calls = $abi->get_base_calls(); # Get the base calls
>>> my $sequence =$abi->get_sequence();
>>> @bp = split(//, $sequence);
>>>
>>>
>>>
>>> # iterate over array
>>> $size = $abi->get_trace_length();
>>> for ($i=0,$count = 0; $i<$size; $i++) {
>>>      if(grep(/\b$i\b/, @base_calls)){
>>>        $bases[$i] = $bp[$count];
>>>        $count++;
>>>      }else{
>>>        $bases[$i] = ' ';
>>>      }
>>> }
>>>
>>> # create the data. see GD::Graph::Data for details of the format
>>> my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );
>>>
>>> my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
>>>    $graph->set(
>>>    title => $abi->get_sample_name(),
>>> #	y_max_value => $abi->get_max_trace() + 50,
>>> 	x_max_value => $abi->get_trace_length(),
>>> 	t_margin => 5,
>>>     b_margin => 5,
>>>     l_margin => 5,
>>>     r_margin => 5,
>>>     x_ticks => 0,
>>>     text_space => 0,
>>> 	line_width 	=> 1,
>>> 	transparent	=> 0,
>>> 	b_margin => 30,
>>> 	t_margin => 35,
>>> 	x_plot_values => 0,
>>> 	interlaced => 1,
>>> );
>>>
>>> # allocate some colors for drawing the bases
>>> #use colors same as Chromas
>>> $graph->set( dclrs => [ qw( green blue black red pink) ] );
>>>
>>> #plot the data
>>> my $gd = $graph->plot(\@data);
>>>
>>> $black = $gd->colorAllocate(0,0,0);       # G
>>> $blue = $gd->colorAllocate(0,0,255);      # C
>>> $red = $gd->colorAllocate(255,0,0);       # T
>>> $green = $gd->colorAllocate(0,255,0);     # A
>>> $magenta =$gd->colorAllocate(255,0,255);  # N
>>> $white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
>>> $gray = $gd->colorAllocate(210,210,210);
>>> %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N",
>>> $magenta, " ",$white);
>>>
>>> #$start_base = index(lc($sequence),lc($LEFT_SEQ));
>>> $start_base = find_match($sequence,$LEFT_SEQ);
>>>
>>> #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
>>> $end_base = find_match($sequence,$RIGHT_SEQ, 1);
>>> # find_match returns -1 when nothing matched, so test for a real hit
>>> if($end_base > 0){
>>>  $end_base += length($RIGHT_SEQ);
>>> }
>>>
>>>
>>> # get the coords of the features on the image
>>> @coords = $graph->get_hotspot(1);
>>> $size = @coords;
>>> $printed_num = 1;
>>> $basecount = 0;
>>> $numstoprint = $basecount - $start_base;
>>>
>>> # draw the colored bases and scale at top and bottom of image
>>> for ($i=0,$count = 0; $i<$size; $i++) {
>>>   $c = $coords[$i];
>>>   (undef, $xs, undef, undef, undef, undef) = @$c;
>>>   $base = $bases[$i];
>>>   if($base =~ /[ACGTN]/){
>>>    if($start_base - 1 == $basecount){$start_base_coord = $xs;}
>>>    if($end_base - 1 == $basecount){$end_base_coord = $xs;}
>>>    if(defined($SIZE) && $start_base+$SIZE -2 ==
>>> $basecount){$end_base_coord_by_size = $xs;}
>>>    $basecount++;
>>>    $numstoprint++;
>>>    $printed_num = 0;
>>>   }
>>>   # print the bases top and bottom
>>>   $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
>>>   $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});
>>>   # print scale
>>>   if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
>>>     if($LEFT_SEQ){
>>>       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
>>>       $gd->string(GD::Font->Small(),$xs,$height -
>>> 15,$numstoprint,$black);
>>>       $printed_num = 1;
>>>     }else{
>>>       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
>>>       $gd->string(GD::Font->Small(),$xs,$height -
>>> 15,$numstoprint,$black);
>>>       $printed_num = 1;
>>>     }
>>>   }
>>>   $top_right_corner = $xs;
>>> }
>>>
>>>
>>>
>>> # only draw the clipped region if the calculated size is + or - 6bp
>>> #if(($end_base - $start_base) - $SIZE <= 6
>>> #   && ($end_base - $start_base) - $SIZE >= -6 ){
>>> # draw the clipped regions as gray
>>>   #if LEFT_SEQ supplied and a match found
>>>   if($LEFT_SEQ && $start_base > 0){
>>>      $gd->filledRectangle(38,35,$start_base_coord - 1,$height -
>>> 33,$red);
>>>      $clipped = 1;
>>>   }
>>>  #if RIGHT_SEQ supplied and a match found
>>>  if($RIGHT_SEQ && $end_base > 0){
>>>    print join("\t", ($end_base)),"\n";
>>>    $gd->filledRectangle($end_base_coord,35,$top_right_corner,
>>>                         $height - 33,$gray);
>>>    $clipped = 1;
>>>  }
>>>  #if no RIGHT_SEQ supplied or no match found, use left match + seq length
>>>  if(!$RIGHT_SEQ || $end_base < 0){
>>>
>>>
>>>   $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,
>>>                        $height - 33,$blue);
>>>   $clipped = 1;
>>>  }
>>>
>>>
>>>
>>> # set height based on max trace within clipped region
>>>    $graph->set( y_max_value => 3000); #$abi->get_max_trace() + 50);
>>>   # need to re-plot the data over the grayed out area
>>>   $graph->plot(\@data) if $clipped;
>>>   $gd->filledRectangle(0,0,$top_right_corner,33,$white);
>>>
>>> #}
>>>
>>> #print the graph
>>> open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
>>> binmode OUT;
>>> print OUT $gd->png;
>>> close OUT;
>>>
>>>
>>> sub find_match{
>>>   my ($sequence,$query,$last) = @_;
>>>   return -1 if length($query) < 6;
>>>   my($odds, $evens, $ones, $twos, $threes, $match_pos);
>>>     # try exact match
>>>     $match_pos = do_regex($query, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>
>>>     # try matching every second base, starting from the first base
>>>     # (e.g. C.T.C.G.) and then from the second base (e.g. .C.T.C.G)
>>>     map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
>>>     $match_pos = do_regex($odds, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>     $match_pos = do_regex($evens, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>
>>>     # try matching every third base, starting from the first, second
>>>     # and third base in turn, e.g. it will be C..T..G..T.. etc
>>>     map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"}
>>>         ($query =~m/(\w\w\w)/g);
>>>     $match_pos = do_regex($ones, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>     $match_pos = do_regex($twos, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>     $match_pos = do_regex($threes, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>
>>>      # not found
>>>      return -1;
>>> }
>>>
>>> # no "()" prototype here: the sub takes three arguments
>>> sub do_regex{
>>>     my ($query,$sequence,$last)= @_;
>>>     #print "trying $query \n";
>>>     my $result = -1;
>>>     # with $last set, the greedy .* finds the last match
>>>     $result = pos($sequence)-length($query)+1
>>>         if $last && ($sequence =~ m/.*($query)/ig);
>>>     # the non-greedy .*? finds the first match
>>>     $result = pos($sequence)-length($query)+1
>>>         if($sequence =~ m/.*?($query)/ig);
>>>     return $result;
>>> }
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
>>>> Sent: Wednesday, 14 November 2007 2:28 p.m.
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] chromatogram
>>>>
>>>> Hi,
>>>> I would like to know how to draw a chromatogram file.  Does anyone
>>>> have any sample code where you read in an scf file and create a jpeg
>>>> or other image file?
>>>> For that matter, I want to be able to customize these images with base
>>>> calls if possible.  I really appreciate the help, so thanks!
>>>>
>>>> --
>>>> Lee Katz
>>
>> --
>> Malay K Basu
>> www.malaybasu.net
> 


-- 
Malay K Basu
www.malaybasu.net


From tomboy at cs.huji.ac.il  Wed Nov 14 21:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>

Hi,
when I try to install bioperl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at
/usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command, but that
does not seem to help either.

many thanks
--Tomer

-- 
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA

Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313               Fax: (425) 936-7329
--------------------------------------------------------------------------------


From lskatz at gatech.edu  Thu Nov 15 08:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image,
though?  I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent.  This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works?  Thanks.

-- 
Lee Katz
http://www.lskatz.com


From cain.cshl at gmail.com  Thu Nov 15 09:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel.  Bio::SCF is
not part of bioperl, so you have to get it from CPAN and it depends on
the Staden io-lib package, so you'll need that too.  You can get GBrowse
from http://www.gmod.org/gbrowse , and you can look at the tutorial for
more information on configuring the trace glyph.

Scott


On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure that there is no way to go from an scf to an image,
> though?  I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent.  This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
> 
> If this is not possible, do you know if drawing an scf is in the works?  Thanks.
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From bosborne11 at verizon.net  Thu Nov 15 09:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>
Message-ID: <C361BF4D.103D8%bosborne11@verizon.net>

Tomer,

Interesting. When I used Cygwin I always worked entirely within the C:
drive, but it looks like you're executing the script from the E: drive. Is
Cygwin installed in C:/cygwin? You can see what I'm getting at: it's
possible that you need to set $PERL5LIB to something like
/cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say?
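
One quick way to check this from Perl itself is sketched below. It is only
a diagnostic aid under the assumption that the problem is a search-path
one, not a confirmed fix: it prints PERL5LIB and reports which @INC
directories actually contain Module/Build.pm. The helper name find_in_inc
is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every directory from @dirs that contains the file for $module
# (for example, 'Module::Build' is looked up as Module/Build.pm).
# find_in_inc is a hypothetical helper, not part of any distribution.
sub find_in_inc {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may hold code refs (hooks); skip them
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $dir if -e $path;
    }
    return @found;
}

print "PERL5LIB is ",
      (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
my @hits = find_in_inc('Module::Build', @INC);
print @hits ? "Module::Build found in: @hits\n"
            : "Module::Build not found in any \@INC directory\n";
```

If nothing is found, or only a directory you did not expect, that would
support the idea that the search path needs adjusting.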

Brian O.


On 11/14/07 9:43 PM, "Tomer Hertz" <tomboy at cs.huji.ac.il> wrote:

> hi
> when I try to install bioperl I get the following error message:
> 
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at
> /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
> 
> many thanks
> --Tomer


From bernd.web at gmail.com  Thu Nov 15 10:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has anyone been able to access '$description' for the production of
imagemaps with Graphics::Panel?
The map below does not print the "title" tag at all; '$description'
does not seem to be available, although for the tracks
($panel->add_track) it is available.
$map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the title tag does work if
I use the code below:
my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
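
A slightly more defensive variant of that coderef is sketched here. It
assumes that some feature classes may keep the text under a different tag
(for example "note"), so it tries several tags in turn. The MockFeature
class is purely hypothetical and exists only so the sketch runs without
BioPerl; on real features the rule relies on has_tag and each_tag_value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object, only so this sketch runs
# without BioPerl installed.
package MockFeature;
sub new {
    my ($class, %tags) = @_;
    return bless { tags => { %tags } }, $class;
}
sub has_tag        { my ($self, $tag) = @_; return exists $self->{tags}{$tag} }
sub each_tag_value { my ($self, $tag) = @_; return @{ $self->{tags}{$tag} || [] } }

package main;

# Title rule: return the first value of the first candidate tag that is
# present, or an empty string when none of the tags exists.
my $titlerule = sub {
    my $feature = shift;
    for my $tag (qw(description note)) {
        next unless $feature->has_tag($tag);
        my ($value) = $feature->each_tag_value($tag);
        return $value if defined $value;
    }
    return '';
};

my $feature = MockFeature->new(note => ['putative kinase']);
print $titlerule->($feature), "\n";
```

With a real panel the coderef would be passed in place of '$description'
in the create_web_map call shown above.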


I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }


Regards,
Bernd


From luciap at sas.upenn.edu  Thu Nov 15 10:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,
I was asked this question recently, and it occurred to me that I must be
doing things inefficiently.
To produce a GFF file I was using SeqIO to parse the required fields and
then, following the GFF conventions, just printing out whatever was
required as tab-delimited text, which is easy.
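
The print-it-yourself step described above amounts to joining nine
tab-separated columns. A minimal sketch of that step follows, assuming
GFF3-style attributes in column 9 (key=value pairs joined by semicolons)
and one plain hash per feature. The hash layout and the helper name
gff_line are invented for this example, and the escaping is deliberately
simple.

```perl
use strict;
use warnings;

# Build one GFF line (nine tab-separated columns) from a hash describing
# a feature. Missing source, score, strand and phase are written as '.'.
# gff_line and its hash keys are hypothetical names for this sketch.
sub gff_line {
    my (%f) = @_;
    my $attr = $f{attributes} || {};
    # Column 9: key=value pairs; percent-encode characters GFF3 reserves
    my $col9 = join ';', map {
        my $value = $attr->{$_};
        $value =~ s/([;=%&,\t\n])/sprintf('%%%02X', ord($1))/ge;
        "$_=$value";
    } sort keys %$attr;
    return join "\t",
        $f{seqid},
        ($f{source} || '.'),
        $f{type},
        $f{start},
        $f{end},
        (defined($f{score}) ? $f{score} : '.'),
        ($f{strand} || '.'),
        (defined($f{phase}) ? $f{phase} : '.'),
        (length($col9) ? $col9 : '.');
}

print gff_line(
    seqid  => 'NC_000001', source => 'GenBank', type  => 'CDS',
    start  => 100,         end    => 400,       strand => '+', phase => 0,
    attributes => { ID => 'cds1', product => 'kinase; putative' },
), "\n";
```

In practice the hash would be filled from the fields parsed with SeqIO;
the formatting itself does not need BioPerl.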

But generating a GenBank file by extracting features from a GFF file and a
plain FASTA file was more complicated.
Is there support for GFF in bioperl now?
Can anyone suggest a smart way to go to/from GFF, GenBank and EMBL?

thanks very much

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From lstein at cshl.edu  Thu Nov 15 12:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web <bernd.web at gmail.com> wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bernd.web at gmail.com  Thu Nov 15 13:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
	<6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein <lstein at cshl.edu> wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
>
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> >
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> >
>
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Thu Nov 15 13:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are  
consistent (particularly re: GFF3).  We are aiming for more  
consistent, compliant GFF/GTF output in the next developer series  
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the  
scripts directory); these are probably the most common way when  
working directly from a seq record.  Bio::Tools::GFF is the most  
commonly used class, though I'm unsure of its status for GFF3  
output.  From within a Bio::SeqI you can call write_gff() (currently  
not very flexible) or from the SeqFeature itself gff_string().   
Bio::Graphics::Feature has the additional method gff3_string().   
Bio::FeatureIO is also an option, though I would consider it very  
experimental (it will likely undergo significant revision in the next  
bioperl dev series).
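
As a rough sketch of the Bio::Tools::GFF route (not a replacement for
bp_genbank2gff3, and the GFF3 output should be checked given the caveat
above), something like the following walks the top-level features of each
record and writes them out. The sub name genbank_to_gff and the file name
in the usage comment are placeholders; BioPerl is loaded with require
inside the sub so the file still compiles where BioPerl is not installed.

```perl
use strict;
use warnings;

# Write every top-level feature of every record in a GenBank file as GFF
# on STDOUT. genbank_to_gff is a hypothetical name for this sketch.
sub genbank_to_gff {
    my ($genbank_file, $gff_version) = @_;
    require Bio::SeqIO;
    require Bio::Tools::GFF;

    my $in  = Bio::SeqIO->new(-file   => $genbank_file,
                              -format => 'genbank');
    my $out = Bio::Tools::GFF->new(-gff_version => $gff_version || 3,
                                   -fh          => \*STDOUT);
    while (my $seq = $in->next_seq) {
        # get_SeqFeatures returns top-level features only
        for my $feature ($seq->get_SeqFeatures) {
            $out->write_feature($feature);
        }
    }
    return;
}

# Usage (placeholder file name):
# genbank_to_gff('my_sequence.gb', 3);
```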

Any others anyone can think of, maybe non-BioPerl related as well?

chris

On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:

> Hi
> I was asked this question recently
> and it occurred to me I must be doing things inefficiently
> To produce gff file I was using SeqIO to parse the required fields,  
> then
> according to the conventions just printing out whatever was  
> required tab
> delimited, which is easy
>
> but if I wanted to generate a genbank file, extracting features  
> from a gff file
> and a plain fasta file it was more complicated
> is there support for gff in bioperl now?
> anyone can contribute with  smart way to go from/to gff, genebank  
> and embl?
>
> thanks very much
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania


From bosborne11 at verizon.net  Thu Nov 15 14:19:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 14:19:41 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>
Message-ID: <C36205FD.103EA%bosborne11@verizon.net>

Chris,

There's also a genbank2gff3.PLS script in the GMOD package
(http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
However, it has not been modified for a couple of years, so it may not be
the "preferred" script.

See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
on using Bioperl's bp_genbank2gff3 script.

Brian O.


On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> There are currently many ways to get what you want, but not all are
> consistent (particularly re: GFF3).  We are aiming for more
> consistent, compliant GFF/GTF output in the next developer series
> (1.7) of Bioperl.
> 
> You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> scripts directory); these are probably the most common way when
> working directly from a seq record.  Bio::Tools::GFF is the most
> commonly used class, though I'm unsure of its status for GFF3
> output.  From within a Bio::SeqI you can call write_gff() (currently
> not very flexible) or from the SeqFeature itself gff_string().
> Bio::Graphics::Feature has the additional method gff3_string().
> Bio::FeatureIO is also an option, though I would consider it very
> experimental (it will likely undergo significant revision in the next
> bioperl dev series).
> 
> Any others anyone can think of, maybe non-BioPerl related as well?
> 
> chris
> 
> On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> 
>> Hi
>> I was asked this question recently
>> and it occurred to me I must be doing things inefficiently
>> To produce gff file I was using SeqIO to parse the required fields,
>> then
>> according to the conventions just printing out whatever was
>> required tab
>> delimited, which is easy
>> 
>> but if I wanted to generate a genbank file, extracting features
>> from a gff file
>> and a plain fasta file it was more complicated
>> is there support for gff in bioperl now?
>> anyone can contribute with  smart way to go from/to gff, genebank
>> and embl?
>> 
>> thanks very much
>> 
>> Lucia Peixoto
>> Department of Biology,SAS
>> University of Pennsylvania


From Russell.Smithies at agresearch.co.nz  Thu Nov 15 17:31:28 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 16 Nov 2007 11:31:28 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>

Just to add to this, does anyone have any code for reading .sff 'traces'
from 454 sequences?

Thanx,

Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz


From torsten.seemann at infotech.monash.edu.au  Thu Nov 15 20:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
Message-ID: <a79f6a4b0711151713g26905bc6g5b19202b992f4e08@mail.gmail.com>

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools that 454
distributes with its result data, e.g. "sffinfo 454AllContigs.sff"
will list all the reads with the original flowgram values etc.
However, the SFF tools are i386 Linux binaries, so this is not really
a portable solution.

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From mvrmakam at yahoo.com  Thu Nov 15 22:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager version 5.8.8.822 on Windows XP. I
have added all the repositories outlined in "Installing BioPerl for
Windows" and have selected All Packages in the View menu. However, I
cannot see any packages in the view box. Can anyone help me with this?

Roshan




From David.Messina at sbc.su.se  Fri Nov 16 03:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works?  Thanks.
>


One non-BioPerl solution is 4peaks:
http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers journal
article PDF library program.


Dave


From neetisomaiya at gmail.com  Mon Nov 19 01:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

foreach my $key ( $ac->get_all_annotation_keys() )
{
    if ($key eq "dblink")
    {
        my %values = $ac->get_Annotations($key);
        foreach my $value ( keys(%values) )
        {
            print "\n*****VALUE $value*****\n";
        }
    }
}

Not all of the dblinks present in the actual file get parsed. For example,
in the data below:
ENTRY       116064            CDS       H.sapiens
NAME        LRRC58
DEFINITION  leucine rich repeat containing 58
POSITION    3q13.33
MOTIF       Pfam: SdiA-regulated LRR_1
            PROSITE: LEU_RICH
DBLINKS     NCBI-GI: 153792305
            NCBI-GeneID: 116064
            HGNC: 26968
            Ensembl: ENSG00000163428
            UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE,
but doesn't give me HGNC and UniProt. For other entries it gives me other
combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
-- 
-Neeti
Even my blood says, B positive


From johnston at biochem.ucl.ac.uk  Mon Nov 19 06:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>

Hello,

Is there a list of the possible database names for -data =>
$dbname in RemoteBlast somewhere?

Cheers,
Cass


From cjfields at uiuc.edu  Mon Nov 19 08:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
Message-ID: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Nov 19 09:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID: <F81EBCF4-20AD-486C-A9EC-301FE9475504@uiuc.edu>

It makes sense in the light that you're (erroneously) using a hash:

    my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink; you don't see a
warning b/c the number of links happens to be even (I get 8), but you
would if the number of links returned were odd ("Odd number of elements
in hash assignment", provided warnings are enabled).  So when you call:

    foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.  You should use an array:

    my @values = $ac->get_Annotations($key);
    foreach my $value (@values) {
       print $value->as_text,"\n";
    }

Note the loop change; Bio::Annotation objects are no longer operator
overloaded, so your print statement wouldn't work in a bioperl 1.6 world.
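
The halving effect can be reproduced without BioPerl. The following is a
minimal sketch in plain Perl; fake_annotations() and the link1..link8
strings are hypothetical stand-ins for the DBLink objects that
get_Annotations() returns, not anything from the library itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ac->get_Annotations('dblink'): it returns a
# flat list of eight items, mirroring the eight links mentioned above.
sub fake_annotations {
    return qw(link1 link2 link3 link4 link5 link6 link7 link8);
}

# List assigned to a hash: the elements are paired up as key => value,
# so only every other element ends up as a key.
my %as_hash   = fake_annotations();
my @from_hash = sort keys %as_hash;

# List assigned to an array: every element is kept, in order.
my @as_array = fake_annotations();

print scalar(@from_hash), " keys from the hash: @from_hash\n";
print scalar(@as_array),  " elements in the array\n";
```

With eight items the hash ends up with four keys (link1, link3, link5,
link7), while the array keeps all eight, which is the "only half of the
DBLinks" symptom described above.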

chris



From akarger at CGR.Harvard.edu  Mon Nov 19 10:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
References: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
	<3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E0747C64A@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> Sent: Tuesday, November 13, 2007 12:42 PM
> To: Amir Karger
> Cc: Steve Chervitz; Dave Messina; bioperl-l
> Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result?
> 
> Amir,
> 
> Can you file this as a bug?  

Done.

http://bugzilla.open-bio.org/show_bug.cgi?id=2399

> Dave mentioned he would look into it but
> I think it warrants tracking to make sure it gets fixed:
> 
> http://www.bioperl.org/wiki/Bugs
> 
> Attach the example BLAST report from your last post to the report.   
> BTW, I wonder how this appears in XML output?
> 
> chris
> 
> On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:
> 
> >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
> >> Of Steve Chervitz
> >>
> >> The Bioperl blast parser should extract that value and you can obtain
> >> it from an HSP object, via the HSPI::n() method, documented here:
> >>
> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
> >
> > As I mentioned in my email:
> >
> > And does anyone know off-hand if Bioperl will tell me when situations
> > like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
> > would help, but I just get a bunch of empty strings for that, whether or
> > not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
> > undef.)
> >
> > And the docs for n() actually say, "This value is not defined with NCBI
> > Blast2 with gapping" although they don't say why. Which may explain why,
> > when I ran the following code on the blast result I included in my last
> > email, I got empty values for all of the n's. (Why is n() undefined for
> > gapped blast if I'm getting n's in my results from that blast?)
> > use warnings;
> > use strict;
> > use Bio::SearchIO;
> >
> > my $blast_out = $ARGV[0];
> > my $in = Bio::SearchIO->new(-format => 'blast',
> >                             -file   => $blast_out,
> >                             -report_type => 'tblastn');
> >
> > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> >                     Evalue)), "\n";
> > while(my $query = $in->next_result) {
> >     while(my $subject = $query->next_hit) {
> >         while (my $hsp = $subject->next_hsp) {
> >             print join("\t",
> >                 $query->query_name,
> >                 $hsp->start("query"),
> >                 $hsp->end("query"),
> >                 $hsp->strand("hit"),
> >                 $subject->name,
> >                 $hsp->start("hit"),
> >                 $hsp->end("hit"),
> >                 $subject->frame,
> >                 $hsp->n,
> >                 $hsp->evalue,
> >             ),"\n";
> >         }
> >     }
> > }
> >
> >> Dave's basically correct in his explanation. It's a result of the
> >> application of sum statistics by the blast algorithm. You can read all
> >> about it in Korf et al's BLAST book. Here's the relevant section:
> >
> > [snip]
> >
> > Thanks,
> >
> > -Amir
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 


From aaron.j.mackey at gsk.com  Mon Nov 19 11:50:53 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 19 Nov 2007 11:50:53 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <C36205FD.103EA%bosborne11@verizon.net>
Message-ID: <OF0C0B3E21.611ACEBE-ON85257398.005C01A8-85257398.005C8D95@gsk.com>

While Lucia's subject line asked for genbank2gff, her message actually 
asked the reverse (gff + fasta -> genbank).

e.g. pretend you had to prepare a genome annotation for submission to 
GenBank ...

and no, I don't know of any generalized gff2genbank script out there ...

Lucia, the SeqIO::genbank module will write GenBank format, but you have 
to get all the bits and bobs together in the right way, i.e. construct the 
various AnnotationCollections and SeqFeatures (with SplitLocations for 
exons, CDS, etc.) that a GenBank record expects.  One way to do this is to 
start with a template GenBank file that you'd like to mimic, strip it down 
to only two gene models, use SeqIO::genbank to read it into memory, and 
then step through the object with the Perl debugger to see how it is 
composed.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM:

> Chris,
> 
> There's also a genbank2gff3.PLS script in the GMOD package
> (http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
> However, it has not been modified for a couple of
> years, it may not be the "preferred" script.
> 
> See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
> http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
> on using Bioperl's bp_genbank2gff3 script.
> 
> Brian O.
> 
> 
> On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > There are currently many ways to get what you want, but not all are
> > consistent (particularly re: GFF3).  We are aiming for more
> > consistent, compliant GFF/GTF output in the next developer series
> > (1.7) of Bioperl.
> > 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> > scripts directory); these are probably the most common way when
> > working directly from a seq record.  Bio::Tools::GFF is the most
> > commonly used class though I'm unsure of it's status for GFF3
> > output.  From within a Bio::SeqI you can call write_gff() (currently
> > not very flexible) or from the SeqFeature itself gff_string().
> > Bio::Graphics::Feature has the additional method gff3_string().
> > Bio::FeatureIO is also an option, though I would consider it very
> > experimental (it will likely undergo significant revision in the next
> > bioperl dev series).
> > 
> > Any others anyone can think of, maybe non-BioPerl related as well?
> > 
> > chris
> > 
> > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> > 
> >> Hi
> >> I was asked this question recently
> >> and it occurred to me I must be doing things inefficiently
> >> To produce gff file I was using SeqIO to parse the required fields,
> >> then
> >> according to the conventions just printing out whatever was
> >> required tab
> >> delimited, which is easy
> >> 
> >> but if I wanted to generate a genbank file, extracting features
> >> from a gff file
> >> and a plain fasta file it was more complicated
> >> is there support for gff in bioperl now?
> >> anyone can contribute with  smart way to go from/to gff, genebank
> >> and embl?
> >> 
> >> thanks very much
> >> 
> >> Lucia Peixoto
> >> Department of Biology,SAS
> >> University of Pennsylvania


From johnston at biochem.ucl.ac.uk  Mon Nov 19 09:46:03 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT)
Subject: [Bioperl-l] blast database names
In-Reply-To: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
	<B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0711191441010.3141@localhost.localdomain>

On Mon, 19 Nov 2007, Chris Fields wrote:

> Here's a recent list (don't know if it's up-to-date):
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

Thanks. Perhaps I missed something in the docs, but I don't think I've
quite understood how this is supposed to work. I'm trying to blast primer
sequences against the ref genome sequence. Should I be using ref_contig?
How can I limit the blast to a single species?

cheers,
Cass.


From Kevin.M.Brown at asu.edu  Mon Nov 19 13:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed, just had to copy the
Align.pm file up one directory from where it was being put by the
installer.  Now I have a technician trying to use pSW (Bio::Tools::pSW)
and it appears to have been last updated back in '99 and seems to lack
certain methods to get things out of the alignment like the score.  The
test.pl script that Bio::Ext comes with actually uses
Bio::Tools::dpAlign.  Is dpAlign the replacement for pSW?


From bernd.web at gmail.com  Wed Nov 21 11:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de>
	<470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my
system, but indeed I also have these colouring problems.
I noticed that the score in the bgcolor callback gets a different value!
Printing the score during hit parsing ($hit->raw_score) gives the same
score as the -description callback
(my $score = $feature->score;). However, printing the score in the bgcolor
sub gives 2573!
All scores in the bgcolor routine are different from, and higher than, the
real scores. Were you able to solve this colouring issue?

Regards,
Bernd

> Hi all,
> I'm using a modified version of Lincoln's tutorial
> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> to give a similar image to that from NCBI but for some reason, my
> colours are coming out wrong (see attached example)
> They seem to be off by one but I can't see why.
>
> Any ideas?
>
> I can't be certain but I think it's only started doing this since our
> BLAST upgrade to 2.2.17 a few weeks ago.
>
> Here's the colouring code:
> -------------------------------------------------------------------------
> my $track = $panel->add_track(
>     -glyph       => 'segments',
>     -label       => 1,
>     -connector   => 'dashed',
>     -bgcolor     => sub {
>         my $feature = shift;
>         my $score   = $feature->score;
>         return 'red'     if $score >= 200;
>         return 'fuchsia' if $score >= 80;
>         return 'lime'    if $score >= 50;
>         return 'blue'    if $score >= 40;
>         return 'black';
>     },
>     -font2color  => 'gray',
>     -sort_order  => 'high_score',
>     -description => sub {
>         my $feature = shift;
>         return unless $feature->has_tag('description');
>         my ($description) = $feature->each_tag_value('description');
>         my $score = $feature->score;
>         "$description, score=$score";
>     },
> );
> -------------------------------------------------------------------------
>
>
> Thanx,
>
> Russell Smithies
>
>
>
>


From bernd.web at gmail.com  Wed Nov 21 12:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk>
	<47022278.7010700@web.de> <47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I have now found that bgcolor is using a $feature->score that is coming
directly from the blast report; it is not the bit score.

     -bgcolor     => sub { my $feature = shift;
                           my $score = $feature->score;
                           print "$score\n"; }

always prints a score, even if the score is not set in the
Bio::SeqFeature::Generic object.

-description callbacks are somehow using the score from the SeqFeature object.

Does anyone have an idea why?
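
One way to narrow this down might be to pull the threshold logic out of
the anonymous sub so it can be checked against known numbers without a
panel. This is only a sketch in plain Perl: score_to_colour() is a
hypothetical name (not a Bio::Graphics function), and the thresholds are
the ones from the colouring code quoted earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper holding the colour thresholds from the quoted
# -bgcolor callback. It knows nothing about features or panels.
sub score_to_colour {
    my ($score) = @_;
    return 'black' unless defined $score;   # no score set at all
    return 'red'     if $score >= 200;
    return 'fuchsia' if $score >= 80;
    return 'lime'    if $score >= 50;
    return 'blue'    if $score >= 40;
    return 'black';
}

# Feed it the suspicious value seen in the callback plus a few
# boundary values to confirm the mapping itself behaves as intended.
for my $score (2573, 200, 79.5, 50, 40, 39) {
    printf "%-6s => %s\n", $score, score_to_colour($score);
}
```

Assuming the callback receives the feature as its first argument, as in
the quoted code, it could then shrink to something like
-bgcolor => sub { score_to_colour(shift->score) }, which makes it easier
to see whether the mapping or the incoming score is at fault.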

Further, is it possible to get the raw score of a hit? $hit->raw_score
actually gets the bit score (w/o decimal point).

Bernd



From sac at bioperl.org  Wed Nov 21 13:43:54 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 21 Nov 2007 10:43:54 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>

On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> [snip]
>
> Further is is possible to get the raw_score of a hit. $hit->raw_score
> actually gets the bitscore (w/o decimal point).

Hmmm. raw_score should not be the same as bit score. So given an
example blast hit line such as:

       Score = 60.0 bits (30), Expect = 1e-06

$hit->raw_score() should return 30, not 60, as you seem to be getting.
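
To make the distinction concrete, here is a minimal sketch in plain Perl
that pulls both numbers out of a line like the one above.
parse_score_line() is a hypothetical helper, not the BioPerl parser, and
the pattern only handles plain decimal bit scores as in this example.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the bit score and the raw score (the
# number in parentheses) from a BLAST "Score = ..." line.
sub parse_score_line {
    my ($line) = @_;
    if ($line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\)/) {
        return { bits => $1, raw => $2 };
    }
    return;    # no match: undef in scalar context
}

my $parsed = parse_score_line('Score = 60.0 bits (30), Expect = 1e-06');
print "bits=$parsed->{bits} raw=$parsed->{raw}\n";
```

For the example line this gives bits=60.0 and raw=30, the two values
that the bit score and $hit->raw_score() are expected to correspond to.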

Could you submit a bug report for this?  http://www.bioperl.org/wiki/Bugs

Thanks,
Steve

>
> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> > Hi Russell,
> >
> > I came across your question. At first I thought all was well on my
> > system, but indeed I also have these colouring problems.
> > I noted that scrore in the bgcolor callback gets a different value!
> > Printing score during hit parsing($hit->raw_score) gives the same
> > score as -description
> > my $score = $feature->score; However, printing score in the bgcolor
> > sub gives 2573!
> > All scores in the bgcolor routine all different and higher than the
> > real scores. Were you able to solve this colouring issue?
> >
> > Regards,
> > Bernd
> >
> >
> > > Hi all,
> > > I'm using a modified version of Lincoln's tutorial
> > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> > > to give a similar image to that from NCBI but for some reason, my
> > > colours are coming out wrong (see attached example)
> > > They seem to be off by one but I can't see why.
> > >
> > > Any ideas?
> > >
> > > I can't be certain but I think it's only started doing this since our
> > > BLAST upgrade to 2.2.17 a few weeks ago.
> > >
> > > Here's the colouring code:
> > > ------------------------------------------------------------------------
> > > my $track = $panel->add_track(
> > >     -glyph       => 'segments',
> > >     -label       => 1,
> > >     -connector   => 'dashed',
> > >     -bgcolor     => sub {
> > >         my $feature = shift;
> > >         my $score   = $feature->score;
> > >         return 'red'     if $score >= 200;
> > >         return 'fuchsia' if $score >= 80;
> > >         return 'lime'    if $score >= 50;
> > >         return 'blue'    if $score >= 40;
> > >         return 'black';
> > >     },
> > >     -font2color  => 'gray',
> > >     -sort_order  => 'high_score',
> > >     -description => sub {
> > >         my $feature = shift;
> > >         return unless $feature->has_tag('description');
> > >         my ($description) = $feature->each_tag_value('description');
> > >         my $score = $feature->score;
> > >         "$description, score=$score";
> > >     },
> > > );
> > > ------------------------------------------------------------------------
> > >
> > >
> > > Thanx,
> > >
> > > Russell Smithies
> > >
> > >
> > >
> > >
> > > =======================================================================
> > > Attention: The information contained in this message and/or attachments
> > > from AgResearch Limited is intended only for the persons or entities
> > > to which it is addressed and may contain confidential and/or privileged
> > > material. Any review, retransmission, dissemination or other use of, or
> > > taking of any action in reliance upon, this information by persons or
> > > entities other than the intended recipients is prohibited by AgResearch
> > > Limited. If you have received this message in error, please notify the
> > > sender immediately.
> > > =======================================================================
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >


From binkley at genome.stanford.edu  Wed Nov 21 19:35:02 2007
From: binkley at genome.stanford.edu (Jonathan Binkley)
Date: Wed, 21 Nov 2007 16:35:02 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>

Hi,

I installed bioperl on a Mac (OS 10.4, Intel) via fink,
which put it here:

/sw/lib/perl5/5.8.6/Bio/

It seems to work fine, but I need bioperl-ext for
Smith-Waterman alignments.

So, into which directory should I download bioperl-ext and
run the Makefile?

Thanks.


From dcj at sanger.ac.uk  Thu Nov 22 09:47:09 2007
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Thu, 22 Nov 2007 14:47:09 +0000
Subject: [Bioperl-l] Problems with Bio::Tools::Run::Phylo::PAML::Baseml
Message-ID: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>

Hi all,

Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to
be a little 'broken', at least in my hands.
First, $bml->set_parameter('runmode', 0); does not work (it sets
runmode to -2). Setting runmode to 1 is OK.
Also, $bml->no_param_checks(1); doesn't seem to work.

The result is that the baseml.ctl file created under /tmp cannot be
run by baseml with runmode 0. The phylip file it creates runs fine
with baseml (using another .ctl file). My script and the baseml.ctl
are below.

Hope it can be fixed,

cheers,

Dan


#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

# Read the alignment from a phylip file
my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'test.phy');
my $aln = $alignio->next_aln;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);    # keep the temp dir so baseml.ctl can be inspected
my $tempdir = $bml->tempdir();

# set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc, $parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus     = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;


The baseml.ctl file produced:
seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0


Regards,

Daniel Jeffares

______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Thu Nov 22 11:06:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 22 Nov 2007 17:06:16 +0100
Subject: [Bioperl-l] Problems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>

Daniel,

I don't have bioperl-run or PAML installed on my system to test it myself,
but have you tried the latest version of bioperl-run from CVS? It looks like
that code has been worked on since 1.5.2 was released.


If that still doesn't work, could you file this as a bug to make sure it
gets followed up?


Dave


You can grab the tarball here:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl


and if necessary file the bug here:
BioPerl Bugzilla tracking system <http://bugzilla.open-bio.org/>


From arareko at campus.iztacala.unam.mx  Thu Nov 22 11:37:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 22 Nov 2007 10:37:24 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
Message-ID: <4745B044.5090102@campus.iztacala.unam.mx>

Hi Peter,

In BioPerl, there's no such mapping for db_xref's that I'm aware of. 
Each parser handles db_xref records on its own. Take a look at the 
Bio::SeqIO::genbank code, inside the next_seq() method for example:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup

Regards,
Mauricio.

Peter wrote:
> Dear all,
> 
> I'm one of the Biopython developers.  I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface.  I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
> 
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
> 
>                      /db_xref="ASAP:1309"
>                      /db_xref="GI:16128366"
>                      /db_xref="ECOCYC:EG10213"
>                      /db_xref="GeneID:945313"
> 
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables.  For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
> 
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
> 
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
> 
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
> 
> Thank you,
> 
> Peter
> 
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From avilella at gmail.com  Thu Nov 22 16:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only
used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
$seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

    Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
15.8402290344238 secs...

avilella at magneto:~$ perl
/home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl
-dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
3.86473417282104 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta]
6.99602198600769 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta]
7.26704716682434 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta]
8.44332504272461 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta]
12.103296995163 secs...


From cjfields at uiuc.edu  Thu Nov 22 19:30:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:30:51 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu>

How are tests affected?  It might be worth going through the revision  
history to see if there was a specific reason this was implemented,  
but if it passes tests I don't see why we need it.

chris

On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:

> Hi,
>
> Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> used if one calls the symbol_chars method?
>
> When I comment out this line:
>
> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> $seq->seq; # line 257
>
> I get a nice speed boost on loading alignments.
>
> Can I comment this line out in the CVS HEAD?
>
> Cheers,
>
>     Albert.
>
> [init] 5.96046447753906e-06 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.0022270679473877 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 2.14348912239075 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 6.91910791397095 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 15.8402290344238 secs...
>
> avilella at magneto:~$ perl
> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ 
> ancestral_alleles.pl
> -dir /home/avilella/ensembl/exoseq/test -verbose
> [init] 1.21593475341797e-05 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.00294303894042969 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 0.510555982589722 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 1.6192569732666 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 3.86473417282104 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000203717.chr1.fasta]
> 6.99602198600769 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000196188.chr1.fasta]
> 7.26704716682434 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000025800.chr1.fasta]
> 8.44332504272461 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000117475.chr1.fasta]
> 12.103296995163 secs...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 22 19:42:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:42:12 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
	<4745B044.5090102@campus.iztacala.unam.mx>
Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu>

I think SeqIO checks the name for parsing reasons only, in cases  
where the format changes based on the source (such as GenPept  
DBSOURCE data).  I don't think we go beyond that in Bioperl, probably  
b/c modifying or expanding names for data persistence would lead to  
volatile coding issues (i.e. consistency between parsers, constant  
updating to cover new crossrefs, etc).

I would definitely suggest retaining the original DB as it appears in  
the dbxref for consistency/sanity; if needed return expanded names  
using a different method if they are designated.
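
A rough sketch of what I mean, in plain Perl rather than actual BioPerl
or BioSQL code. The function names and the %expanded_name table are only
illustrative (its three entries are taken from Peter's list quoted below):

```perl
use strict;
use warnings;

# Hypothetical lookup table: optional long names, keyed by the original tag.
my %expanded_name = (
    'GI'     => 'GeneIndex',
    'GeneID' => 'Entrez',
    'ECOCYC' => 'EcoCyc',
);

# Split a db_xref value such as "GI:16128366" into database tag and
# identifier. The tag is stored exactly as it appears in the record.
sub parse_dbxref {
    my ($value) = @_;
    my ($db, $id) = split /:/, $value, 2;
    return { db => $db, id => $id };
}

# Separate accessor for display purposes: returns the long name when one
# is defined and falls back to the original tag otherwise.
sub expanded_db {
    my ($xref) = @_;
    my $db = $xref->{db};
    return exists $expanded_name{$db} ? $expanded_name{$db} : $db;
}

for my $value ('GI:16128366', 'ECOCYC:EG10213', 'ASAP:1309') {
    my $xref = parse_dbxref($value);
    printf "%-8s %-10s %s\n", $xref->{db}, $xref->{id}, expanded_db($xref);
}
```

That way whatever gets stored always matches the flat file, and an
unknown tag such as a new crossref simply comes back unchanged.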

chris

On Nov 22, 2007, at 10:37 AM, Mauricio Herrera Cuadra wrote:

> Hi Peter,
>
> In BioPerl, there's no such mapping for db_xref's that I'm aware of.
> Each parser handles db_xref records on its own. Take a look at the
> Bio::SeqIO::genbank code, inside the next_seq() method for example:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ 
> Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup
>
> Regards,
> Mauricio.
>
> Peter wrote:
>> Dear all,
>>
>> I'm one of the Biopython developers.  I've recently got going with
>> BioSQL and have been getting to grips with the Biopython BioSQL
>> interface.  I'm aware that we need to try and be consistent with
>> BioPerl and BioJava, so I'd like to pose my first question related to
>> that.
>>
>> When loading GenBank records, many features have db_xref qualifiers,
>> e.g. from a random CDS feature in E. coli K12:
>>
>>                      /db_xref="ASAP:1309"
>>                      /db_xref="GI:16128366"
>>                      /db_xref="ECOCYC:EG10213"
>>                      /db_xref="GeneID:945313"
>>
>> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
>> "GeneID" before recording these entries in the seqfeature_dbxref
>> and dbxref tables.  For example, "GI" becomes "GeneIndex".
>> Biopython's current mapping is as follows:
>>
>> # Dictionary of database types, keyed by GenBank db_xref abbreviation
>> db_dict = {'GeneID': 'Entrez',
>>            'GI': 'GeneIndex',
>>            'COG': 'COG',
>>            'CDD': 'CDD',
>>            'DDBJ': 'DNA Databank of Japan',
>>            'Entrez': 'Entrez',
>>            'GeneIndex': 'GeneIndex',
>>            'PUBMED': 'PubMed',
>>            'taxon': 'Taxon',
>>            'ATCC': 'ATCC',
>>            'ISFinder': 'ISFinder',
>>            'GOA': 'Gene Ontology Annotation',
>>            'ASAP': 'ASAP',
>>            'PSEUDO': 'PSEUDO',
>>            'InterPro': 'InterPro',
>>            'GEO': 'Gene Expression Omnibus',
>>            'EMBL': 'EMBL',
>>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>>            'ECOCYC': 'EcoCyc',
>>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>>            }
>>
>> In my testing, I've found several GenBank db_xref abbreviations for
>> which we don't have a mapping defined, such as "LocusID", "dbSNP",
>> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
>>
>> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
>> similar mapping in their BioSQL code (or GenBank parser), so that
>> Biopython can follow your example.
>>
>> Thank you,
>>
>> Peter
>>
>> P.S. See also Biopython bug 2405
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 22 19:49:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:49:15 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>

Albert,

Found it:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ 
SimpleAlign.pm.diff?r1=1.36&r2=1.37

If it slows performance that dramatically, maybe we can move this to  
a separate AlignUtils method instead.  Maybe something to ask Jason  
about?

chris

On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:

> Hi,
>
> Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> used if one calls the symbol_chars method?
>
> When I comment out this line:
>
> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> $seq->seq; # line 257
>
> I get a nice speed boost on loading alignments.
>
> Can I comment this line out in the CVS HEAD?
>
> Cheers,
>
>     Albert.
>
> [init] 5.96046447753906e-06 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.0022270679473877 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 2.14348912239075 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 6.91910791397095 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 15.8402290344238 secs...
>
> avilella at magneto:~$ perl
> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ 
> ancestral_alleles.pl
> -dir /home/avilella/ensembl/exoseq/test -verbose
> [init] 1.21593475341797e-05 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.00294303894042969 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 0.510555982589722 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 1.6192569732666 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 3.86473417282104 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000203717.chr1.fasta]
> 6.99602198600769 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000196188.chr1.fasta]
> 7.26704716682434 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000025800.chr1.fasta]
> 8.44332504272461 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000117475.chr1.fasta]
> 12.103296995163 secs...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 23 07:29:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Nov 2007 12:29:37 +0000
Subject: [Bioperl-l] Problems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
Message-ID: <4746C7B1.1010002@sendu.me.uk>

Dave Messina wrote:
> Daniel,
> 
> I don't have bioperl-run or PAML installed on my system to test it myself,
> but have you tried the latest version of bioperl-run from CVS? It looks like
> that code has been worked on since 1.5.2 was released.

Yes, I fixed it in CVS so it should at least /run/. I don't know about 
the parsing side of things, though that may also have been fixed 
recently by someone else.


From avilella at gmail.com  Fri Nov 23 08:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Problems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a
"baseml" binary. AFAIK, Jason fixed codeml to make it work for both
paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;
my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file =>
'/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
                                -file =>
'/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();


#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala <bix at sendu.me.uk> wrote:
> Dave Messina wrote:
> > Daniel,
> >
> > I don't have bioperl-run or PAML installed on my system to test it myself,
> > but have you tried the latest version of bioperl-run from CVS? It looks like
> > that code has been worked on since 1.5.2 was released.
>
> Yes, I fixed it in CVS so it should at least /run/. I don't know about
> the parsing side of things, though that may also have been fixed
> recently by someone else.
>
>


From cjfields at uiuc.edu  Fri Nov 23 11:24:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 10:24:59 -0600
Subject: [Bioperl-l] Problems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
	<358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu>

I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just  
'basemlg'), so it would need to work with both.

Do we want to put a PAML parser/wrapper overhaul on the TODO list for  
1.6?

chris

On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote:

> Just to mention that the new paml4 has a "basemlg" instead of a
> "baseml" binary. AFAIK, Jason fixed codeml to make it work for both
> paml3.xx and paml4, but I am not sure about baseml.
...


From arvindvanam at gmail.com  Fri Nov 23 16:26:06 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST)
Subject: [Bioperl-l]  run RNAfold in perl
Message-ID: <13918981.post@talk.nabble.com>


How do I run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job=$rnafold->run(-rnafold =>
'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an object
for the program, but it gives the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12


How can I make the RNAfold program run from Perl?
Do I need to specify where my RNAfold program is?

Please reply soon.
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Fri Nov 23 17:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see  
Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
mfold wrappers but haven't done so yet.  The Vienna tools do have a  
Perl-based (non-BioPerl-based) module included which uses libRNA, and  
is well worth a look.  Try 'perldoc RNA' if you have installed the  
tools locally, or look here for other Perl-based tools:

http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
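
As a rough idea of what the Vienna Perl module looks like in use (a
sketch only: it assumes the RNA module that ships with the Vienna package
is installed where perl can find it, and the fold_rna helper is my own
name, not part of the package):

```perl
use strict;
use warnings;

# Fold a sequence with the Vienna RNA Perl bindings, if they are available.
# Returns [structure, energy], or undef when the RNA module cannot be loaded.
sub fold_rna {
    my ($seq) = @_;
    return undef unless eval { require RNA; 1 };
    # RNA::fold returns the dot-bracket structure and its minimum free energy
    my ($structure, $mfe) = RNA::fold($seq);
    return [ $structure, $mfe ];
}

my $seq    = 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC';
my $result = fold_rna($seq);
if ($result) {
    printf "%s\n%s (%.2f)\n", $seq, @$result;
}
else {
    print "RNA module not found; check the Vienna package installation\n";
}
```

Since this goes through libRNA directly, as far as I know no path to the
RNAfold binary has to be given and no system() call is involved.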

chris

On Nov 23, 2007, at 3:26 PM, vanam wrote:

>
> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>
> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
> my $rnafold = $factory->program('rnafold');
> my $job=$rnafold->run(-rnafold =>
> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>
> I installed Vienna package and then i tried using Pise to create an  
> object
> for the program but its giving the following error
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::PiseJob::terminated
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
> STACK: Bio::Tools::Run::PiseApplication::submit
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
> STACK: Bio::Tools::Run::PiseApplication::run
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
> STACK: evaluate.pl:12
>
>
> how to make the program RNAfold run in perl...
> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>
> plz reply soon

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Sat Nov 24 02:29:11 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
Message-ID: <13922740.post@talk.nabble.com>


I have read the documentation for Bio::Tools::Run::AnalysisFactory::Pise and
used it exactly as described there.

What I want is to use the functions associated with RNAfold from within a
Perl program, rather than running the Perl version "RNAfold.pl" or calling
the program with the system() command.

If you could tell me how to use these wrapper modules it would be a great
help. For example, when using clustalw or clustalx we define an environment
variable for it. Do we have to do the same for RNAfold or Mfold?


Chris Fields wrote:
> 
> The Pise wrappers run the programs remotely; see  
> Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
> mfold wrappers but haven't done so yet.  The Vienna tools do have a  
> Perl-based (non-BioPerl-based) module included which uses libRNA, and  
> is well worth a look.  Try 'perldoc RNA' if you have installed the  
> tools locally, or look here for other Perl-based tools:
> 
> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
> 
> chris
> 
> On Nov 23, 2007, at 3:26 PM, vanam wrote:
> 
>>
>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>
>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
>> my $rnafold = $factory->program('rnafold');
>> my $job=$rnafold->run(-rnafold =>
>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>>
>> I installed Vienna package and then i tried using Pise to create an  
>> object
>> for the program but its giving the following error
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::PiseJob::terminated
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
>> STACK: Bio::Tools::Run::PiseApplication::submit
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
>> STACK: Bio::Tools::Run::PiseApplication::run
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
>> STACK: evaluate.pl:12
>>
>>
>> how to make the program RNAfold run in perl...
>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>>
>> plz reply soon
>> -- 
>> View this message in context: http://www.nabble.com/run-RNAfold-in- 
>> perl-tf4863835.html#a13918981
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From avilella at gmail.com  Sun Nov 25 06:50:42 2007
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 25 Nov 2007 11:50:42 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>

Committed to CVS now. It is calculated anyway when calling symbol_chars, so...

On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> Found it:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>
> If it slows performance that dramatically, maybe we can move this to
> a separate AlignUtils method instead.  Maybe something to ask Jason
> about?
>
> chris
>
> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>
>
> > Hi,
> >
> > Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> > used if one calls the symbol_chars method?
> >
> > When I comment out this line:
> >
> > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> > $seq->seq; # line 257
> >
> > I get a nice speed boost on loading alignments.
> >
> > Can I comment this line out in the CVS HEAD?
> >
> > Cheers,
> >
> >     Albert.
> >
> > [init] 5.96046447753906e-06 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.0022270679473877 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 2.14348912239075 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 6.91910791397095 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 15.8402290344238 secs...
> >
> > avilella at magneto:~$ perl
> > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
> > ancestral_alleles.pl
> > -dir /home/avilella/ensembl/exoseq/test -verbose
> > [init] 1.21593475341797e-05 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.00294303894042969 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 0.510555982589722 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 1.6192569732666 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 3.86473417282104 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000203717.chr1.fasta]
> > 6.99602198600769 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000196188.chr1.fasta]
> > 7.26704716682434 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000025800.chr1.fasta]
> > 8.44332504272461 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000117475.chr1.fasta]
> > 12.103296995163 secs...
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Sun Nov 25 10:05:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:05:27 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13922740.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>

Again, these wrappers are for submitting data to a Pise server for  
the corresponding programs (run on a remote server).  There are no  
wrappers for running RNAfold on your computer (i.e. locally), with or  
w/o a set env. variable.  You can try installing Pise locally and  
setting the location() as shown in POD to localhost, however I don't  
know how stable these modules are with newer versions of Pise.  These  
haven't been updated in a few years, apart from getting tests to work.

Another option is installing EMBOSS along with the EMBASSY version of  
RNAFold; this could conceivably be run through Bio::Factory::EMBOSS.
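For folding in-process without system(), the libRNA-based Perl module that ships with the Vienna package (the one 'perldoc RNA' documents) may be the shortest route. A minimal sketch, assuming that module was built and installed along with the Vienna tools; fold_locally is a made-up helper name, and the eval guard is only there so the script degrades gracefully when the module is missing:

```perl
use strict;
use warnings;

# Hypothetical helper: fold a sequence in-process via the Vienna RNA
# Perl bindings. Returns (structure, energy), or an empty list if the
# RNA module cannot be loaded.
sub fold_locally {
    my ($seq) = @_;
    return unless eval { require RNA; 1 };
    # RNA::fold returns the dot-bracket structure and the minimum
    # free energy for the given sequence.
    my ($structure, $mfe) = RNA::fold($seq);
    return ($structure, $mfe);
}

my $seq = 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC';
if (my ($structure, $mfe) = fold_locally($seq)) {
    printf "%s\n%s (%.2f)\n", $seq, $structure, $mfe;
}
else {
    print "RNA module not found; is the Vienna Perl interface installed?\n";
}
```

No environment variable is involved here; Perl only needs to find RNA.pm in @INC.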

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for  
> Bio::Tools::Run::AnalysisFactory::Pise and
> i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I  
> can use
> the functions associated with RNAfold with a perl program without  
> having to
> call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b  
> of gr8
> help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 10:38:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:38:40 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
Message-ID: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>

Albert,

I was getting a single AlignIO.t fail which appeared to be related to  
this:

...
ok 122 - The object isa Bio::Align::AlignI
ok 123 - consensus_string on metafasta

not ok 124 - symbol_chars() using metafasta
#   Failed test 'symbol_chars() using metafasta'
#   in t/AlignIO.t at line 346.
#          got: '0'
#     expected: '23'

It was b/c the symbol hash was initialized in the constructor (so it  
was present, just empty).  I have changed that in CVS; all tests pass  
now.
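To illustrate the kind of failure described above, here is a toy sketch of a lazily built symbol table. ToyAlign and its methods are invented for this example and are not the Bio::SimpleAlign code; the toy also counts every character, gaps included, which may differ from what symbol_chars does by default.

```perl
use strict;
use warnings;

# Toy stand-in for an alignment object; not the real Bio::SimpleAlign.
package ToyAlign;

sub new {
    my ($class, @seqs) = @_;
    # No '_symbols' key is created here. Pre-creating an empty hash in
    # the constructor would make an "is it cached?" check succeed while
    # the cache is still empty, giving a count of 0.
    return bless { _seqs => [@seqs] }, $class;
}

sub add_seq {
    my ($self, $seq) = @_;
    push @{ $self->{_seqs} }, $seq;
    delete $self->{_symbols};    # invalidate the cached table
    return;
}

# Count the distinct characters, building the table on first use only.
sub symbol_chars {
    my ($self) = @_;
    unless (exists $self->{_symbols}) {
        my %symbols;
        for my $seq (@{ $self->{_seqs} }) {
            $symbols{$_} = 1 for split //, $seq;
        }
        $self->{_symbols} = \%symbols;
    }
    return scalar keys %{ $self->{_symbols} };
}

package main;

my $aln = ToyAlign->new('ACGT-A', 'ACGTNA');
print $aln->symbol_chars, "\n";    # 6 distinct characters: A C G T - N
```

Loading stays cheap because nothing is split per residue in add_seq; the cost is paid once, and only by callers who ask for the symbol count.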

chris

On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:

> cvs commited now. it is calculated anyway when calling symbol_chars  
> so...
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Sun Nov 25 11:13:44 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 25 Nov 2007 17:13:44 +0100
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>

Hi,

I am not sure if this is related, but I remember SimpleAlign was
adapted to cope with more gap symbols that can occur in
alignments/FastA sequences, as: . _ - =
Previous versions would throw an error on 'illegal' gap characters.

Regards,
Bernd

On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> I was getting a single AlignIO.t fail which appeared to be related to
> this:
>
> ...
> ok 122 - The object isa Bio::Align::AlignI
> ok 123 - consensus_string on metafasta
>
> not ok 124 - symbol_chars() using metafasta
> #   Failed test 'symbol_chars() using metafasta'
> #   in t/AlignIO.t at line 346.
> #          got: '0'
> #     expected: '23'
>
> It was b/c the symbol hash was initialized in the constructor (so it
> was present, just empty).  I have changed that in CVS; all tests pass
> now.
>
> chris
>


From cjfields at uiuc.edu  Sun Nov 25 11:39:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 10:39:01 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
	<716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
Message-ID: <B849A608-7C12-4C87-BB93-D846959F0523@uiuc.edu>

Bernd,

That would be when generating Bio::LocatableSeq instances for  
building a Bio::SimpleAlign object.  Judging by test suite results  
that doesn't appear to be affected.

chris

On Nov 25, 2007, at 10:13 AM, Bernd Web wrote:

> Hi,
>
> I am not sure if this is related, but I remember SimpleAlign was
> adapted to cope with more gap symbols that can occur in
> alignments/FastA sequences, as: . _ - =
> Previous versions would throw an error on 'illegal' gap characters,
>
> Regards,
> Bernd

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 13:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to  
Bio::SeqIO::staden::read over the last few months which incorporate  
code from Bugzilla (bugs 2074 and 2329, very kindly donated from  
Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for  
optionally getting trace data (i.e. for drawing graphs).

The method is now implemented in Bio::SeqIO::abi, with example  
code in examples/quality/svgtrace.pl.  These changes should allow  
newer versions of Staden io_lib as well (the code is tested with  
io_lib 1.9.2), though they haven't been tested extensively as I am  
having problems compiling newer io_lib versions on my MacBook.  It's  
very likely more changes will need to be made along the way; some  
issues were found with XS compilation which appear harmless but need  
to be investigated, and trace data from other formats need to be  
evaluated.  The possibility exists that many of these changes break  
backward compatibility with older bioperl releases, though tests  
passed with bioperl 1.5.2.

Any feedback re: platform issues, test results using newer io_lib  
versions, older bioperl versions, etc. would be greatly appreciated.   
I'm hoping this will stimulate more interest in getting other bioperl- 
ext modules up-to-date with bioperl-live.

chris


From cjfields at uiuc.edu  Mon Nov 26 13:59:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 12:59:23 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>

Steve, Bernd, (and Jason, since you may have some input on this as  
well),

I am now looking into the bug Bernd submitted and it seems there is a  
serious discrepancy with the way the hit raw_score, bits, and  
significance are determined for Hit objects.  Unless I am mistaken  
these should always come from the best HSP when they are present,  
falling back to the hit table data only when no HSP alignments are  
present.  Under the latter conditions a minimal Hit object is made  
from data in the hit table, which reports the rounded bit score, not  
the raw score, so in those cases the raw score would be undefined  
(and you probably should get a nasty warning indicating there are no  
HSPs present to get the data from).
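A rough sketch of the intended behavior, using plain hashes rather than the real SearchIO/GenericHit classes. The function names, the hash keys, and the choice of "best HSP" as the one with the highest bit score are all assumptions made for illustration, and the regex only covers the simple form of the score line:

```perl
use strict;
use warnings;

# Split a BLAST HSP score line into bit score, raw score and E-value.
sub parse_score_line {
    my ($line) = @_;
    my ($bits, $raw, $expect) =
        $line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\),\s+Expect\s*=\s*(\S+)/
        or return;
    return { bits => $bits, raw_score => $raw, significance => $expect };
}

# Hit-level values come from the best HSP when any HSP was parsed;
# otherwise fall back to the hit table, which carries no raw score.
sub hit_scores {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };
    if (@hsps) {
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return { %$best };
    }
    warn "no HSPs for $hit->{name}; raw score is unavailable\n";
    return {
        bits         => $hit->{table_bits},
        raw_score    => undef,
        significance => $hit->{table_expect},
    };
}

my $hsp = parse_score_line(' Score = 60.0 bits (30), Expect = 1e-06');
my $hit = {
    name         => 'hit1',
    table_bits   => 60,
    table_expect => '1e-06',
    hsps         => [$hsp],
};
my $scores = hit_scores($hit);
print "bits=$scores->{bits} raw=$scores->{raw_score}\n";    # bits=60.0 raw=30
```

With an empty hsps list the same call warns and returns the rounded table bit score with an undefined raw score.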

What is occurring now, though, is that the raw_score and significance are  
explicitly set from the hit table in the BLAST parser for the Hit  
object at all times, while the bits are correctly derived from the  
best HSP (no fallback to the hit table).  Changing to the behavior  
above results in several tests failing via SearchIO.t, with each  
failed test reporting the expected (read:correct) raw score.

I'll look through the tests just in case, but I am planning on  
committing changes to the BLAST parsers, GenericHit, and SearchIO.t  
(to reflect the correct expected data) in the next day or two unless  
there are any objections.

chris

On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:

> On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
>> [snip]
>>
>> Further is is possible to get the raw_score of a hit. $hit->raw_score
>> actually gets the bitscore (w/o decimal point).
>
> Hmmm. raw_score should not be the same as bit score. So given an
> example blast hit line such as:
>
>        Score = 60.0 bits (30), Expect = 1e-06
>
> $hit->raw_score() should return 30, not 60, as you seem to be getting.
>
> Could you submit a bug report for this?  http://www.bioperl.org/ 
> wiki/Bugs
>
> Thanks,
> Steve
>
>>
>> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
>>> Hi Russell,
>>>
>>> I came across your question. At first I thought all was well on my
>>> system, but indeed I also have these colouring problems.
>>> I noted that scrore in the bgcolor callback gets a different value!
>>> Printing score during hit parsing($hit->raw_score) gives the same
>>> score as -description
>>> my $score = $feature->score; However, printing score in the bgcolor
>>> sub gives 2573!
>>> All scores in the bgcolor routine all different and higher than the
>>> real scores. Were you able to solve this colouring issue?
>>>
>>> Regards,
>>> Bernd
>>>
>>>
>>>> Hi all,
>>>> I'm using a modified version of Lincoln's tutorial
>>>> (http://www.bioperl.org/wiki/ 
>>>> HOWTO:Graphics#Parsing_Real_BLAST_Output)
>>>> and I'm colouring the HSPs by setting the -bgcolor by score with  
>>>> a sub
>>>> to give a similar image to that from NCBI but for some reason, my
>>>> colours are coming out wrong (see attached example)
>>>> They seem to be off by one but I can't see why.
>>>>
>>>> Any ideas?
>>>>
>>>> I can't be certain but I think it's only started doing this  
>>>> since our
>>>> BLAST upgrade to 2.2.17 a few weeks ago.
>>>>
>>>> Here's the colouring code:
>>>> ------------------------------------------------------------------- 
>>>> -----
>>>> -------
>>>> my $track = $panel->add_track(
>>>>                               -glyph       => 'segments',
>>>>                               -label       => 1,
>>>>                               -connector   => 'dashed',
>>>>                               -bgcolor     => sub {
>>>>                                 my $feature = shift;
>>>>                                 my $score = $feature->score;
>>>>                         return 'red'       if $score >= 200;
>>>>                                     return 'fuchsia' if $score  
>>>> >= 80;
>>>>                                     return 'lime'      if $score  
>>>> >= 50;
>>>>                         return 'blue'      if $score >= 40;
>>>>                                     return 'black';
>>>>                                },
>>>>                               -font2color  => 'gray',
>>>>                               -sort_order  => 'high_score',
>>>>                               -description => sub {
>>>>                                 my $feature = shift;
>>>>                                 return unless
>>>> $feature->has_tag('description');
>>>>                                 my ($description) =
>>>> $feature->each_tag_value('description');
>>>>                                 my $score = $feature->score;
>>>>                                 "$description, score=$score";
>>>>                                },
>>>>                              );
>>>> ------------------------------------------------------------------- 
>>>> -----
>>>> ---------
>>>>
>>>>
>>>> Thanx,
>>>>
>>>> Russell Smithies
>>>>
>>>>
>>>>
>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Mon Nov 26 14:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>


I searched for the EMBASSY version of RNAfold (I guess it's vrnafold), but
I am unable to find a downloadable version; all I can find is a web
interface for it. Can you tell me where to download it from?

Or can you just tell me how to set the location in PiseApplication to
localhost, and what to enter in the $email variable?


From cjfields at uiuc.edu  Mon Nov 26 15:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
	<13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>


On Nov 26, 2007, at 1:08 PM, vanam wrote:

> I searched for the EMBASSY version of RNAfold (I guess it's vrnafold),
> but I am unable to find a downloadable version; all I can find is a web
> interface for it. Can you tell me where to download it from?

You will need to install EMBOSS as well as the EMBASSY version of  
VIENNA (something which is documented in the docs that come along  
with the distributions and I will not go into detail on):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet.  Understand that there is no specific  
class framework for dealing with RNA secondary structure in BioPerl,  
so you will be on your own for now.

My suggestion for using Pise had the very important caveats that (1)  
it very well may not work, (2) I have no experience with Pise, let  
alone setting it up locally, therefore (3) I haven't tested it (and  
don't intend to, as I don't have the time).

> Or can you just tell me how to set the location in PiseApplication to
> localhost, and what to enter in the $email variable?

I have pointed out documentation previously which comes with the  
modules in question.  Remember perldoc is your friend; consulting it  
saves me (and everyone else) time.

 From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------

DESCRIPTION

Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
application objects, that let you submit jobs on a Pise server.

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                             -email => 'me@myhome');

The email is optional (there is default one). It can be useful, though.
Your program might enter infinite loops, or just run many jobs: the
Pise server maintainer needs a contact (s/he could of course cancel any
requests from your address...). And if you plan to run a lot of heavy
jobs, or to do a course with many students, please ask the maintainer
before.

The location parameter stands for the actual CGI location, except when
set at the factory creation step, where it is rather the root of all
CGI.  There are default values for most of Pise programs.

You can either set location at:

1 factory creation:
      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                         -location => 'http://somewhere/Pise/cgi-bin',
                         -email    => 'me@myhome');

2 program creation:
      my $program = $factory->program('water',
                         -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                                      );

3 any time before running:
      $program->location('http://somewhere/Pise/cgi-bin/water.pl');
      $job = $program->run();

4 when running:
      $job = $program->run(
                 -location => 'http://somewhere/Pise/cgi-bin/water.pl');

You can also retrieve a previous job results by providing its url:

   $job = $factory->job($url);

You get the url of a job by:

   $job->jobid;

----------------------------------------------

chris


From sac at bioperl.org  Mon Nov 26 20:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch. You're on track here with one exception: WU blast and NCBI
blast behave differently in what they report in the hit table: WU
blast puts the raw score in the table not the bit score as NCBI blast
does (see example below for reference). WU blast also swaps their
location in the HSP header relative to how NCBI reports it. It would
be good to verify that the blast parser isn't befuddled by this. A
quick look at SearchIO::blast and it appears that data from the hit
table is always getting stored as score, not bits for WU blast. Not
sure if the HSP section data are parsed correctly. I'd recommend
looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite. The downside to this arrangement
is that code that works for NCBI blast hits will need modification to
work for WU blast hits, but that is just the nature of the data. It
shouldn't be an issue for the majority of users that stick with one
flavor of blast and don't switch back and forth, or for users that get
their HSP scoring data from HSP sections rather than relying on the
hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of
the algorithm name for WU blast reports, so it would then be available
for the contained hit/hsp objects. This could break anything dependent
on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:
                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
        Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)


Example NCBI blast table and HSP header:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:ENSG00000137397
           transcript:ENST00000357569
          Length = 425

 Score =  120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)
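
To make the difference between the two HSP header layouts concrete,
here is a minimal sketch in plain Perl. It is not the SearchIO::blast
code; parse_score_line() is a made-up helper that only looks at the
"Score =" lines shown in the two examples above and returns the raw
score and the bit score in a fixed order regardless of flavor.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse the "Score = ..." line
# of an HSP header and return (raw_score, bits).
#   NCBI: Score =  120 bits (301), ...    bits first, raw in parentheses
#   WU:   Score = 4141 (1462.8 bits), ... raw first, bits in parentheses
sub parse_score_line {
    my ($line) = @_;
    if ( $line =~ /Score\s*=\s*([\d.e+-]+)\s+bits\s+\(([\d.e+-]+)\)/ ) {
        return ( $2, $1 );    # NCBI layout
    }
    if ( $line =~ /Score\s*=\s*([\d.e+-]+)\s+\(([\d.e+-]+)\s+bits\)/ ) {
        return ( $1, $2 );    # WU layout
    }
    return;                   # not a recognised score line
}

for my $line (
    ' Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0',
    ' Score =  120 bits (301), Expect = 3e-27',
  )
{
    my ( $raw, $bits ) = parse_score_line($line);
    print "raw=$raw bits=$bits\n";
}
```

With the two header lines above this prints raw=4141 bits=1462.8 for
the WU report and raw=301 bits=120 for the NCBI report.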


On Nov 26, 2007 10:59 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance are determined for Hit objects.  Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present.  Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance are
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).  Changing to the behavior
> above results in several tests failing via SearchIO.t, with each
> failed test reporting the expected (read:correct) raw score.
>
> I'll look through the tests just in case, but I am planning on
> committing changes to the BLAST parsers, GenericHit, and SearchIO.t
> (to reflect the correct expected data) in the next day or two unless
> there are any objections.
>
> chris
>
>
> On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:
>
> > On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> >> [snip]
> >>
> >> Further, is it possible to get the raw_score of a hit? $hit->raw_score
> >> actually gets the bitscore (w/o decimal point).
> >
> > Hmmm. raw_score should not be the same as bit score. So given an
> > example blast hit line such as:
> >
> >        Score = 60.0 bits (30), Expect = 1e-06
> >
> > $hit->raw_score() should return 30, not 60, as you seem to be getting.
> >
> > Could you submit a bug report for this?  http://www.bioperl.org/
> > wiki/Bugs
> >
> > Thanks,
> > Steve
> >
> >>
> >> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> >>> Hi Russell,
> >>>
> >>> I came across your question. At first I thought all was well on my
> >>> system, but indeed I also have these colouring problems.
> >>> I noted that score in the bgcolor callback gets a different value!
> >>> Printing score during hit parsing ($hit->raw_score) gives the same
> >>> score as in -description
> >>> (my $score = $feature->score;). However, printing score in the
> >>> bgcolor sub gives 2573!
> >>> All scores in the bgcolor routine are different from and higher than
> >>> the real scores. Were you able to solve this colouring issue?
> >>>
> >>> Regards,
> >>> Bernd
> >>>
> >>>
> >>>> Hi all,
> >>>> I'm using a modified version of Lincoln's tutorial
> >>>> (http://www.bioperl.org/wiki/
> >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output)
> >>>> and I'm colouring the HSPs by setting the -bgcolor by score with
> >>>> a sub
> >>>> to give a similar image to that from NCBI but for some reason, my
> >>>> colours are coming out wrong (see attached example)
> >>>> They seem to be off by one but I can't see why.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> I can't be certain but I think it's only started doing this
> >>>> since our
> >>>> BLAST upgrade to 2.2.17 a few weeks ago.
> >>>>
> >>>> Here's the colouring code:
> >>>> -------------------------------------------------------------------
> >>>> -----
> >>>> -------
> >>>> my $track = $panel->add_track(
> >>>>                               -glyph       => 'segments',
> >>>>                               -label       => 1,
> >>>>                               -connector   => 'dashed',
> >>>>                               -bgcolor     => sub {
> >>>>                                 my $feature = shift;
> >>>>                                 my $score = $feature->score;
> >>>>                         return 'red'       if $score >= 200;
> >>>>                                     return 'fuchsia' if $score
> >>>> >= 80;
> >>>>                                     return 'lime'      if $score
> >>>> >= 50;
> >>>>                         return 'blue'      if $score >= 40;
> >>>>                                     return 'black';
> >>>>                                },
> >>>>                               -font2color  => 'gray',
> >>>>                               -sort_order  => 'high_score',
> >>>>                               -description => sub {
> >>>>                                 my $feature = shift;
> >>>>                                 return unless
> >>>> $feature->has_tag('description');
> >>>>                                 my ($description) =
> >>>> $feature->each_tag_value('description');
> >>>>                                 my $score = $feature->score;
> >>>>                                 "$description, score=$score";
> >>>>                                },
> >>>>                              );
> >>>> -------------------------------------------------------------------
> >>>> -----
> >>>> ---------
> >>>>
> >>>>
> >>>> Thanx,
> >>>>
> >>>> Russell Smithies
> >>>>
> >>>>
> >>>>
> >>>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From sac at bioperl.org  Mon Nov 26 22:27:09 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 19:27:09 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com>

Hi Jon,

I'd recommend downloading it into a separate location of your choosing
(~/lib/bioperl-ext for example) and running the installer as specified
in the docs that come with the download. Then you can include the
location you installed it into via a "use lib" statement at the top of
your script. Give "use lib" the full path (for example
"$ENV{HOME}/lib/bioperl-ext"), since Perl does not expand "~" the way
the shell does. It may be handy to install it as a non-root user so
that you don't alter the main perl installation.

This way your ext install will stay separate from your main bioperl
and perl installations.
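
A minimal sketch of that "use lib" line, assuming the example location
~/lib/bioperl-ext from above (adjust it to wherever you actually
installed the package). The path is built from $ENV{HOME} because Perl
itself does not expand "~"; the last few lines only confirm that the
directory ended up on @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed install location (hypothetical; change to match your setup).
# "use lib" runs at compile time, when %ENV is already populated.
use lib "$ENV{HOME}/lib/bioperl-ext";

# "use lib" prepends its argument to @INC, so modules installed there
# are found before those in the main perl installation.
my $ext_dir = "$ENV{HOME}/lib/bioperl-ext";
my $found   = grep { $_ eq $ext_dir } @INC;

print $found
  ? "bioperl-ext directory is on \@INC\n"
  : "bioperl-ext directory is NOT on \@INC\n";
```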

There are some docs about the ext packages you might want to check out
at http://www.bioperl.org/wiki/Ext_package.

Steve

On Nov 21, 2007 4:35 PM, Jonathan Binkley <binkley at genome.stanford.edu> wrote:
> Hi,
>
> I installed bioperl on a Mac (OS 10.4, Intel) via fink,
> which put it here:
>
> /sw/lib/perl5/5.8.6/Bio/
>
> It seems to work fine, but I need bioperl-ext for
> Smith-Waterman alignments.
>
> So, into which directory should I download bioperl-ext and
> run the Makefile?
>
> Thanks.
>
>
>


From a_arya2000 at yahoo.com  Tue Nov 27 09:51:41 2007
From: a_arya2000 at yahoo.com (a_arya2000)
Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST)
Subject: [Bioperl-l] Bioperl-ext test fails
Message-ID: <615478.1036.qm@web60113.mail.yahoo.com>

Hello,
I downloaded the latest bioperl-ext from the bioperl website,
and I have io_lib v1.8.11 installed. I was trying
to install Bio::SeqIO::staden::read (of bioperl-ext).
It compiled fine without any errors, but when I ran make
test I got the following output.


PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94# Test 7 got: "0" (t/staden_read.t at line 110 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
#  t/staden_read.t line 110 is: ok(0, undef, "We don't have the ability to write files for $format format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail #4 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail #5 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail #6 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail #7 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
#  t/staden_read.t line 62 is: ok(0, undef, "Still missing test files for $format format") for (1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 20 got: "0" (t/staden_read.t at line 62 fail #7 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 21 got: "0" (t/staden_read.t at line 62 fail #8 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 22 got: "0" (t/staden_read.t at line 62 fail #9 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 23 got: "0" (t/staden_read.t at line 62 fail #10 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 24 got: "0" (t/staden_read.t at line 62 fail #11 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 25 got: "0" (t/staden_read.t at line 62 fail #12 *TODO*)
#   Expected: <UNDEF> (Still missing test files for alf format)
# Test 31 got: "0" (t/staden_read.t at line 107 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
#  t/staden_read.t line 107 is: ok(0, undef, "Can't write valid ctf files until we have a trace object") for 1..7;
# Test 32 got: "0" (t/staden_read.t at line 107 fail #2 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 33 got: "0" (t/staden_read.t at line 107 fail #3 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 34 got: "0" (t/staden_read.t at line 107 fail #4 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 35 got: "0" (t/staden_read.t at line 107 fail #5 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 36 got: "0" (t/staden_read.t at line 107 fail #6 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 37 got: "0" (t/staden_read.t at line 107 fail #7 *TODO*)
#   Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
t/staden_read....ok
All tests successful.
Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr +  0.15 csys =  1.71 CPU)


Does anyone have any idea what might be going wrong here? By
the way, my OS is Linux. Thank you very much.

Arya




From bix at sendu.me.uk  Tue Nov 27 10:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded the latest bioperl-ext from the bioperl website,
> and I have io_lib v1.8.11 installed. I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any errors, but when I ran make
> test I got the following output.
[...]
> All tests successful.
> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr + 
> 0.15 csys =  1.71 CPU)
> 
> 
> Does anyone have any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that 
nothing actually went wrong: 'All tests successful'. Apparently there 
are some things in the test script that are known by the author to not 
work quite right, so he marked them as 'todo'. The problems seem 
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going 
on, all is well.
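
To illustrate what a 'todo' test means, here is a small stand-alone
sketch. It uses Test::More, which is an assumption on my part and may
well not be the module the ext test script uses; the test names are
made up. The point is only that a failing test inside a TODO block is
reported but does not make the run fail.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# An ordinary test: a failure here would fail the whole run.
ok( 1, 'reading works' );

TODO: {
    # Tests inside this block are expected to fail for now.
    local $TODO = 'writing is not implemented yet';

    # Reported as "not ok ... # TODO ...", but not counted as a failure.
    ok( 0, 'writing works' );
}
```

Run under a harness (prove, or make test), a script like this is still
reported as successful, which matches what you saw.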


From alison.waller at utoronto.ca  Mon Nov 26 16:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

 
It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.  

 
I am trying to write a script that will parse large blast files (usually
blastx). I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on, say, the top 5.  I am pretty
sure I have done this wrong; any advice on how to correct my script to do
this would be great.

 
Thanks so much,

 
Alison

 
#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits
                       # to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
      if ($top_hit=$result->next_hit) # this might be wrong - I want to
                                      # specify how many hits to print results for
            # Print some tab-delimited data about this hit
           {
            print $result->query_name, "\t";
            print $hit->description, "\t";
            print $hit->significance, "\t";
            print $hit->bits,"\t";
            print $hsp->evalue, "\t";
            print $hsp->percent_identity, "\t";
            print $hsp->length('total'),"\t";
            print $hsp->num_identical,"\t";
            print $hsp->gaps,"\t";
            print $hsp->strand('query'),"\t";
            print $hsp->strand('hit'), "\n";
          }
      else print "no hits\n";
   }

 
******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From bix at sendu.me.uk  Tue Nov 27 12:01:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 17:01:36 +0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <474C4D70.2010206@sendu.me.uk>

alison waller wrote:
> I am trying to write a script that will parse large blast files (usually
> blastx) I also want to be able to specify how many hits I want to report
> information on.
> 
> Most of the time I will only want information on the top hit, but I want to
> have the flexibility to obtain information on say the top5.  I am pretty
> sure I have done this wrong, any advice on how to correct my script to do
> this, would be great.

[snip]

>       if ($top_hit=$result->next_hit) # this might be wrong - I want to
> specify how many hits to print results for

I didn't really pay attention to the rest of your code, but assuming it 
all works except for only ever giving you info for the top hit, you just 
need to change this 'if' to a loop of some kind.

# ...
my $hits = 0;

while (my $hit = $result->next_hit) {
  $hits++;
  last if $hits > $tophit;
  # ...
}
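
If you want to try the counting pattern without a BLAST report to
hand, here is a self-contained sketch. StubResult is a made-up
stand-in for a SearchIO result object (its next_hit just hands back
names from a list), and top_hits() is a hypothetical wrapper around
the loop above.

```perl
use strict;
use warnings;

# Stand-in for a Bio::SearchIO result: next_hit() returns one hit name
# per call and undef once the (made-up) list is exhausted.
package StubResult;
sub new      { my ( $class, @hits ) = @_; return bless { hits => [@hits] }, $class }
sub next_hit { my ($self) = @_; return shift @{ $self->{hits} } }

package main;

# Return at most $tophit hits from $result, in the order they are reported.
sub top_hits {
    my ( $result, $tophit ) = @_;
    my @kept;
    my $hits = 0;
    while ( my $hit = $result->next_hit ) {
        $hits++;
        last if $hits > $tophit;
        push @kept, $hit;
    }
    return @kept;
}

my @top = top_hits( StubResult->new(qw(hitA hitB hitC hitD)), 2 );
print "@top\n";    # hitA hitB
```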


From David.Messina at sbc.su.se  Tue Nov 27 12:55:44 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 27 Nov 2007 18:55:44 +0100
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <474C4D70.2010206@sendu.me.uk>
References: <005a01c83070$3a814580$d81efea9@AWALL>
	<474C4D70.2010206@sendu.me.uk>
Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>

Hi Alison,
As Sendu mentioned, the key bit is adding a condition to the hit loop to
limit the number of hits that are printed. I didn't test the below
extensively, but give it a try...


Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query

#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);

print OUT
  "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $i = 0;    # number of hits reported so far for this query
    while (  ( $i < $tophit ) && (my $hit = $result->next_hit) ) {
        $i++;
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t";
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t";
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t";
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n";
        }
    }

    # $i is still 0 only if this query produced no hits at all
    if ($i == 0) { print OUT "no hits\n"; }
}


From Russell.Smithies at agresearch.co.nz  Tue Nov 27 14:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>

Do the hits need to be sorted first or is this done automagically?
I ask this as I know Megablast doesn't provide sorted output for most of
its formats.

Russell



From cjfields at uiuc.edu  Tue Nov 27 16:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com>
	<474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;  
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read.  In general  
though if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr +
>> 0.15 csys =  1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say  
> that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant  
> going
> on, all is well.


From cjfields at uiuc.edu  Tue Nov 27 16:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also last

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     this criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit.pm object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { my $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable
                     value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
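
As a rough illustration of how these options might be combined, here is a
minimal sketch. make_hit_filter() and its min_bits argument are made-up
names for this example (they are not part of BioPerl), and the 1e-5 cutoff
is arbitrary. The Bio::SearchIO part only runs when the script is given a
report file and a bit-score threshold on the command line.

```perl
use strict;
use warnings;

# make_hit_filter() is a hypothetical helper, not part of BioPerl.
# It returns a code ref suitable for -hit_filter: the parser passes the
# hit object as the first argument and keeps the hit if we return true.
sub make_hit_filter {
    my (%opt) = @_;
    my $min_bits = defined $opt{min_bits} ? $opt{min_bits} : 0;
    return sub {
        my $hit = shift;
        return $hit->bits >= $min_bits;
    };
}

# Only runs when called as: script.pl <blast report> <min bits>
if ( @ARGV == 2 ) {
    require Bio::SearchIO;
    my ( $infile, $min_bits ) = @ARGV;
    my $report = Bio::SearchIO->new(
        -file       => $infile,
        -format     => 'blast',
        -signif     => 1e-5,    # example cutoff only
        -hit_filter => make_hit_filter( min_bits => $min_bits ),
    );
    while ( my $result = $report->next_result ) {
        while ( my $hit = $result->next_hit ) {
            my $best = $hit->hsp('best');    # one HSP per hit
            print join( "\t", $result->query_name, $hit->name, $best->evalue ), "\n";
        }
    }
}
```

Building the filter from a closure keeps the threshold out of the parser
call, so the same script can be rerun with different cutoffs.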

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Tue Nov 27 20:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head of a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
  my $res = $p3->run;
  ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences then I got a "can't place
primer on sequence"  error.

After a bit of faffing about, I think the problem occurs when no primers
are found. In which case $p3 still has the primers from the previous run,
which don't come from the current sequence, so can't be placed on it. I
tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine.

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).
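
For what it's worth, the "new object every time" workaround could be
sketched like this. primers_for_each() is a made-up helper name, the
-outfile naming and the PRIMER_MIN_GC value of 40 are only examples, and
the constructor arguments follow my reading of the Bio::Tools::Run::Primer3
synopsis, so treat it as a sketch rather than the recommended usage. The
BioPerl part only runs when a FASTA file is given on the command line.

```perl
use strict;
use warnings;

# primers_for_each() is a hypothetical helper. $new_p3 is a code ref that
# builds a fresh Primer3 wrapper for one sequence, so nothing from a
# previous run can be carried over into the next one.
sub primers_for_each {
    my ( $seqs, $new_p3 ) = @_;
    my %results;
    for my $seq (@$seqs) {
        my $p3  = $new_p3->($seq);       # new object on every iteration
        my $res = eval { $p3->run };     # undef if the run died
        warn "primer3 failed for ", $seq->display_id, ": $@"
            unless defined $res;
        $results{ $seq->display_id } = $res;
    }
    return \%results;
}

# Only runs when called as: script.pl <fasta file>
if (@ARGV) {
    require Bio::SeqIO;
    require Bio::Tools::Run::Primer3;
    my $in = Bio::SeqIO->new( -file => $ARGV[0], -format => 'fasta' );
    my @seqs;
    while ( my $seq = $in->next_seq ) { push @seqs, $seq }

    my $results = primers_for_each( \@seqs, sub {
        my $seq = shift;
        my $p3  = Bio::Tools::Run::Primer3->new(
            -seq     => $seq,
            -outfile => $seq->display_id . '.p3out',   # example name only
        );
        $p3->add_targets( PRIMER_MIN_GC => 40 );       # example value only
        return $p3;
    } );

    for my $id ( sort keys %$results ) {
        my $res = $results->{$id};
        # number_of_results is taken from the Bio::Tools::Primer3 docs
        print "$id\t", ( defined $res ? $res->number_of_results : 'failed' ), "\n";
    }
}
```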

Cheers,
Cass xx


From alison.waller at utoronto.ca  Tue Nov 27 16:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked Dave, however after looking at the output I realized that
I only want information on the top hsp per query returned.  For example, for
some of the queries the top hit has two hsps, so it returned both.

I tried to further edit it, but after 3 attempts they are all failing, I
think due to me using the loops wrong.

I also have another problem: I also want to retrieve the gi, and this doesn't
seem to be as straightforward as it should be.  I found another method,
_get_seq_identifiers, but this looks awkward; isn't there an object method
for the gi?

I've pasted my non-working script below. If there are any suggestions on how
to get it to print out just the first hsp per hit, that would be great.

Thanks,


#!/usr/local/bin/perl -w


# Parsing BLAST reports with BioPerl's Bio::SearchIO module 
# alison waller November 2007


use strict;
use warnings;
use Bio::SearchIO;


my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }


my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query


#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";


my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);


print OUT
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";


# Go through BLAST reports one by one
while (my $result = $report->next_result) {
	my $i=0;
	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
        

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t"; 
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t"; 
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t"; 
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n"; 
        }
}
    if ($i == 0) { print OUT "no hits\n"; } 

}
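
One likely reason the loops above misbehave is that both of them test and
increment the same $i, so the HSP loop uses up the hit budget. A minimal
sketch of one way around that is below; write_top_hsps() is a made-up
helper name, it prints fewer columns than the script above, and it writes
the query name in front of "no hits", which the original did not do.

```perl
use strict;
use warnings;

# write_top_hsps() is a hypothetical helper. $result is assumed to behave
# like the object returned by Bio::SearchIO's next_result, and $out is any
# writable filehandle.
sub write_top_hsps {
    my ( $result, $tophit, $out ) = @_;
    my $hits_seen = 0;                       # counts hits only, never HSPs
    while ( $hits_seen < $tophit and my $hit = $result->next_hit ) {
        $hits_seen++;
        my $hsp = $hit->next_hsp or next;    # first HSP in report order
        print {$out} join( "\t",
            $result->query_name, $hit->name, $hit->significance,
            $hsp->evalue,        $hsp->percent_identity ), "\n";
    }
    print {$out} $result->query_name, "\tno hits\n" if $hits_seen == 0;
    return $hits_seen;
}

# Only runs when called as: script.pl <blast report> <# of hits>
if ( @ARGV == 2 ) {
    require Bio::SearchIO;
    my ( $infile, $tophit ) = @ARGV;
    my $report = Bio::SearchIO->new( -file => $infile, -format => 'blast' );
    while ( my $result = $report->next_result ) {
        write_top_hsps( $result, $tophit, \*STDOUT );
    }
}
```

Taking only the first HSP means no inner loop is needed at all, which
removes the second counter entirely.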


From dennis.prickett at bbsrc.ac.uk  Wed Nov 28 05:18:26 2007
From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C))
Date: Wed, 28 Nov 2007 10:18:26 -0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk>

Dear Alison
 
Or, if you are absolutely only interested in the top hit you could limit
it to that in the blast command by adding the parameters " -b 1 ".

This will truncate the alignments in the report to 1 hit (database
sequence) per query (or -b 5 for 5 hits, etc); a hit can still contain
more than one hsp.  Your blasts run faster and then you won't have to
worry about how to parse out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know. 

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,

 
It's the usual story, I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to finish
my thesis.  

 
I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want
to have the flexibility to obtain information on say the top5.  I am
pretty sure I have done this wrong, any advice on how to correct my
script to do this, would be great.

 
Thanks so much,

 
Alison

 
#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
      if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for
            # Print some tab-delimited data about this hit
           {
            print $result->query_name, "\t";
            print $hit->description, "\t";
            print $hit->significance, "\t";
            print $hit->bits,"\t";
            print $hsp->evalue, "\t";
            print $hsp->percent_identity, "\t";
            print $hsp->length('total'),"\t";
            print $hsp->num_identical,"\t";
            print $hsp->gaps,"\t";
            print $hsp->strand('query'),"\t";
            print $hsp->strand('hit'), "\n";
          }
      else print "no hits\n";
   }

 
******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  


From t.nugent at cs.ucl.ac.uk  Wed Nov 28 08:10:41 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Wed, 28 Nov 2007 13:10:41 +0000
Subject: [Bioperl-l] Helical Wheel module
Message-ID: <474D68D1.3080602@cs.ucl.ac.uk>

Hi everyone,

I've been drawing a lot of helical wheels recently so put all my code 
into a module. I don't think there's anything in bioperl to do this yet 
though there are a few programs written in perl and flash on the web to 
do the same thing. I was thinking this could fit into biographics. Has 
lots of options to adjust labels, colours, ttf fonts and can output to 
png & svg.

Tim

...

Here's the output, converted to jpg from svg:
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg

Module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz

Works like this:

use DrawHelicalWheel;

my $im = DrawHelicalWheel->new(-title=>$title,
                               -sequence=>$sequence,
                               -helices=>\@helices,
                               -ttf_font=>$font);
open(my $out, '>', $svg) or die "Can't write $svg: $!";
binmode $out;
print $out $im->svg;
close $out;

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From tristan.lefebure at gmail.com  Wed Nov 28 10:46:11 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:46:11 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281046.11146.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an 
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is 
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

I could recycle the '_remove_col' sub-function of 'remove_columns' to do so 
(it splits the alignment into sequence objects, removes the sites, and then 
regenerates an alignment object), but I would be surprised if there was 
nothing already doing the job...

Thanks

-Tristan


From bix at sendu.me.uk  Wed Nov 28 11:19:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Nov 2007 16:19:36 +0000
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <200711281046.11146.tnl7@cornell.edu>
References: <200711281046.11146.tnl7@cornell.edu>
Message-ID: <474D9518.7010201@sendu.me.uk>

Tristan Lefebure wrote:
> Hello!
> 
> I was wondering if there was a function to remove sites/columns of an 
> alignment. Something like: $aln->remove_sites(@sites_to_remove)
> I looked around Bio::SimpleAlign but did not find exactly that. There is 
> remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

You might want to take a second look at the docs. You can supply column 
number ranges to remove_columns(), so it does exactly what you want.
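
To make that concrete, here is a minimal sketch that turns a list of
individual sites into the [start, end] pairs that remove_columns() accepts.
sites_to_ranges() is a made-up helper name, and the sketch assumes columns
are numbered from 0, as the Bio::SimpleAlign docs describe; check that
against your version.

```perl
use strict;
use warnings;

# sites_to_ranges() is a hypothetical helper, not part of BioPerl.
# It sorts the column indices, drops repeats, and merges consecutive
# columns into [start, end] pairs.
sub sites_to_ranges {
    my %seen;
    my @sites = sort { $a <=> $b } grep { !$seen{$_}++ } @_;
    my @ranges;
    for my $site (@sites) {
        if ( @ranges and $site == $ranges[-1][1] + 1 ) {
            $ranges[-1][1] = $site;            # extend the current run
        }
        else {
            push @ranges, [ $site, $site ];    # start a new run
        }
    }
    return @ranges;
}

# With a Bio::SimpleAlign object in $aln (not created in this sketch):
#   my $trimmed = $aln->remove_columns( sites_to_ranges(@sites_to_remove) );
```

For example, the sites 7, 0, 6, 8 become the two pairs [0,0] and [6,8].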


From cjfields at uiuc.edu  Wed Nov 28 08:57:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:57:27 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
Message-ID: <B3E0F9EA-9452-483E-AA17-5174B743B164@uiuc.edu>

I had some code which does this which I committed yesterday to CVS; it  
catches the GI for the query and the hits:

$result->query_gi;
$hit->ncbi_gi;

I am in the midst of fixing additional problems with WU-BLAST parsing  
but you are more than welcome to try it.

chris

On Nov 27, 2007, at 3:32 PM, alison waller wrote:

> Thanks Everyone,
>
> Your edits worked Dave, however after looking at the output I  
> realized that
> I only want information on the top hsp per query returned.  For  
> example some
> of the querys the top hit has two hsps so it returned both.
>
> I tried to further edit it, but after 3 attempts they are all  
> failing, I
> think due to me using the loops wrong.
>
> I also have another problem, I also want to retrieve the gi, this  
> doesn't
> seem to be straight forward as it should.  I found another method
> _get_seq_identifiers, but this looks awkward, isn't there and object  
> for the
> gi?
>
> I've pasted my non-working script below if there are any suggestions  
> on how
> to get it to print out just the first hsp per hit, that would be  
> great.
>
> Thanks,
>
>
> #!/usr/local/bin/perl -w
>
>
> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
> # alison waller November 2007
>
>
> use strict;
> use warnings;
> use Bio::SearchIO;
>
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of  
> hits>\n";
> if (@ARGV != 2) { die $usage; }
>
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                        # to report for each query
>
>
> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! 
> \n";
>
>
> my $report = new Bio::SearchIO(
>    -file   => "$infile",
>    -format => "blast"
> );
>
>
> print OUT
>
> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent 
> \tgaps\t
> strand\tHstrand\n";
>
>
> # Go through BLAST reports one by one
> while (my $result = $report->next_result) {
> 	my $i=0;
> 	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
>    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
>
>
>            # Print some tab-delimited data about this hit
>            print OUT $result->query_name,     "\t";
>            print OUT $hit->name,              "\t";
>            print OUT $hit->significance,      "\t";
>            print OUT $hit->bits,              "\t";
>            print OUT $hsp->evalue,            "\t";
>            print OUT $hsp->percent_identity,  "\t";
>            print OUT $hsp->length('total'),   "\t";
>            print OUT $hsp->num_identical,     "\t";
>            print OUT $hsp->gaps,              "\t";
>            print OUT $hsp->strand('query'),   "\t";
>            print OUT $hsp->strand('hit'),     "\n";
>        }
> }
>    if ($i == 0) { print OUT "no hits\n"; }
>
> }
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, November 27, 2007 4:01 PM
> To: Smithies, Russell
> Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>
> The hits/HSPs are generally in the order they appear in the report.
>
> If you are looking for best/worst HSP after parsing you can use the
> $hit->hsp() method:
>
> # best and worst
> my $best = $hit->hsp('best'); # also 'first'
> my $worst = $hit->hsp('worst'); # also last
>
> The SearchIO text BLAST parser also has several options implemented
> for finer control:
>
>     -inclusion_threshold => e-value threshold for inclusion in the
>                             PSI-BLAST score matrix model (blastpgp)
>     -signif      => float or scientific notation number to be used
>                     as a P- or Expect value cutoff
>     -score       => integer or scientific notation number to be used
>                     as a blast score value cutoff
>     -bits        => integer or scientific notation number to be used
>                     as a bit score value cutoff
>     -hit_filter  => reference to a function to be used for
>                     filtering hits based on arbitrary criteria.
>                     All hits of each BLAST report must satisfy
>                     this criteria to be retained.
>                     If a hit fails this test, it is ignored.
>                     This function should take a
>                     Bio::Search::Hit::BlastHit.pm object as its first
>                     argument and return true
>                     if the hit should be retained.
>                     Sample filter function:
>                        -hit_filter => sub { my $hit = shift;
>                                             $hit->gaps == 0; },
>                     (Note: -filt_func is synonymous with -hit_filter)
>     -overlap     => integer. The amount of overlap to permit between
>                     adjacent HSPs when tiling HSPs. A reasonable
>                     value is 2.
>                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
>
> chris
>
> On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:
>
>> Do the hits need to be sorted first or is this done automagically?
>> I ask this as I know Megablast doesn't provide sorted output for
>> most of its formats.
>>
>> Russell
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-
>>> bio.org] On Behalf Of Dave Messina
>>> Sent: Wednesday, 28 November 2007 6:56 a.m.
>>> To: alison waller
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>>
>>> Hi Alison,
>>> As Sendu mentioned, the key bit is adding a condition to the hit loop
>>> to limit the number of hits that are printed. I didn't test the below
>>> extensively, but give it a try...
>>>
>>>
>>> Dave
>>>
>>>
>>>
>>> #!/usr/local/bin/perl -w
>>>
>>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>>> # alison waller November 2007
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SearchIO;
>>>
>>> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
>>> if (@ARGV != 2) { die $usage; }
>>>
>>> my $infile  = $ARGV[0];
>>> my $outfile = $infile . '.parsed';
>>> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>>>                         # to report for each query
>>>
>>> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
>>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>>>
>>> my $report = new Bio::SearchIO(
>>>   -file   => "$infile",
>>>   -format => "blast"
>>> );
>>>
>>> print OUT
>>>   "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";
>>>
>>> # Go through BLAST reports one by one
>>> while ( my $result = $report->next_result ) {
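>>>   my $i = 0;    # number of hits actually seen for this query
>>>   while ( ( $i < $tophit ) && ( my $hit = $result->next_hit ) ) {
>>>       $i++;     # count only real hits, so $i stays 0 when there are none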
>>>       while ( my $hsp = $hit->next_hsp ) {
>>>
>>>           # Print some tab-delimited data about this hit
>>>           print OUT $result->query_name,     "\t";
>>>           print OUT $hit->name,              "\t";
>>>           print OUT $hit->significance,      "\t";
>>>           print OUT $hit->bits,              "\t";
>>>           print OUT $hsp->evalue,            "\t";
>>>           print OUT $hsp->percent_identity,  "\t";
>>>           print OUT $hsp->length('total'),   "\t";
>>>           print OUT $hsp->num_identical,     "\t";
>>>           print OUT $hsp->gaps,              "\t";
>>>           print OUT $hsp->strand('query'),   "\t";
>>>           print OUT $hsp->strand('hit'),     "\n";
>>>       }
>>>   }
>>>
>>>   if ($i == 0) { print OUT "no hits\n"; }
>>> }
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =======================================================================
>> Attention: The information contained in this message and/or attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or privileged
>> material. Any review, retransmission, dissemination or other use of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by AgResearch
>> Limited. If you have received this message in error, please notify the
>> sender immediately.
>> =======================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 08:54:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:54:39 -0600
Subject: [Bioperl-l] Helical Wheel module
In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk>
References: <474D68D1.3080602@cs.ucl.ac.uk>
Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu>

Looks good!  We recently added in your transmembrane module, so we  
could definitely add this in.

chris

On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've been drawing a lot of helical wheels recently so put all my code
> into a module. I don't think there's anything in bioperl to do this yet
> though there are a few programs written in perl and flash on the web to
> do the same thing. I was thinking this could fit into biographics. Has
> lots of options to adjust labels, colours, ttf fonts and can output to
> png & svg.
>
> Tim
>
> ...
>
> Here's the output, converted to jpg from svg:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg
>
> Module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz
>
> Works like this:
>
> use DrawHelicalWheel;
>
> my $im = DrawHelicalWheel->new(-title=>$title,
>                               -sequence=>$sequence,
>                               -helices=>\@helices,
>                               -ttf_font=>$font);
> open(OUTPUT, ">$svg") or die "Can't open $svg: $!";
> binmode OUTPUT;
> print OUTPUT $im->svg;
> close OUTPUT;
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk
> http://www.cs.ucl.ac.uk/staff/T.Nugent
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 13:43:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 12:43:58 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
	<8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu>


On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote:

> Chris,
>
> Good catch. You're on track here with one exception: WU blast and NCBI
> blast behave differently in what they report in the hit table: WU
> blast puts the raw score in the table not the bit score as NCBI blast
> does (see example below for reference). WU blast also swaps their
> location in the HSP header relative to how NCBI reports it. It would
> be good to verify that the blast parser isn't befuddled by this. A
> quick look at SearchIO::blast and it appears that data from the hit
> table is always getting stored as score, not bits for WU blast. Not
> sure if the HSP section data are parsed correctly. I'd recommend
> looking into these things when you do your fixes.

What I have now after commits is:

GenericHit - use the best HSP when possible for bits, score/raw_score,  
significance.  When there is no HSP, construct a minimal Hit object  
using hit table data (WUBLAST maps the score to raw_score, NCBI BLAST  
maps to bits(), both map evalue/pvalue to significance).  HSP mapping  
seems to be correct.

One issue that has popped up is GenericHit::significance  
preferentially uses the best HSP.  However, GenericHSP::significance  
uses evalues preferentially over pvalues; both Expect and P appear to  
be parsed for WU-BLAST HSPs now (so the evalue is reported); this  
apparently wasn't always the case if I read the GenericHit docs  
correctly.  As NCBI BLAST doesn't report pvalues we could change that  
so it preferentially returns a pvalue if present, falling back to an  
evalue.  This would match what is found in the hit table more closely and  
resembles what is documented for the method (for significance(), WU- 
BLAST gets pvalues, NCBI BLAST gets evalues).
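
As a rough sketch of that order of preference (plain Perl, not the actual GenericHSP code; the function name preferred_significance and the hash keys pvalue and evalue are made up for illustration, as are the numbers):

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the P-value when the
# record has one (WU-BLAST), otherwise fall back to the E-value (NCBI BLAST).
sub preferred_significance {
    my ($hsp) = @_;    # plain hash ref standing in for an HSP object
    return defined $hsp->{pvalue} ? $hsp->{pvalue} : $hsp->{evalue};
}

my %wu_hsp   = ( pvalue => 1.2e-30, evalue => 1.3e-30 );    # made-up numbers
my %ncbi_hsp = ( evalue => 4e-25 );                          # no P-value reported

printf "WU-BLAST:   %g\n", preferred_significance( \%wu_hsp );
printf "NCBI BLAST: %g\n", preferred_significance( \%ncbi_hsp );
```

The ternary is used instead of the defined-or operator so the sketch also runs on Perl versions older than 5.10.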

> So in the end, WU blast HSPs that are built from the hit table should
> report a value for raw_score and punt on bits, but NCBI HSPs so
> constructed should do the opposite. The downside to this arrangement
> is that code that works for NCBI blast hits will need modification to
> work for WU blast hits, but that is just the nature of the data. It
> shouldn't be an issue for the majority of users that stick with one
> flavor of blast and don't switch back and forth, or for users that get
> their HSP scoring data from HSP sections rather than relying on the
> hit table.

In general I get my data from the HSPs, so this shouldn't be a  
significant issue (bad pun).  I did find that changing it so that Hit  
objects use HSP data pointed out issues with test data; hit table raw/ 
bit scores were rounded from the HSP score data or vice versa since  
all data came from the hit table, so tests flunked.

I think changing the way minimal hit objects report data (particularly  
for NCBI BLAST) will lead to a lot of confusion unless we clarify  
warnings when one or the other is missing (as you also indicated).   
I'm working on that now.

> Ideally, the HSP object would know whether it was NCBI or WU-based and
> issue an informative warning when attempting to access data it doesn't
> have. One solution might be for the parser to put a 'WU-' in front of
> the algorithm name for WU blast reports, so it would then be available
> for the contained hit/hsp objects. This could break anything dependent
> on algorithm name, so it would need some testing.
>
> Steve


I can probably work around that as noted above unless you think it's  
warranted to add a 'WU' designation (the version info in the Result  
object has 'WashU' attached, so one could feasibly use that for  
distinguishing the two report types).
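
A minimal sketch of that check, assuming the version string has already been pulled out of the Result object (presumably via its algorithm_version method). The helper name blast_flavor and the two example strings are illustrative only and were not copied from real reports:

```perl
use strict;
use warnings;

# Hypothetical helper: guess the BLAST flavor from a report's version string.
sub blast_flavor {
    my ($version) = @_;
    return 'unknown' unless defined $version;
    return $version =~ /WashU/i ? 'WU-BLAST' : 'NCBI BLAST';
}

# Illustrative version strings only.
for my $version ( '2.0MP-WashU [04-May-2006]', '2.2.16 [Mar-25-2007]' ) {
    print "$version => ", blast_flavor($version), "\n";
}
```

This leaves the algorithm name untouched, so nothing that depends on it would need retesting.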

Anyway, I'm committing my first batch of fixes; the significance test  
will fail for at least a day until I can look into it more.

chris


From tristan.lefebure at gmail.com  Wed Nov 28 14:03:44 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 14:03:44 -0500
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <474D9518.7010201@sendu.me.uk>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
Message-ID: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>

Oops. I was reading the BioPerl 1.4 documentation. Actually,
http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to
http://search.cpan.org/perldoc?Bio::SimpleAlign, which turns out to be
the 1.4 documentation...

Thank you, it works great.
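
For anyone starting from a flat list of sites, here is a small sketch of turning it into the column ranges that remove_columns() accepts. The helper sites_to_ranges is hypothetical (plain Perl, not part of Bio::SimpleAlign), and the example indices are made up; check the docs for your BioPerl version to see whether columns are counted from 0 or from 1.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a list of column indices into
# [start, end] ranges, merging adjacent and duplicate positions.
sub sites_to_ranges {
    my @sites = sort { $a <=> $b } @_;
    my @ranges;
    for my $site (@sites) {
        if ( @ranges && $site <= $ranges[-1][1] + 1 ) {
            $ranges[-1][1] = $site;    # extend the current range
        }
        else {
            push @ranges, [ $site, $site ];    # start a new range
        }
    }
    return @ranges;
}

my @sites_to_remove = ( 0, 6, 7, 8, 12 );    # example column indices
my @ranges = sites_to_ranges(@sites_to_remove);
print join( ' ', map { "[$_->[0],$_->[1]]" } @ranges ), "\n";

# With BioPerl loaded and an alignment object in $aln, the call would
# then be along the lines of:
#   my $trimmed = $aln->remove_columns(@ranges);
```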


On Nov 28, 2007 11:19 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Tristan Lefebure wrote:
> > Hello!
> >
> > I was wondering if there was a function to remove sites/columns of an
> > alignment. Something like: $aln->remove_sites(@sites_to_remove)
> > I looked around Bio::SimpleAlign but did not find exactly that. There is
> > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch'
> criteria.
>
> You might want to take a second look at the docs. You can supply column
> number ranges to remove_columns(), so it does exactly what you want.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From Russell.Smithies at agresearch.co.nz  Wed Nov 28 16:57:14 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 29 Nov 2007 10:57:14 +1300
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>

Has anyone got a good example of parsing ASN.1 with
Bio::SeqIO::entrezgene?
I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
and my Perl isn't that good  :-(

Russell
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From stefan.kirov at bms.com  Wed Nov 28 17:16:18 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time)
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
Message-ID: <Pine.WNT.4.64.0711281708590.21768@A103728.hpw.stf.bms.com>

Here is an example for GO, will send the one for KEGG later:
my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene',
 	-service_record=>'yes');#, -locuslink=>'convert');
while (my $seq=$eio->next_seq) {
 	my $gid=$seq->accession_number;
 	my $ann=$seq->annotation; #Annotation collection holding the ontology terms
 	foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
     		next if ($ot->term->authority eq 'STS marker'); #Do not need STS markers
     		my $evid=$ot->comment;
     		$evid=~s/evidence: //i;
     		my @ref=$ot->term->get_references; #Really there should be just one?
     		my $id=$ot->identifier;
     		my $fid='GO:' . sprintf("%07u",$id);
     		print join("\t",$gid,$ot->ontology->name,$ot->name,$evid,$fid,@ref?$ref[0]->medline:''),"\n";
 	}
}
Please note there is a bug in the parser that makes it suck a lot of RAM. 
I am fixing this and will commit probably by the week's end- you will have 
to update at that point. If you work with few records this should not 
matter.
Stefan
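
The zero-padding step in the example above can be tried on its own, without the parser. This is only a sketch: the helper name go_accession is made up, and it assumes the identifier arrives as a bare number.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a bare numeric identifier into a GO accession
# by left-padding it with zeros to seven digits.
sub go_accession {
    my ($id) = @_;
    return 'GO:' . sprintf( '%07u', $id );
}

print go_accession(5515), "\n";    # prints GO:0005515
```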


On Thu, 29 Nov 2007, Smithies, Russell wrote:

> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good  :-(
>
> Russell


From cjfields at uiuc.edu  Thu Nov 29 18:06:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 17:06:42 -0600
Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml
Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu>

For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST  
parsing in Bio::SearchIO::blastxml (though it appears to be pretty  
stable!).  Since there isn't any easy way to distinguish between  
normal BLAST and PSI-BLAST reports due to recent changes at NCBI to  
BLAST, you have to indicate how the report is to be parsed by passing  
in a '-blasttype' parameter:

$searchio = Bio::SearchIO->new('-tempfile' => 1,
        '-format' => 'blastxml',
        '-file'   => 'psiblast.xml',
        '-blasttype' => 'psiblast');

Otherwise it chunks the individual iterations out as separate BLAST  
reports and parses them as separate reports.

Tests have also been added to SearchIO.t.  I will update the HOWTO and  
blastxml docs soon.

chris


From cjfields at uiuc.edu  Thu Nov 29 21:41:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 20:41:49 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
In-Reply-To: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
References: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu>

It's probably safer to create a new instance each time but it really  
shouldn't be necessary for a wrapper module; this sounds like a bug to  
me.  Could you file it in Bugzilla?

On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote:

> Hello,
>
> I was playing around with Primer3, and I hit a problem. Not sure if it's a
> bug or if I was doing something I wasn't supposed to, but if it's the
> latter, I thought it might save someone else half an hour of banging their
> head off a keyboard if I mentioned it:
>
> What I was doing was roughly:
>
> # create a primer3 obj
> my $p3 = ...Primer3->new();
>
> # loop through some sequences generating primers for
> # each of them using the same primer3 obj
> while (@some_bio_seqs){
>  my $res = $p3->run;
>  ...
> }
>
> This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
> at which point it worked for a few sequences then I got a "can't place
> primer on sequence"  error.
>
> After a bit of faffing about, I think the problem occurs when no primers
> are found. In which case $p3 still has the primers from the previous run,
> which don't come from the current sequence, so can't be placed on it. I
> tried calling $p3->cleanup in the loop, but that didn't work either.
> Creating a new $p3 every time works fine.
>
> Are you supposed to create a new Primer3 object for every sequence?
> (Apologies if I missed the relevant bit of the docs).
>
> Cheers,
> Cass xx
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From paulhengen at coh.org  Wed Nov 28 20:20:42 2007
From: paulhengen at coh.org (Paul N. Hengen)
Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST)
Subject: [Bioperl-l]  Collecting genomic DNA sequences using Entrez IDs
Message-ID: <14017289.post@talk.nabble.com>


Hi.

I have a number of gene IDs from Entrez and I want to find the
start and end locations in the human genome. This seemed simple
enough, so I started working through some of the examples for
using the EntrezGene module at www.bioperl.org  Of course this
did not work because the core installation does not include this
module. So, I think I have two choices (1) install the module (how?),
or (2) find an easier way to get the locations in the human genome.
I want to use the locations to grab sequences out of the genome.
Can anyone offer advice on this? Thanks.

-Paul.

--
Paul N. Hengen, Ph.D.
Hematopoietic Stem Cell and Leukemia Research
City of Hope National Medical Center
1500 E. Duarte Road, Duarte, CA 91010 USA
mailto:paulhengen at coh.org

-- 
View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From Viktor.Martyanov at Dartmouth.EDU  Thu Nov 29 15:20:19 2007
From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov)
Date: 29 Nov 2007 15:20:19 -0500
Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases
Message-ID: <193573097@newdonner.Dartmouth.EDU>

A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 445 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071129/a6380324/attachment-0002.bin>

From alison.waller at utoronto.ca  Thu Nov 29 11:20:59 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Thu, 29 Nov 2007 11:20:59 -0500
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from
	CVS)
Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL>

Hi all,

 
I would like to install the CVS version of bioperl  as I know of some code
changes that will be useful to me.  However, I am having problems installing
it.  

I am trying to install bioperl in my home directory on a linux cluster.  

 
I used

 
> cd bioperl-live

> perl Build.PL -install /home/awaller

 
However after the build command I got a lot of errors.  Do I have to also
have perl installed in my home directory??  There is perl installed on the
cluster in /usr/bin.  Do I need to point to this or does Build.PL
automatically look there?  I noticed a few errors about not having
permission and a few about not being able to connect. I've copied a portion
of the messages after my Build.PL command.  

 
Any help would be appreciated,

 
alison 

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/02packages.details.txt.gz

Trying to get away with old file:

3604718  584 -rw-r--r--  1 0        0          592967 Nov 12 22:53
/root/.cpan/sources/modules/02packages.details.txt.gz

Going to read /root/.cpan/sources/modules/02packages.details.txt.gz

  Database was generated on Sat, 10 Nov 2007 22:36:34 GMT

 
  There's a new CPAN.pm version (v1.9204) available!

  [Current version is v1.7601]

  You might want to try

    install Bundle::CPAN

    reload cpan

  without quitting the current session. It should be a seamless upgrade

  while we are running...

 
Warning: You are not allowed to write into directory
"/root/.cpan/sources/modules".

    I'll continue, but if you encounter problems, they may be due

    to insufficient permissions.

Fetching with LWP:

  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[Cannot write to
'/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied]

Fetching with Net::FTP:

  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from ftp.nrc.ca

Fetching with LWP:

  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[FTP close response: 500 Network seems to
have barfed - Let's all phone our ISP and go postal!

Unknown command.

]

Fetching with Net::FTP:

  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca

Fetching with LWP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'cpan.mirror.cygnal.ca']

Fetching with Net::FTP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Fetching with LWP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'mirror.isurf.ca']

Fetching with Net::FTP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

 
Trying with "/usr/bin/lynx -source" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL
ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: cpan.mirror.cygnal.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/03modlist.data.gz

Trying to get away with old file:

3604719  144 -rw-r--r--  1 0        0          141973 Nov 12 22:53
/root/.cpan/sources/modules/03modlist.data.gz

Going to read /root/.cpan/sources/modules/03modlist.data.gz

Going to write /root/.cpan/Metadata

can't create /root/.cpan/Metadata: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 3432

Running install for module Test::Harness

Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz

mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 2342

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From cjfields at uiuc.edu  Thu Nov 29 23:53:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:53:09 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <D344C28E-BC9B-4226-AD15-149EA001FCAB@uiuc.edu>

Alison,

There are directions on how to do this here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

(TinyURL link)
http://tinyurl.com/3263dd

Note the additional configuration for CPAN in that section; you'll  
need to set up CPAN so it installs everything locally.
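
As a rough sketch of what a home-directory install could look like (not a substitute for the wiki section above): the prefix $HOME/bioperl is a hypothetical choice, and the sketch assumes the Build.PL in bioperl-live is driven by Module::Build, whose --install_base option places pure-Perl modules under <prefix>/lib/perl5.

```shell
# Hypothetical install prefix inside the home directory (an assumption).
PREFIX="$HOME/bioperl"

# With --install_base, Module::Build installs pure-Perl modules here.
LOCAL_LIB="$PREFIX/lib/perl5"

# Configure, build and install only when run from the unpacked source tree.
if [ -f Build.PL ]; then
    perl Build.PL --install_base "$PREFIX" && ./Build && ./Build install
else
    echo "Build.PL not found; run this from the bioperl-live directory"
fi

# Remind the user where perl will need to look afterwards.
echo "Add $LOCAL_LIB to PERL5LIB so perl can find the modules"
```

The CPAN errors in the quoted log come from CPAN trying to write under /root/.cpan, so the CPAN configuration mentioned above is still needed before dependencies can be fetched.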

chris

On Nov 29, 2007, at 10:20 AM, alison waller wrote:

> Hi all,
>
>
>
> I would like to install the CVS version of bioperl  as I know of  
> some code
> changes that will be useful to me.  However, I am having problems  
> installing
> it.
>
> I am trying to install bioperl in my home directly on a linux cluster.
>
>
>
> I used
>
>
>
>> cd bioperl-live
>
> *       perl Build.PL -install /home/awaller
>
>
>
> However after the build command I got a lot of errors.  Do I have to  
> also
> have perl installed in my home directory??  There is perl installed  
> on the
> cluster in /usr/bin.  Do I need to point to this or does Build.PL
> automatically look there?  I noticed a few errors about not having
> permission and a few about not being able to connect. I've copied a  
> portion
> of the messages after my Build.pl command.
>
>
>
> Any help would be appreciated,
>
>
>
> alison
>
>
>
>
>
> Issuing "/usr/bin/ftp -n"
>
> ftp: mirror.isurf.ca: Unknown host
>
> Not connected.
>
> Local directory now /root/.cpan/sources/modules
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Bad luck... Still failed!
>
> Can't access URL
> ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.
>
>
>
> Please check, if the URLs I found in your configuration file
>
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
>
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/ 
> CPAN) are
>
> valid. The urllist can be edited. E.g. with 'o conf urllist push
>
> ftp://myurl/'
>
>
>
> Could not fetch modules/02packages.details.txt.gz
>
> Trying to get away with old file:
>
> 3604718  584 -rw-r--r--  1 0        0          592967 Nov 12 22:53
> /root/.cpan/sources/modules/02packages.details.txt.gz
>
> Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
>
>  Database was generated on Sat, 10 Nov 2007 22:36:34 GMT
>
>
>
>  There's a new CPAN.pm version (v1.9204) available!
>
>  [Current version is v1.7601]
>
>  You might want to try
>
>    install Bundle::CPAN
>
>    reload cpan
>
>  without quitting the current session. It should be a seamless upgrade
>
>  while we are running...
>
>
>
> Warning: You are not allowed to write into directory
> "/root/.cpan/sources/modules".
>
>    I'll continue, but if you encounter problems, they may be due
>
>    to insufficient permissions.
>
> Fetching with LWP:
>
>  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
>
> LWP failed with code[500] message[Cannot write to
> '/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission  
> denied]
>
> Fetching with Net::FTP:
>
>  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
>
> Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
> Permission denied
>
> at /usr/share/perl/5.8/CPAN.pm line 2265
>
> Couldn't fetch 03modlist.data.gz from ftp.nrc.ca
>
> Fetching with LWP:
>
>  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
>
> LWP failed with code[500] message[FTP close response: 500 Network  
> seems to
> have barfed - Let's all phone our ISP and go postal!
>
> Unknown command.
>
> ]
>
> Fetching with Net::FTP:
>
>  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
>
> Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
> Permission denied
>
> at /usr/share/perl/5.8/CPAN.pm line 2265
>
> Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca
>
> Fetching with LWP:
>
>  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
>
> LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
> 'cpan.mirror.cygnal.ca']
>
> Fetching with Net::FTP:
>
>  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
>
> Fetching with LWP:
>
>  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
>
> LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
> 'mirror.isurf.ca']
>
> Fetching with Net::FTP:
>
>  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
>
>
>
> Trying with "/usr/bin/lynx -source" to get
>
>    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/lynx -source
> "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/ncftp" to get
>
>    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
>
> Use ncftpget or ncftpput to handle file URLs.
>
>
>
> System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
> "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" "
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/wget -O -" to get
>
>    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/wget -O -
> "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/lynx -source" to get
>
>    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/lynx -source
> "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/ncftp" to get
>
>    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
>
> Use ncftpget or ncftpput to handle file URLs.
>
>
>
> System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
> "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" "
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/wget -O -" to get
>
>    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/wget -O -
> "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/lynx -source" to get
>
>    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/lynx -source
> "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/ncftp" to get
>
>    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
>
> Use ncftpget or ncftpput to handle file URLs.
>
>
>
> System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
> "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" "
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/wget -O -" to get
>
>    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/wget -O -
> "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/lynx -source" to get
>
>    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/lynx -source
> "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/ncftp" to get
>
>    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
>
> Use ncftpget or ncftpput to handle file URLs.
>
>
>
> System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
> "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" "
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
>
>
> Trying with "/usr/bin/wget -O -" to get
>
>    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
>
> sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission  
> denied
>
>
>
> System call "/usr/bin/wget -O -
> "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
> /root/.cpan/sources/modules/03modlist.data"
>
> returned status 1 (wstat 256), left
>
> /root/.cpan/sources/modules/03modlist.data.gz with size 141973
>
> Issuing "/usr/bin/ftp -n"
>
> Local directory now /root/.cpan/sources/modules
>
> local: 03modlist.data.gz: Permission denied
>
> Bad luck... Still failed!
>
> Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz.
>
>
>
> Issuing "/usr/bin/ftp -n"
>
> Local directory now /root/.cpan/sources/modules
>
> local: 03modlist.data.gz: Permission denied
>
> Bad luck... Still failed!
>
> Can't access URL
> ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz.
>
>
>
> Issuing "/usr/bin/ftp -n"
>
> ftp: cpan.mirror.cygnal.ca: Unknown host
>
> Not connected.
>
> Local directory now /root/.cpan/sources/modules
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Bad luck... Still failed!
>
> Can't access URL
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.
>
>
>
> Issuing "/usr/bin/ftp -n"
>
> ftp: mirror.isurf.ca: Unknown host
>
> Not connected.
>
> Local directory now /root/.cpan/sources/modules
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Bad luck... Still failed!
>
> Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz 
> .
>
>
>
> Please check, if the URLs I found in your configuration file
>
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
>
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/ 
> CPAN) are
>
> valid. The urllist can be edited. E.g. with 'o conf urllist push
>
> ftp://myurl/'
>
>
>
> Could not fetch modules/03modlist.data.gz
>
> Trying to get away with old file:
>
> 3604719  144 -rw-r--r--  1 0        0          141973 Nov 12 22:53
> /root/.cpan/sources/modules/03modlist.data.gz
>
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
>
> Going to write /root/.cpan/Metadata
>
> can't create /root/.cpan/Metadata: Permission denied at
> /usr/share/perl/5.8/CPAN.pm line 3432
>
> Running install for module Test::Harness
>
> Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz
>
> mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
> /usr/share/perl/5.8/CPAN.pm line 2342
>
> ******************************************
> Alison S. Waller  M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 29 23:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of
bioperl-core (I think they were added prior to the 1.5.1 release, but I'm not
positive).  If possible you should try installing BioPerl 1.5.2 or the
latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

 From CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> -- 
> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 30 03:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
 from	CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl  as I know of some code
> changes that will be useful to me.  However, I am having problems installing
> it.  
> 
> I am trying to install bioperl in my home directly on a linux cluster.  
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these URLs are invalid as suggested (try setting the urllist to
nothing), or your Linux cluster doesn't have internet access. You can't
do a 'proper' install of BioPerl and its dependencies without internet
access.

However, for most purposes simply downloading the BioPerl modules (i.e.
from a different machine with internet access) and pointing your
PERL5LIB to their location is sufficient. You can download CVS modules
from the BioPerl website individually, or as a tarball of everything.
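
A minimal sketch of the PERL5LIB approach, assuming the tarball was unpacked to the hypothetical path $HOME/src/bioperl-live (the directory that contains the Bio/ folder). The check at the end assumes the Bio::Root::Version module that ships with recent BioPerl releases.

```shell
# Hypothetical location of the unpacked bioperl-live tarball; adjust as needed.
BIOPERL_DIR="$HOME/src/bioperl-live"

# Prepend the directory to PERL5LIB, keeping any entries already set.
export PERL5LIB="$BIOPERL_DIR${PERL5LIB:+:$PERL5LIB}"

# Optional check: prints the version if perl can now find the modules.
perl -MBio::Root::Version -e 'print "BioPerl $Bio::Root::Version::VERSION\n"' \
    || echo "Bio::Root::Version not found under $BIOPERL_DIR"
```

Putting the export line in a shell startup file such as ~/.bashrc would make the setting persist across logins.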


From MEC at stowers-institute.org  Fri Nov 30 09:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>

How many, how often?

Use Ensembl BioMart!

First time interactively.

Then if you want to pipeline it, take the Perl code it generates for you and
run it - of course you'll have to install the Ensembl Perl API....


Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
> 
> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find 
> the start and end locations in the human genome. This seemed 
> simple enough, so I started working through some of the 
> examples for using the EntrezGene module at www.bioperl.org  
> Of course this did not work because the core installation 
> does not include this module. So, I think I have two choices 
> (1) install the module (how?), or (2) find an easier way to 
> get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research City of Hope 
> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 
> USA mailto:paulhengen at coh.org
> 
> --
> View this message in context: 
> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E
> ntrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bosborne11 at verizon.net  Fri Nov 30 09:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID: <C3758AB2.10609%bosborne11@verizon.net>

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.


Brian O.


On 11/28/07 8:20 PM, "Paul N. Hengen" <paulhengen at coh.org> wrote:

> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org


From cjfields at uiuc.edu  Fri Nov 30 10:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask
Mingyi Liu if he would like to include this parser with BioPerl (since
Bio::SeqIO::entrezgene requires it, that makes sense to me, and it would
avoid the circular dependency that has plagued these modules).

chris

On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:

> Chris Fields wrote:
> Chris,
> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
> low-level parser and is not part of bioperl. There is a circular
> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
> Paul, you can get it from CPAN and this should make
> Bio::SeqIO::entrezgene functional for you.
> Stefan
>
>
>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-
>> core (I think they were added prior to the 1.5.1 release, but I'm not
>> positive).  If possible you should try installing bioperl 1.5.2 or  
>> the
>> latest code from CVS.
>>
>> For directions on installing Bioperl for most OS's go here:
>>
>> http://www.bioperl.org/wiki/Installing_BioPerl
>>
>> From CVS:
>>
>> http://www.bioperl.org/wiki/Using_CVS
>>
>> chris
>>
>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>>
>>
>>> Hi.
>>>
>>> I have a number of gene IDs from Entrez and I want to find the
>>> start and end locations in the human genome. This seemed simple
>>> enough, so I started working through some of the examples for
>>> using the EntrezGene module at www.bioperl.org  Of course this
>>> did not work because the core installation does not include this
>>> module. So, I think I have two choices (1) install the module  
>>> (how?),
>>> or (2) find an easier way to get the locations in the human genome.
>>> I want to use the locations to grab sequences out of the genome.
>>> Can anyone offer advice on this? Thanks.
>>>
>>> -Paul.
>>>
>>> --
>>> Paul N. Hengen, Ph.D.
>>> Hematopoietic Stem Cell and Leukemia Research
>>> City of Hope National Medical Center
>>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>>> mailto:paulhengen at coh.org
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Fri Nov 30 11:12:22 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 11:12:22 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
Message-ID: <47503666.8090004@bms.com>

Chris Fields wrote:
> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
> Mingyi Liu if he would like to include this parser with BioPerl (since  
> it requires it, makes sense to me, and it avoids the circular  
> dependency that has plagued these modules).
>   
Yes, I think this would be a good step.
Stefan
> chris
>
> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:
>
>   
>> Chris Fields wrote:
>> Chris,
>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
>> low-level parser and is not part of bioperl. There is a circular
>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
>> Paul, you can get it from CPAN and this should make
>> Bio::SeqIO::entrezgene functional for you.
>> Stefan
>>
>>
>>     
>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-
>>> core (I think they were added prior to the 1.5.1 release, but I'm not
>>> positive).  If possible you should try installing bioperl 1.5.2 or  
>>> the
>>> latest code from CVS.
>>>
>>> For directions on installing Bioperl for most OS's go here:
>>>
>>> http://www.bioperl.org/wiki/Installing_BioPerl
>>>
>>> From CVS:
>>>
>>> http://www.bioperl.org/wiki/Using_CVS
>>>
>>> chris
>>>
>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>>>
>>>
>>>       
>>>> Hi.
>>>>
>>>> I have a number of gene IDs from Entrez and I want to find the
>>>> start and end locations in the human genome. This seemed simple
>>>> enough, so I started working through some of the examples for
>>>> using the EntrezGene module at www.bioperl.org  Of course this
>>>> did not work because the core installation does not include this
>>>> module. So, I think I have two choices (1) install the module  
>>>> (how?),
>>>> or (2) find an easier way to get the locations in the human genome.
>>>> I want to use the locations to grab sequences out of the genome.
>>>> Can anyone offer advice on this? Thanks.
>>>>
>>>> -Paul.
>>>>
>>>> --
>>>> Paul N. Hengen, Ph.D.
>>>> Hematopoietic Stem Cell and Leukemia Research
>>>> City of Hope National Medical Center
>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>>>> mailto:paulhengen at coh.org
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>         
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>       
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From stefan.kirov at bms.com  Fri Nov 30 10:29:57 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 10:29:57 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
Message-ID: <47502C75.60809@bms.com>

Chris Fields wrote:
Chris,
Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
low-level parser and is not part of bioperl. There is a circular
dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
Paul, you can get it from CPAN and this should make
Bio::SeqIO::entrezgene functional for you.
Stefan
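
A small sketch of how one might check for the parser and see the install command. The helper name have_module is made up for illustration, and the spelling Bio::ASN1::EntrezGene (capital G) is what I believe CPAN uses, so treat it as an assumption and confirm it on CPAN first.

```shell
# Return success if perl can load the named module (hypothetical helper).
have_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Assumed CPAN spelling of the low-level Entrez Gene parser.
MODULE="Bio::ASN1::EntrezGene"

if have_module "$MODULE"; then
    echo "$MODULE is already installed"
else
    echo "$MODULE is missing; try: perl -MCPAN -e 'install $MODULE'"
fi
```

On a machine without root access, CPAN would first need to be configured to install into a personal module area.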


> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- 
> core (I think they were added prior to the 1.5.1 release, but I'm not  
> positive).  If possible you should try installing bioperl 1.5.2 or the  
> latest code from CVS.
>
> For directions on installing Bioperl for most OS's go here:
>
> http://www.bioperl.org/wiki/Installing_BioPerl
>
>  From CVS:
>
> http://www.bioperl.org/wiki/Using_CVS
>
> chris
>
> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>
>   
>> Hi.
>>
>> I have a number of gene IDs from Entrez and I want to find the
>> start and end locations in the human genome. This seemed simple
>> enough, so I started working through some of the examples for
>> using the EntrezGene module at www.bioperl.org  Of course this
>> did not work because the core installation does not include this
>> module. So, I think I have two choices (1) install the module (how?),
>> or (2) find an easier way to get the locations in the human genome.
>> I want to use the locations to grab sequences out of the genome.
>> Can anyone offer advice on this? Thanks.
>>
>> -Paul.
>>
>> --
>> Paul N. Hengen, Ph.D.
>> Hematopoietic Stem Cell and Leukemia Research
>> City of Hope National Medical Center
>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>> mailto:paulhengen at coh.org
>>
>> -- 
>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From arareko at campus.iztacala.unam.mx  Fri Nov 30 12:01:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 30 Nov 2007 11:01:29 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47503666.8090004@bms.com>
References: <14017289.post@talk.nabble.com>	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>	<47502C75.60809@bms.com>	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
	<47503666.8090004@bms.com>
Message-ID: <475041E9.8050909@campus.iztacala.unam.mx>

I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the 
past, he mentioned he doesn't track the list closely).

Mauricio.

Stefan Kirov wrote:
> Chris Fields wrote:
>> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
>> Mingyi Liu if he would like to include this parser with BioPerl (since  
>> it requires it, makes sense to me, and it avoids the circular  
>> dependency that has plagued these modules).
>>   
> Yes, I think this would be a good step.
> Stefan
>> chris
>>
>> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:
>>
>>   
>>> Chris Fields wrote:
>>> Chris,
>>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
>>> low-level parser and is not part of bioperl. There is a circular
>>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
>>> Paul, you can get it from CPAN and this should make
>>> Bio::SeqIO::entrezgene functional for you.
>>> Stefan
>>>
>>>
>>>     
>>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-
>>>> core (I think they were added prior to the 1.5.1 release, but I'm not
>>>> positive).  If possible you should try installing bioperl 1.5.2 or  
>>>> the
>>>> latest code from CVS.
>>>>
>>>> For directions on installing Bioperl for most OS's go here:
>>>>
>>>> http://www.bioperl.org/wiki/Installing_BioPerl
>>>>
>>>> From CVS:
>>>>
>>>> http://www.bioperl.org/wiki/Using_CVS
>>>>
>>>> chris
>>>>
>>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>>>>
>>>>
>>>>       
>>>>> Hi.
>>>>>
>>>>> I have a number of gene IDs from Entrez and I want to find the
>>>>> start and end locations in the human genome. This seemed simple
>>>>> enough, so I started working through some of the examples for
>>>>> using the EntrezGene module at www.bioperl.org  Of course this
>>>>> did not work because the core installation does not include this
>>>>> module. So, I think I have two choices (1) install the module  
>>>>> (how?),
>>>>> or (2) find an easier way to get the locations in the human genome.
>>>>> I want to use the locations to grab sequences out of the genome.
>>>>> Can anyone offer advice on this? Thanks.
>>>>>
>>>>> -Paul.
>>>>>
>>>>> --
>>>>> Paul N. Hengen, Ph.D.
>>>>> Hematopoietic Stem Cell and Leukemia Research
>>>>> City of Hope National Medical Center
>>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>>>>> mailto:paulhengen at coh.org
>>>>>
>>>>> -- 
>>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>>
>>>>>         
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>       
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>>   
> 
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Fri Nov 30 15:21:13 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Nov 2007 12:21:13 -0800
Subject: [Bioperl-l] Trying to find multiple homologs in multiple
	databases
In-Reply-To: <193573097@newdonner.Dartmouth.EDU>
References: <193573097@newdonner.Dartmouth.EDU>
Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org>

Viktor -
Bio::SearchIO helps you parse BLAST reports, but don't underestimate
the power of going as low-tech as possible and outputting scores with
the -m 8 option in NCBI-BLAST or -mformat 3 in WU-BLAST, which give you
tabular output that is parseable with the 'split' function in Perl.
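
As a rough sketch of the low-tech route, something like the following would
do for the twelve columns of NCBI-BLAST -m 8 output.  The field names and
the parse_tabular_line() helper are my own labels for illustration, not
anything BioPerl defines, and the example line is made up.

```perl
use strict;
use warnings;

# Column order of NCBI-BLAST -m 8 tabular output (names are illustrative).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash reference keyed by field name.
# Returns undef for blank lines, for the '#' comment lines that -m 9 adds,
# and for lines that do not have exactly twelve columns.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @values = split /\t/, $line;
    return unless @values == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @values;
    return \%hit;
}

# Made-up example line; real input would be read from the BLAST report.
my $example = join "\t", qw(query1 subj1 98.50 200 3 0 1 200 11 210 1e-100 380);
my $hit = parse_tabular_line($example);
print "$hit->{query} hits $hit->{subject} (E = $hit->{evalue})\n";
```

In a real script you would call this inside a while (<$fh>) loop over the
report and keep only the hits that pass whatever E-value cutoff you choose.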

See the wiki http://bioperl.org/wiki for HOWTOs and examples of using  
the parsers.

You might also consider already-written tools like OrthoMCL,
InParanoid, and others that help you define relationships like
orthologs and paralogs among species. There are also a few
published web resources that have pre-computed homologs for you, so you
might take a look around first unless the point of the project is to
learn how to run these kinds of searches.

For general Perl help consider Perlmonks.org and some of the
introductory books that are available.
-jason
--
Jason Stajich
jason at bioperl.org

On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote:

> Hello,
>
> My name is Viktor Martyanov and I am a Ph.D. student in biology at  
> Dartmouth.
>
> I need to be able to use a set of genes or FASTA sequences from S.  
> cerevisiae and retrieve a set of corresponding homologs from other  
> fungal species via BLASTP searches.
>
> I would like to find out if there are Perl scripts that approach  
> this problem. By the way, is there a Perl community or forum where  
> I could post this question?
>
> Thanks very much.


From barry.moore at genetics.utah.edu  Fri Nov 30 17:03:23 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 30 Nov 2007 15:03:23 -0700
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
References: <14017289.post@talk.nabble.com>
	<CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
Message-ID: <B839F4C3-C225-40B2-B7B0-C2940A35B964@genetics.utah.edu>

Paul,

One other alternative is to use the UCSC table browser
(http://genome.ucsc.edu/cgi-bin/hgTables?command=start).  Select your
organism, upload your ID list, and select your output options.  You can
download the coordinates or the fasta directly.  You have options for
including or excluding various parts of the gene, and upstream/downstream
sequences.  This is similar to the solution that Malcolm suggested,
except that the Ensembl option can be run repeatedly as perl code, as he
pointed out.  UCSC allows remote connections to their MySQL server, so
you could set up a repeated task and more complex queries that way with
the UCSC method.

Barry

On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote:

> How many, how often?
>
> Use ensembl biomart!
>
> First time interactively.
>
> Then if you to pipeline it, take the perl code it generates for you  
> and
> run it - of course you'll have to install the Ensembl Perl API....
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Paul N. Hengen
>> Sent: Wednesday, November 28, 2007 7:21 PM
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez  
>> IDs
>>
>>
>> Hi.
>>
>> I have a number of gene IDs from Entrez and I want to find
>> the start and end locations in the human genome. This seemed
>> simple enough, so I started working through some of the
>> examples for using the EntrezGene module at www.bioperl.org
>> Of course this did not work because the core installation
>> does not include this module. So, I think I have two choices
>> (1) install the module (how?), or (2) find an easier way to
>> get the locations in the human genome.
>> I want to use the locations to grab sequences out of the genome.
>> Can anyone offer advice on this? Thanks.
>>
>> -Paul.
>>
>> --
>> Paul N. Hengen, Ph.D.
>> Hematopoietic Stem Cell and Leukemia Research City of Hope
>> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010
>> USA mailto:paulhengen at coh.org
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E
>> ntrez-IDs-tf4894403.html#a14017289
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>


From cjfields at uiuc.edu  Fri Nov 30 23:37:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 22:37:50 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from	CVS)
In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL>
References: <000901c833bf$33d53500$0a02a8c0@AWALL>
Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu>

Make sure to keep this on the list.

ncbi_gi() is only in bioperl-live (CVS); my guess is you either
somehow got 1.5.2 instead, or the bioperl-live version is not being found
in your lib path.  It's very likely the latter: perl is falling back to
whatever else is installed (which appears to be an older version of
bioperl).  'use lib' has to point at the directory that contains the Bio
directory, not at Bio itself, since perl looks for Bio/SearchIO.pm
relative to it.  Try changing the
"use lib '/home/awaller/bioperl-live/Bio'" line to:

use lib '/home/awaller/bioperl-live';
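
If you want to check which copy perl is actually picking up, a small
sketch like this prints the file a module was loaded from.  The
module_path() helper is just something made up for this example; %INC and
can() are standard Perl.

```perl
use strict;
use warnings;

# Load a module by name and report the file perl actually found for it,
# or undef if it cannot be loaded from the current @INC.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    my $ok = eval { require $file; 1 };
    return $ok ? $INC{$file} : undef;
}

# Bio::Search::Hit::BlastHit is the class named in the error message.
my $class = 'Bio::Search::Hit::BlastHit';
my $path  = module_path($class);
if (defined $path) {
    print "$class loaded from $path\n";
    print "ncbi_gi is ", ($class->can('ncbi_gi') ? "available" : "missing"), "\n";
}
else {
    print "$class not found in: @INC\n";
}
```

Put the same 'use lib' line at the top that your script uses; if the path
printed is not under /home/awaller/bioperl-live, the older install is
still winning.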

chris

On Nov 30, 2007, at 8:09 PM, alison waller wrote:

> Okay so Now I'm really confused.
> I edited > #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live/Bio.
> I ran the script below with the *special hit->ncbi from Chris.  It  
> worked,
> it was great, I got the gi! No errors, no bugs that I saw in  
> checking the
> output.  Then I went back in, edited the script to retrieve further  
> info
> (specifically the strand).  Saved it, now when I try to run it I get  
> the
> same error message that I was previously getting.
>
> barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1
> Can't locate object method "ncbi_gi" via package
> "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50, <GEN1>  
> line
> 189.
>
> Thanks soo much,
>
>
> #!usr/bin/perl
>
> use strict;
> use warnings;
> use lib "/home/awaller/bioperl-live/Bio";
> use Bio::Perl;
> use Bio::SearchIO;
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits per query>\n";
> if (@ARGV != 2) { die $usage; }
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                      # to report for each query
>
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>
> my $report = Bio::SearchIO->new(
>  -file   => $infile,
>  -format => "blast"
> );
>
> print OUT join("\t",qw(
>              Query
>              HitDesc
>              HitAccess
>              HitGi
> 		HitBits
>              Evalue
>              %id
>              AlignLen
>              NumIdent
>              NumPos
>              gaps
>              Qframe
>              Qstrand
>              Hframe
> 		Hstrand))."\n";
>
> # Go through BLAST reports one by one
> while ( my $result = $report->next_result ) {
>  my $ct = 0;
>  my @tophits = grep {$ct++ < $tophit } $result->hits;
>  if (scalar(@tophits) == 0) {
>     print OUT "no hits\n";
>  }
>  for my $hit (@tophits) {
>     my $tophsp=$hit->hsp('best');
>     # Print some tab-delimited data about this hit
>     print OUT join("\t",
>                    $result->query_name,
>                    $hit->description,
>                    $hit->accession,
>                    $hit->ncbi_gi,
>                    $hit->bits,
>                    $tophsp->evalue,
>                    $tophsp->percent_identity,
>                    $tophsp->length('total'),
>                    $tophsp->num_identical,
>                    $tophsp->num_conserved,
>                    $tophsp->gaps,
>                    $tophsp->query->frame,
> 		      $tophsp->strand('query'),	
>                    $tophsp->hit->frame,
> 		      $tophsp->strand('hit'),	
>                   )."\n";
>  }
> }
>
>
>
>
> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Friday, November 30, 2007 6:24 PM
> To: alison waller
> Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live  
> tarball
> from CVS)
>
> alison waller wrote:
>> Thank you Sendu,
>>
>> So I'm trying the second option.  I have downloaded the bioperl-live
> tarball
>> from the CVS on my windows laptop, and then moved it to my home  
>> directory
> in
>> the linux cluster where I unzipped and tared it.  So I now have a
> directory
>> /home/awaller/bioperl-live.
>>
>> I edited my .bashrc file as below:
>> Export PERL5LIB='/home/awaller/bioperl-live'
>>
>> I also edited a sample script to include:
>> #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live'
>
> Does this directory contain a 'Bio' directory with all the BioPerl
> modules inside it?
>
>
>> But it still isn't working.
>> At the prompt I typed$ perl script.pl
>> It gave me the warning - can't locate object method ncbi_gi which  
>> is why
> I'm
>> trying to download the CVS version as Chris Fields added code to  
>> make the
>> ncbi-gi object.
>
> You'll have to give me the complete, unedited error message and  
> ideally
> the script itself before I can help you further.
>
>
>> Don't I have to do something similar to what the Build.PL file does?
>
> Probably not. It doesn't matter where your perl executable is, btw, as
> long as the system knows how to run perl, which it obviously does.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 05:45:43 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 10:45:43 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
Message-ID: <4729A047.2060507@mikrobio.med.uni-giessen.de>

Dear all,

I have emboss installed on a windows machine (Embosswin). I can run
it from the dos command line and the path is present. However, when I
try to call an emboss application from bioperl I get an "Application
not found" error.


  my $f = Bio::Factory::EMBOSS->new();
  # get an EMBOSS application  object from the factory
  my $fuzznuc = $f->program('fuzznuc');
    $fuzznuc->run(
                  { -sequence  => $infile,
                        -pattern   => $motif,
                       -outfile   => $outfile                       
              });
 gives the following error

-------------------- WARNING ---------------------
MSG: Application [fuzznuc] is not available!
---------------------------------------------------
Can't call method "run" on an undefined value at searchPatterns.pl line 102.

Can somebody help me fix this ?

best regards
Rohit

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 10:22:14 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:22:14 -0400
Subject: [Bioperl-l] PAML/Codeml parsing
Message-ID: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>

PAML4 breaks our PAML parser right now because the order of things in
the result file has changed.  Now sequences precede the information
about the version or the program run.  This means that
$result->get_seqs() fails because we don't parse the sequences.

We'll see what we can do, but as usual with supporting 3rd party
programs, parsing is brittle when file formats change.

-jason

--
Jason Stajich
jason at bioperl.org


From jason at bioperl.org  Thu Nov  1 10:32:06 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:32:06 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>

Presumably the PATH is not getting set properly - you should try
printing the $ENV{PATH} variable in a perl script to see if it
actually contains the directory where the emboss programs are
installed.  Bioperl can only guess so much as to where to find an
application.  It is also possible that we aren't creating the proper
path to the executable - you can print the executable path with
print $fuzznuc->executable
I believe, unless it is throwing an error at the program() line.

It looks like the code in the Factory object is a little fragile in
assuming that the programs HAVE to be in your $PATH.  I don't know if
windows+perl is special in any way in how it runs things, so I can't
really tell if there are specific things you have to do here. You may
have to run this through cygwin in case PATH and such are just not
available properly to windows Perl.
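
A quick way to do that PATH check from inside perl, independent of
Bioperl, is sketched below.  find_in_path() is a helper written just for
this example and is not how the Factory code looks things up, and the
list of Windows suffixes is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Search each directory in PATH for a program and return the first full
# path found, or undef.  On Windows some common executable suffixes are
# tried as well (the suffix list here is an assumption).
sub find_in_path {
    my ($program) = @_;
    my @suffixes = ('');
    push @suffixes, qw(.exe .bat .cmd) if $^O eq 'MSWin32';
    for my $dir (File::Spec->path) {
        for my $suffix (@suffixes) {
            my $candidate = File::Spec->catfile($dir, $program . $suffix);
            return $candidate if -f $candidate;
        }
    }
    return;
}

print "PATH=$ENV{PATH}\n";
my $found = find_in_path('fuzznuc');
print defined $found ? "fuzznuc found at $found\n"
                     : "fuzznuc is not in any PATH directory\n";
```

If this finds fuzznuc but the factory still says the application is not
available, then the problem is in how we look the program up rather than
in your environment.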

-jason
On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:

> Dear all,
>
> I have emboss installed on a windows machine. (Embosswin). I can run
> this from the dos command line and the path is present. However,  
> when I
> try to call
> an emboss application from bioperl I get a "Application not found  
> error"
>
>
>   my $f = Bio::Factory::EMBOSS->new();
>   # get an EMBOSS application  object from the factory
>   my $fuzznuc = $f->program('fuzznuc');
>     $fuzznuc->run(
>                   { -sequence  => $infile,
>                         -pattern   => $motif,
>                        -outfile   => $outfile
>               });
>  gives the following error
>
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at searchPatterns.pl  
> line
> 102.
>
> Can somebody help me fix this ?
>
> best regards
> Rohit
>
> -- 
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Thu Nov  1 10:54:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 09:54:09 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu>

This worked for me previously when I tested with WinXP on my old  
machine using EMBOSS v5:

ftp://emboss.open-bio.org/pub/EMBOSS/windows

I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably  
better to use the latest EMBOSS version anyway so I suggest trying  
the version in the above link.  I'll test it again today and let you  
know what I find.

chris

On Nov 1, 2007, at 9:32 AM, Jason Stajich wrote:

> Presumably the PATH is not getting set properly - you should play
> around printing the $ENV{PATH} variable in a perl script to see if
> actually contains the directory where the emboss programs are
> installed.  Bioperl can only guess so much as to where to find an
> application.  It is also possible that we aren't creating the proper
> path to the executable - you can print the executable path with
> print $fuzznuc->executable
> I believe unless it is throwing an error at the program() line.
>
> It looks like the code in the Factory object is a little fragile
> assuming that the programs HAVE to be in your $PATH.  I don't know if
> windows+perl is special in any way that it run things so I can't
> really tell if there is specific things you have to do here. You may
> have to run this through cygwin in case PATH and such are just not
> available properly to windowsPerl.
>
> -jason
> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>
>> Dear all,
>>
>> I have emboss installed on a windows machine. (Embosswin). I can run
>> this from the dos command line and the path is present. However,
>> when I
>> try to call
>> an emboss application from bioperl I get a "Application not found
>> error"
>>
>>
>>   my $f = Bio::Factory::EMBOSS->new();
>>   # get an EMBOSS application  object from the factory
>>   my $fuzznuc = $f->program('fuzznuc');
>>     $fuzznuc->run(
>>                   { -sequence  => $infile,
>>                         -pattern   => $motif,
>>                        -outfile   => $outfile
>>               });
>>  gives the following error
>>
>> -------------------- WARNING ---------------------
>> MSG: Application [fuzznuc] is not available!
>> ---------------------------------------------------
>> Can't call method "run" on an undefined value at searchPatterns.pl
>> line
>> 102.
>>
>> Can somebody help me fix this ?
>>
>> best regards
>> Rohit
>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  :	0049 (0)641-9946413
>> Fax  :	0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Thu Nov  1 11:31:40 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 11:31:40 -0400
Subject: [Bioperl-l] PAML3 vs 4
Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org>

Small tweaks were needed to parse PAML4 results.

Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly  
now on both PAML 3 and 4.
You'll need to get the latest code from CVS in order to see the  
changes to Bio/Tools/Phylo/PAML.pm

I've added tests for PAML4 in the parser and the run code.

If you have scripts that use codeml please give it a try.  I have not
attempted to play with BASEML or AAML results at this point, so if you
also have code that uses those programs, please try it out and
file bug reports if we need to fix things.

-jason

--
Jason Stajich
jason at bioperl.org


From Kevin.M.Brown at asu.edu  Thu Nov  1 13:25:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 1 Nov 2007 10:25:30 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	onwindows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu>

Sounds like a path issue.  Try to tell bioperl the full path to the
executable rather than just the executable name. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 2:46 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl: cannot run emboss programs 
> using bioperl onwindows
> 
> Dear all,
> 
> I have emboss installed on a windows machine. (Embosswin). I can run
> this from the dos command line and the path is present. 
> However, when I 
> try to call
> an emboss application from bioperl I get a "Application not 
> found error"
> 
> 
>   my $f = Bio::Factory::EMBOSS->new();
>   # get an EMBOSS application  object from the factory
>   my $fuzznuc = $f->program('fuzznuc');
>     $fuzznuc->run(
>                   { -sequence  => $infile,
>                         -pattern   => $motif,
>                        -outfile   => $outfile                       
>               });
>  gives the following error
> 
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at 
> searchPatterns.pl line 
> 102.
> 
> Can somebody help me fix this ?
> 
> best regards
> Rohit
> 
> -- 
> 
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
> 
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> 
> 


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 14:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>


Thanks for all the suggestions... but unfortunately I still cannot run
emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
and the path is set correctly. I printed $ENV{PATH} and it contains
C:\EMBOSSwin, which is the correct location.
I also tried setting the path directly, but I'm not sure how to do this,
so I tried this...

my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

this also did not work.

Also tried printing...
$fuzznuc->executable()

gave the following error again
-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas ?

thanks !
Rohit


here's the code...

use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

 my $f = Bio::Factory::EMBOSS->new();
 # get an EMBOSS application  object from the factory
 print Dumper ($f);
 my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
 print Dumper ($fuzznuc);
 
 #dump of fuzznuc
 #$VAR1 = bless( {
 #                '_programgroup' => {},
 #                '_programs' => {},
 #                '_groups' => {}
 #              }, 'Bio::Factory::EMBOSS' );
 
 #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
 
 my $infile = "temp.fasta";
 my $motif  = "ATGTCGATC";
 my $outfile = "test.out";

 
 $fuzznuc->run(
                  { -sequence  => $infile,
                    -pattern   => $motif,
                    -outfile   => $outfile                      
              });
    
Here's the error again....

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------


Jason Stajich wrote:
> Presumably the PATH is not getting set properly - you should play 
> around printing the $ENV{PATH} variable in a perl script to see if 
> actually contains the directory where the emboss programs are 
> installed.  Bioperl can only guess so much as to where to find an 
> application.  It is also possible that we aren't creating the proper 
> path to the executable - you can print the executable path with 
> print $fuzznuc->executable 
> I believe unless it is throwing an error at the program() line.  
>
> It looks like the code in the Factory object is a little fragile 
> assuming that the programs HAVE to be in your $PATH.  I don't know if 
> windows+perl is special in any way that it run things so I can't 
> really tell if there is specific things you have to do here. You may 
> have to run this through cygwin in case PATH and such are just not 
> available properly to windowsPerl.
>
> -jason
> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>
>> Dear all,
>>
>> I have emboss installed on a windows machine. (Embosswin). I can run
>> this from the dos command line and the path is present. However, when I 
>> try to call
>> an emboss application from bioperl I get a "Application not found error"
>>
>>
>>   my $f = Bio::Factory::EMBOSS->new();
>>   # get an EMBOSS application  object from the factory
>>   my $fuzznuc = $f->program('fuzznuc');
>>     $fuzznuc->run(
>>                   { -sequence  => $infile,
>>                         -pattern   => $motif,
>>                        -outfile   => $outfile                       
>>               });
>>  gives the following error
>>
>> -------------------- WARNING ---------------------
>> MSG: Application [fuzznuc] is not available!
>> ---------------------------------------------------
>> Can't call method "run" on an undefined value at searchPatterns.pl line 
>> 102.
>>
>> Can somebody help me fix this ?
>>
>> best regards
>> Rohit
>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  : 0049 (0)641-9946413
>> Fax  : 0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de 
>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>
>
> --
> Jason Stajich
> jason at bioperl.org <mailto:jason at bioperl.org>
>

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 14:37:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 14:37:24 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>

You could try this - can't test it though so not sure.
my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\fuzznuc');

-jason
On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:

>
>
> Thanks for all the suggestions... but I unfortunately still cannot run
> emboss. I am running the latest version of embosswin  (2.10.0- 
> Win-0.8),
> and the
> path is set correctly. I printed $ENV{$PATH} and this contains
> C:\EMBOSSwin which is the correct location.
> I also tried setting the path directly but I'm not sure how to do  
> this,
> so I tried this...
>
> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>
> this also did not work.
>
> Also tried printing...
> $fuzznuc->executable()
>
> gave the following error again
> -------------------- WARNING ---------------------
> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> ---------------------------------------------------
>
> Any more ideas ?
>
> thanks !
> Rohit
>
>
> here's the code...
>
> use strict;
> use Bio::Factory::EMBOSS;
> use Data::Dumper;
>
> #
> # print "PATH=$ENV{PATH}\n";
> # path contains C:\EMBOSSwin which is the correct location
> # embossversion is 2.10.0-Win-0.8
>
>  my $f = Bio::Factory::EMBOSS->new();
>  # get an EMBOSS application  object from the factory
>  print Dumper ($f);
>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried  
> fuzznuc.exe
> as well,
>  print Dump ($fuzznuc);
>
>  #dump of fuzznuc
>  #$VAR1 = bless( {
>  #                '_programgroup' => {},
>  #                '_programs' => {},
>  #                '_groups' => {}
>  #              }, 'Bio::Factory::EMBOSS' );
>
>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>
>  my $infile = "temp.fasta";
>  my $motif  = "ATGTCGATC";
>  my $outfile = "test.out";
>
>
>  $fuzznuc->run(
>                   { -sequence  => $infile,
>                     -pattern   => $motif,
>                     -outfile   => $outfile
>               });
>
> Here's the error again....
>
> #-------------------- WARNING ---------------------
> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> #---------------------------------------------------
>
>
>
>
> Jason Stajich wrote:
>> Presumably the PATH is not getting set properly - you should play
>> around printing the $ENV{PATH} variable in a perl script to see if
>> actually contains the directory where the emboss programs are
>> installed.  Bioperl can only guess so much as to where to find an
>> application.  It is also possible that we aren't creating the proper
>> path to the executable - you can print the executable path with
>> print $fuzznuc->executable
>> I believe unless it is throwing an error at the program() line.
>>
>> It looks like the code in the Factory object is a little fragile
>> assuming that the programs HAVE to be in your $PATH.  I don't know if
>> windows+perl is special in any way that it run things so I can't
>> really tell if there is specific things you have to do here. You may
>> have to run this through cygwin in case PATH and such are just not
>> available properly to windowsPerl.
>>
>> -jason
>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>
>>> Dear all,
>>>
>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>> this from the dos command line and the path is present. However,  
>>> when I
>>> try to call
>>> an emboss application from bioperl I get a "Application not found  
>>> error"
>>>
>>>
>>>   my $f = Bio::Factory::EMBOSS->new();
>>>   # get an EMBOSS application  object from the factory
>>>   my $fuzznuc = $f->program('fuzznuc');
>>>     $fuzznuc->run(
>>>                   { -sequence  => $infile,
>>>                         -pattern   => $motif,
>>>                        -outfile   => $outfile
>>>               });
>>>  gives the following error
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: Application [fuzznuc] is not available!
>>> ---------------------------------------------------
>>> Can't call method "run" on an undefined value at  
>>> searchPatterns.pl line
>>> 102.
>>>
>>> Can somebody help me fix this ?
>>>
>>> best regards
>>> Rohit
>>>
>>> -- 
>>>
>>> Dr. Rohit Ghai
>>> Institute of Medical Microbiology
>>> Faculty of Medicine
>>> Justus-Liebig University
>>> Frankfurter Strasse 107
>>> 35392 - Giessen
>>> GERMANY
>>>
>>> Tel  : 0049 (0)641-9946413
>>> Fax  : 0049 (0)641-9946409
>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org <mailto:jason at bioperl.org>
>>
>
> -- 
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 14:41:41 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:41:41 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperl on windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de>

Hi Jason

I tried this as well. This also gives the same error message.

-Rohit

Jason Stajich wrote:
> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>
>>
>>
>> Thanks for all the suggestions... but I unfortunately still cannot run 
>> emboss. I am running the latest version of embosswin  (2.10.0-Win-0.8), 
>> and the
>> path is set correctly. I printed $ENV{$PATH} and this contains 
>> C:\EMBOSSwin which is the correct location.
>> I also tried setting the path directly but I'm not sure how to do this, 
>> so I tried this...
>>
>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>
>> this also did not work.
>>
>> Also tried printing...
>> $fuzznuc->executable()
>>
>> gave the following error again
>> -------------------- WARNING ---------------------
>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>> ---------------------------------------------------
>>
>> Any more ideas ?
>>
>> thanks !
>> Rohit
>>
>>
>> here's the code...
>>
>> use strict;
>> use Bio::Factory::EMBOSS;
>> use Data::Dumper;
>>
>> #
>> # print "PATH=$ENV{PATH}\n";
>> # path contains C:\EMBOSSwin which is the correct location
>> # embossversion is 2.10.0-Win-0.8
>>
>>  my $f = Bio::Factory::EMBOSS->new();
>>  # get an EMBOSS application  object from the factory
>>  print Dumper ($f);
>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe 
>> as well,
>>  print Dump ($fuzznuc);
>>
>>  #dump of fuzznuc
>>  #$VAR1 = bless( {
>>  #                '_programgroup' => {},
>>  #                '_programs' => {},
>>  #                '_groups' => {}
>>  #              }, 'Bio::Factory::EMBOSS' );
>>
>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>
>>  my $infile = "temp.fasta";
>>  my $motif  = "ATGTCGATC";
>>  my $outfile = "test.out";
>>
>>
>>  $fuzznuc->run(
>>                   { -sequence  => $infile,
>>                     -pattern   => $motif,
>>                     -outfile   => $outfile                      
>>               });
>>
>> Here's the error again....
>>
>> #-------------------- WARNING ---------------------
>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>> #---------------------------------------------------
>>
>>
>>
>>
>> Jason Stajich wrote:
>>> Presumably the PATH is not getting set properly - you should play 
>>> around printing the $ENV{PATH} variable in a perl script to see if 
>>> actually contains the directory where the emboss programs are 
>>> installed.  Bioperl can only guess so much as to where to find an 
>>> application.  It is also possible that we aren't creating the proper 
>>> path to the executable - you can print the executable path with 
>>> print $fuzznuc->executable 
>>> I believe unless it is throwing an error at the program() line.  
>>>
>>> It looks like the code in the Factory object is a little fragile 
>>> assuming that the programs HAVE to be in your $PATH.  I don't know if 
>>> windows+perl is special in any way that it run things so I can't 
>>> really tell if there is specific things you have to do here. You may 
>>> have to run this through cygwin in case PATH and such are just not 
>>> available properly to windowsPerl.
>>>
>>> -jason
>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>>> this from the dos command line and the path is present. However, 
>>>> when I 
>>>> try to call
>>>> an emboss application from bioperl I get a "Application not found 
>>>> error"
>>>>
>>>>
>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>   # get an EMBOSS application  object from the factory
>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>     $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                         -pattern   => $motif,
>>>>                        -outfile   => $outfile                       
>>>>               });
>>>>  gives the following error
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>> Can't call method "run" on an undefined value at searchPatterns.pl 
>>>> line 
>>>> 102.
>>>>
>>>> Can somebody help me fix this ?
>>>>
>>>> best regards
>>>> Rohit
>>>>
>>>> -- 
>>>>
>>>> Dr. Rohit Ghai
>>>> Institute of Medical Microbiology
>>>> Faculty of Medicine
>>>> Justus-Liebig University
>>>> Frankfurter Strasse 107
>>>> 35392 - Giessen
>>>> GERMANY
>>>>
>>>> Tel  : 0049 (0)641-9946413
>>>> Fax  : 0049 (0)641-9946409
>>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de 
>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de> 
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> jason at bioperl.org <mailto:jason at bioperl.org>
>>>
>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  : 0049 (0)641-9946413
>> Fax  : 0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de 
>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org <mailto:jason at bioperl.org>
>

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From MEC at stowers-institute.org  Thu Nov  1 14:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs
	using bioperl on windows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>


In the code at
http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6

there is a call to `wossname` (cf.
http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html).

Is wossname in your path?

Maybe it needs to be wossname.exe under windows?
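
One way to check this from inside Perl, rather than from the DOS prompt,
is sketched below. It uses only core modules. The helper name
find_in_path is made up for illustration, and trying the extensions
listed in PATHEXT is an assumption about how Windows resolves program
names, not something taken from the BioPerl code.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first file called
# $name found in the directories listed in PATH, or undef if none.
sub find_in_path {
    my ($name) = @_;

    # Always try the bare name; on Windows also try each PATHEXT
    # extension (.exe, .bat, ...), falling back to .exe alone.
    my @exts = ('');
    if ($^O eq 'MSWin32') {
        push @exts, split /;/, ($ENV{PATHEXT} || '.exe');
    }

    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate;
        }
    }
    return;
}

# Report what this Perl process can see for the two programs in question.
for my $prog (qw(wossname fuzznuc)) {
    my $found = find_in_path($prog);
    print "$prog: ", (defined $found ? $found : 'NOT FOUND in PATH'), "\n";
}
```

If wossname shows up as NOT FOUND here but runs fine from the command
line, the PATH seen by Perl probably differs from the one in the shell.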


Malcolm Cook
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 1:42 PM
> To: Jason Stajich
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs 
> using bioperl on windows
> 
> Hi Jason
> 
> I tried this as well. This also gives the same error message.
> 
> -Rohit
> 
> Jason Stajich wrote:
> > You could try this - can't test it though so not sure.
> > my $fuzznuc = $f->program('fuzznuc');
> > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> >
> > -jason
> > On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
> >
> >>
> >>
> >> Thanks for all the suggestions... but I unfortunately still cannot 
> >> run emboss. I am running the latest version of embosswin  
> >> (2.10.0-Win-0.8), and the path is set correctly. I printed 
> >> $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct 
> >> location.
> >> I also tried setting the path directly but I'm not sure how to do 
> >> this, so I tried this...
> >>
> >> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
> >>
> >> this also did not work.
> >>
> >> Also tried printing...
> >> $fuzznuc->executable()
> >>
> >> gave the following error again
> >> -------------------- WARNING ---------------------
> >> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> >> ---------------------------------------------------
> >>
> >> Any more ideas ?
> >>
> >> thanks !
> >> Rohit
> >>
> >>
> >> here's the code...
> >>
> >> use strict;
> >> use Bio::Factory::EMBOSS;
> >> use Data::Dumper;
> >>
> >> #
> >> # print "PATH=$ENV{PATH}\n";
> >> # path contains C:\EMBOSSwin which is the correct location
> >> # embossversion is 2.10.0-Win-0.8
> >>
> >>  my $f = Bio::Factory::EMBOSS->new();
> >>  # get an EMBOSS application  object from the factory
> >>  print Dumper ($f);
> >>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well,
> >>  print Dump ($fuzznuc);
> >>
> >>  #dump of fuzznuc
> >>  #$VAR1 = bless( {
> >>  #                '_programgroup' => {},
> >>  #                '_programs' => {},
> >>  #                '_groups' => {}
> >>  #              }, 'Bio::Factory::EMBOSS' );
> >>
> >>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
> >>
> >>  my $infile = "temp.fasta";
> >>  my $motif  = "ATGTCGATC";
> >>  my $outfile = "test.out";
> >>
> >>
> >>  $fuzznuc->run(
> >>                   { -sequence  => $infile,
> >>                     -pattern   => $motif,
> >>                     -outfile   => $outfile                      
> >>               });
> >>
> >> Here's the error again....
> >>
> >> #-------------------- WARNING ---------------------
> >> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> >> #---------------------------------------------------
> >>
> >>
> >>
> >>
> >> Jason Stajich wrote:
> >>> Presumably the PATH is not getting set properly - you should play 
> >>> around printing the $ENV{PATH} variable in a perl script 
> to see if 
> >>> actually contains the directory where the emboss programs are 
> >>> installed.  Bioperl can only guess so much as to where to find an 
> >>> application.  It is also possible that we aren't creating 
> the proper 
> >>> path to the executable - you can print the executable path with 
> >>> print $fuzznuc->executable I believe unless it is 
> throwing an error 
> >>> at the program() line.
> >>>
> >>> It looks like the code in the Factory object is a little fragile 
> >>> assuming that the programs HAVE to be in your $PATH.  I 
> don't know 
> >>> if
> >>> windows+perl is special in any way that it run things so I can't
> >>> really tell if there is specific things you have to do 
> here. You may 
> >>> have to run this through cygwin in case PATH and such are 
> just not 
> >>> available properly to windowsPerl.
> >>>
> >>> -jason
> >>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
> >>>
> >>>> Dear all,
> >>>>
> >>>> I have emboss installed on a windows machine. (Embosswin). I can 
> >>>> run this from the dos command line and the path is present. 
> >>>> However, when I try to call an emboss application from bioperl I 
> >>>> get a "Application not found error"
> >>>>
> >>>>
> >>>>   my $f = Bio::Factory::EMBOSS->new();
> >>>>   # get an EMBOSS application  object from the factory
> >>>>   my $fuzznuc = $f->program('fuzznuc');
> >>>>     $fuzznuc->run(
> >>>>                   { -sequence  => $infile,
> >>>>                         -pattern   => $motif,
> >>>>                        -outfile   => $outfile            
>            
> >>>>               });
> >>>>  gives the following error
> >>>>
> >>>> -------------------- WARNING ---------------------
> >>>> MSG: Application [fuzznuc] is not available!
> >>>> ---------------------------------------------------
> >>>> Can't call method "run" on an undefined value at 
> searchPatterns.pl 
> >>>> line 102.
> >>>>
> >>>> Can somebody help me fix this ?
> >>>>
> >>>> best regards
> >>>> Rohit
> >>>>
> >>>> --
> >>>>
> >>>> Dr. Rohit Ghai
> >>>> Institute of Medical Microbiology
> >>>> Faculty of Medicine
> >>>> Justus-Liebig University
> >>>> Frankfurter Strasse 107
> >>>> 35392 - Giessen
> >>>> GERMANY
> >>>>
> >>>> Tel  : 0049 (0)641-9946413
> >>>> Fax  : 0049 (0)641-9946409
> >>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org 
> <mailto:Bioperl-l at lists.open-bio.org>
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>> --
> >>> Jason Stajich
> >>> jason at bioperl.org <mailto:jason at bioperl.org>
> >>>
> >>
> >> --
> >>
> >> Dr. Rohit Ghai
> >> Institute of Medical Microbiology
> >> Faculty of Medicine
> >> Justus-Liebig University
> >> Frankfurter Strasse 107
> >> 35392 - Giessen
> >> GERMANY
> >>
> >> Tel  : 0049 (0)641-9946413
> >> Fax  : 0049 (0)641-9946409
> >> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > jason at bioperl.org <mailto:jason at bioperl.org>
> >
> 
> -- 
> 
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
> 
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From arareko at campus.iztacala.unam.mx  Thu Nov  1 15:51:41 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Nov 2007 13:51:41 -0600
Subject: [Bioperl-l] bioperl: cannot run emboss
	programs using bioperl on windows
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx>

Don't the EMBOSS binaries live under 'bin'? Perhaps try adding 
'C:\EMBOSSwin\bin' to $ENV{PATH}, or using this:

my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc');

Adding .exe might be worth trying as well.
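
To see which of these guesses matches the files actually on disk, a
small check like the following might help before handing a path to
BioPerl. It is only a sketch: first_existing_file is a made-up helper,
and the two directories are simply the locations mentioned in this
thread, not verified install paths.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first path in the list that exists
# as a plain file, or undef if none of them do.
sub first_existing_file {
    my @paths = @_;
    for my $path (@paths) {
        return $path if -f $path;
    }
    return;
}

# Directories guessed in this thread (assumptions, adjust as needed).
my @dirs = ('C:/EMBOSSwin/bin', 'C:/EMBOSSwin');

# Try the name with and without the .exe extension in each directory.
my @candidates = map { ("$_/fuzznuc.exe", "$_/fuzznuc") } @dirs;

my $exe = first_existing_file(@candidates);
if (defined $exe) {
    print "Found fuzznuc at: $exe\n";
}
else {
    print "None of these exist:\n", map { "  $_\n" } @candidates;
}
```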

Mauricio.

Cook, Malcolm wrote:
> in the code
> http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 
> 
> there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
> )
> 
> is wossname in your path?
> 
> Maybe it needs to be wossname.exe under windows?
> 
> 
> Malcolm Cook
>   
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
>> Sent: Thursday, November 01, 2007 1:42 PM
>> To: Jason Stajich
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs 
>> using bioperl on windows
>>
>> Hi Jason
>>
>> I tried this as well. This also gives the same error message.
>>
>> -Rohit
>>
>> Jason Stajich wrote:
>>> You could try this - can't test it though so not sure.
>>> my $fuzznuc = $f->program('fuzznuc');
>>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>>
>>> -jason
>>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>>>
>>>>
>>>> Thanks for all the suggestions... but I unfortunately still cannot 
>>>> run emboss. I am running the latest version of embosswin  
>>>> (2.10.0-Win-0.8), and the path is set correctly. I printed 
>>>> $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct 
>>>> location.
>>>> I also tried setting the path directly but I'm not sure how to do 
>>>> this, so I tried this...
>>>>
>>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>>>
>>>> this also did not work.
>>>>
>>>> Also tried printing...
>>>> $fuzznuc->executable()
>>>>
>>>> gave the following error again
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>>
>>>> Any more ideas ?
>>>>
>>>> thanks !
>>>> Rohit
>>>>
>>>>
>>>> here's the code...
>>>>
>>>> use strict;
>>>> use Bio::Factory::EMBOSS;
>>>> use Data::Dumper;
>>>>
>>>> #
>>>> # print "PATH=$ENV{PATH}\n";
>>>> # path contains C:\EMBOSSwin which is the correct location
>>>> # embossversion is 2.10.0-Win-0.8
>>>>
>>>>  my $f = Bio::Factory::EMBOSS->new();
>>>>  # get an EMBOSS application  object from the factory
>>>>  print Dumper ($f);
>>>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well,
>>>>  print Dump ($fuzznuc);
>>>>
>>>>  #dump of fuzznuc
>>>>  #$VAR1 = bless( {
>>>>  #                '_programgroup' => {},
>>>>  #                '_programs' => {},
>>>>  #                '_groups' => {}
>>>>  #              }, 'Bio::Factory::EMBOSS' );
>>>>
>>>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>>>  my $infile = "temp.fasta";
>>>>  my $motif  = "ATGTCGATC";
>>>>  my $outfile = "test.out";
>>>>
>>>>
>>>>  $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                     -pattern   => $motif,
>>>>                     -outfile   => $outfile                      
>>>>               });
>>>>
>>>> Here's the error again....
>>>>
>>>> #-------------------- WARNING ---------------------
>>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> #---------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Jason Stajich wrote:
>>>>> Presumably the PATH is not getting set properly - you should play 
>>>>> around printing the $ENV{PATH} variable in a perl script 
>> to see if 
>>>>> actually contains the directory where the emboss programs are 
>>>>> installed.  Bioperl can only guess so much as to where to find an 
>>>>> application.  It is also possible that we aren't creating 
>> the proper 
>>>>> path to the executable - you can print the executable path with 
>>>>> print $fuzznuc->executable I believe unless it is 
>> throwing an error 
>>>>> at the program() line.
>>>>>
>>>>> It looks like the code in the Factory object is a little fragile 
>>>>> assuming that the programs HAVE to be in your $PATH.  I 
>> don't know 
>>>>> if
>>>>> windows+perl is special in any way that it run things so I can't
>>>>> really tell if there is specific things you have to do 
>> here. You may 
>>>>> have to run this through cygwin in case PATH and such are 
>> just not 
>>>>> available properly to windowsPerl.
>>>>>
>>>>> -jason
>>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have emboss installed on a windows machine. (Embosswin). I can 
>>>>>> run this from the dos command line and the path is present. 
>>>>>> However, when I try to call an emboss application from bioperl I 
>>>>>> get a "Application not found error"
>>>>>>
>>>>>>
>>>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>>>   # get an EMBOSS application  object from the factory
>>>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>>>     $fuzznuc->run(
>>>>>>                   { -sequence  => $infile,
>>>>>>                         -pattern   => $motif,
>>>>>>                        -outfile   => $outfile            
>>            
>>>>>>               });
>>>>>>  gives the following error
>>>>>>
>>>>>> -------------------- WARNING ---------------------
>>>>>> MSG: Application [fuzznuc] is not available!
>>>>>> ---------------------------------------------------
>>>>>> Can't call method "run" on an undefined value at 
>> searchPatterns.pl 
>>>>>> line 102.
>>>>>>
>>>>>> Can somebody help me fix this ?
>>>>>>
>>>>>> best regards
>>>>>> Rohit
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Dr. Rohit Ghai
>>>>>> Institute of Medical Microbiology
>>>>>> Faculty of Medicine
>>>>>> Justus-Liebig University
>>>>>> Frankfurter Strasse 107
>>>>>> 35392 - Giessen
>>>>>> GERMANY
>>>>>>
>>>>>> Tel  : 0049 (0)641-9946413
>>>>>> Fax  : 0049 (0)641-9946409
>>>>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org 
>> <mailto:Bioperl-l at lists.open-bio.org>
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> --
>>>>> Jason Stajich
>>>>> jason at bioperl.org <mailto:jason at bioperl.org>
>>>>>
>>>> --
>>>>
>>>> Dr. Rohit Ghai
>>>> Institute of Medical Microbiology
>>>> Faculty of Medicine
>>>> Justus-Liebig University
>>>> Frankfurter Strasse 107
>>>> 35392 - Giessen
>>>> GERMANY
>>>>
>>>> Tel  : 0049 (0)641-9946413
>>>> Fax  : 0049 (0)641-9946409
>>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> --
>>> Jason Stajich
>>> jason at bioperl.org <mailto:jason at bioperl.org>
>>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  :	0049 (0)641-9946413
>> Fax  :	0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Nov  1 16:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperl on windows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get  
fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a  
hoop or two but I managed to get it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.   
You need to remove EMBOSSWin and install the one I linked to  
previously (this is an actual EMBOSS beta release).  It's possible  
older EMBOSSWin can be configured, but I don't plan on checking it  
out myself.

Next, you need to ensure the binaries are in your PATH env. variable  
(test by running 'wossname' on the command line), then set  
EMBOSS_DATA to point at the EMBOSS data directory using a UNIX-like  
path (i.e. 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me  
and WinXP recognizes the UNIX'y form as a valid path.  If you don't  
know how to set env. variables go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the  
latest (greatest?) EMBOSS.
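
If changing the system-wide settings is inconvenient, the same two
variables could in principle be set from within the script before the
factory is created. This is an untested sketch: prepare_emboss_env is a
made-up helper, and treating 'C:/mEMBOSS' as the directory that holds
the binaries is an assumption based on the data path above.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: put the EMBOSS directory first in PATH and point
# EMBOSS_DATA at its data directory, using forward slashes throughout.
sub prepare_emboss_env {
    my ($root) = @_;
    my $sep = $Config{path_sep};    # ';' on Windows, ':' on Unix
    my $old = defined $ENV{PATH} ? $ENV{PATH} : '';
    $ENV{PATH}        = length $old ? $root . $sep . $old : $root;
    $ENV{EMBOSS_DATA} = "$root/data";
    return;
}

prepare_emboss_env('C:/mEMBOSS');    # assumed install location

# Only touch BioPerl if it is installed. program() returns undef when
# the application cannot be found, so check before calling run().
my $ok = eval {
    require Bio::Factory::EMBOSS;
    my $factory = Bio::Factory::EMBOSS->new();
    my $fuzznuc = $factory->program('fuzznuc');
    print $fuzznuc
        ? "fuzznuc is available\n"
        : "fuzznuc is still not available\n";
    1;
};
print "Could not load or use Bio::Factory::EMBOSS: $@\n" unless $ok;
```

Checking the return value of program() also avoids the "Can't call
method run on an undefined value" error from the original report.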

chris

On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:

> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
>> You could try this - can't test it though so not sure.
>> my $fuzznuc = $f->program('fuzznuc');
>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>
>> -jason
>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>>
>>>
>>>
>>> Thanks for all the suggestions... but I unfortunately still  
>>> cannot run
>>> emboss. I am running the latest version of embosswin  (2.10.0- 
>>> Win-0.8),
>>> and the
>>> path is set correctly. I printed $ENV{$PATH} and this contains
>>> C:\EMBOSSwin which is the correct location.
>>> I also tried setting the path directly but I'm not sure how to do  
>>> this,
>>> so I tried this...
>>>
>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>>
>>> this also did not work.
>>>
>>> Also tried printing...
>>> $fuzznuc->executable()
>>>
>>> gave the following error again
>>> -------------------- WARNING ---------------------
>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>> ---------------------------------------------------
>>>
>>> Any more ideas ?
>>>
>>> thanks !
>>> Rohit
>>>
>>>
>>> here's the code...
>>>
>>> use strict;
>>> use Bio::Factory::EMBOSS;
>>> use Data::Dumper;
>>>
>>> #
>>> # print "PATH=$ENV{PATH}\n";
>>> # path contains C:\EMBOSSwin which is the correct location
>>> # embossversion is 2.10.0-Win-0.8
>>>
>>>  my $f = Bio::Factory::EMBOSS->new();
>>>  # get an EMBOSS application  object from the factory
>>>  print Dumper ($f);
>>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried  
>>> fuzznuc.exe
>>> as well,
>>>  print Dump ($fuzznuc);
>>>
>>>  #dump of fuzznuc
>>>  #$VAR1 = bless( {
>>>  #                '_programgroup' => {},
>>>  #                '_programs' => {},
>>>  #                '_groups' => {}
>>>  #              }, 'Bio::Factory::EMBOSS' );
>>>
>>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't  
>>> work
>>>
>>>  my $infile = "temp.fasta";
>>>  my $motif  = "ATGTCGATC";
>>>  my $outfile = "test.out";
>>>
>>>
>>>  $fuzznuc->run(
>>>                   { -sequence  => $infile,
>>>                     -pattern   => $motif,
>>>                     -outfile   => $outfile
>>>               });
>>>
>>> Here's the error again....
>>>
>>> #-------------------- WARNING ---------------------
>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>> #---------------------------------------------------
>>>
>>>
>>>
>>>
>>> Jason Stajich wrote:
>>>> Presumably the PATH is not getting set properly - you should play
>>>> around printing the $ENV{PATH} variable in a perl script to see if
>>>> actually contains the directory where the emboss programs are
>>>> installed.  Bioperl can only guess so much as to where to find an
>>>> application.  It is also possible that we aren't creating the  
>>>> proper
>>>> path to the executable - you can print the executable path with
>>>> print $fuzznuc->executable
>>>> I believe unless it is throwing an error at the program() line.
>>>>
>>>> It looks like the code in the Factory object is a little fragile
>>>> assuming that the programs HAVE to be in your $PATH.  I don't  
>>>> know if
>>>> windows+perl is special in any way that it run things so I can't
>>>> really tell if there is specific things you have to do here. You  
>>>> may
>>>> have to run this through cygwin in case PATH and such are just not
>>>> available properly to windowsPerl.
>>>>
>>>> -jason
>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I have emboss installed on a windows machine. (Embosswin). I  
>>>>> can run
>>>>> this from the dos command line and the path is present. However,
>>>>> when I
>>>>> try to call
>>>>> an emboss application from bioperl I get a "Application not found
>>>>> error"
>>>>>
>>>>>
>>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>>   # get an EMBOSS application  object from the factory
>>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>>     $fuzznuc->run(
>>>>>                   { -sequence  => $infile,
>>>>>                         -pattern   => $motif,
>>>>>                        -outfile   => $outfile
>>>>>               });
>>>>>  gives the following error
>>>>>
>>>>> -------------------- WARNING ---------------------
>>>>> MSG: Application [fuzznuc] is not available!
>>>>> ---------------------------------------------------
>>>>> Can't call method "run" on an undefined value at searchPatterns.pl
>>>>> line
>>>>> 102.
>>>>>
>>>>> Can somebody help me fix this ?
>>>>>
>>>>> best regards
>>>>> Rohit
>>>>>
>>>>> -- 
>>>>>
>>>>> Dr. Rohit Ghai
>>>>> Institute of Medical Microbiology
>>>>> Faculty of Medicine
>>>>> Justus-Liebig University
>>>>> Frankfurter Strasse 107
>>>>> 35392 - Giessen
>>>>> GERMANY
>>>>>
>>>>> Tel  : 0049 (0)641-9946413
>>>>> Fax  : 0049 (0)641-9946409
>>>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason at bioperl.org <mailto:jason at bioperl.org>
>>>>
>>>
>>> -- 
>>>
>>> Dr. Rohit Ghai
>>> Institute of Medical Microbiology
>>> Faculty of Medicine
>>> Justus-Liebig University
>>> Frankfurter Strasse 107
>>> 35392 - Giessen
>>> GERMANY
>>>
>>> Tel  : 0049 (0)641-9946413
>>> Fax  : 0049 (0)641-9946409
>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org <mailto:jason at bioperl.org>
>>
>
> -- 
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From neetisomaiya at gmail.com  Fri Nov  2 00:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a perl question, not bioperl.
Can anyone point me to a perl program/code/function which can calculate the
number of days between any two given dates.
Any help will be deeply appreciated.
Thanks.

-- 
-Neeti
Even my blood says, B positive


From whs at ebi.ac.uk  Fri Nov  2 01:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0711020459530.17670@parrot.ebi.ac.uk>

Hi Neeti,

A non-bioperl answer to your perl question: Date::Calc should do the trick.
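
A minimal sketch, assuming Date::Calc is installed: its Delta_Days function takes two year/month/day triples and returns the signed number of days between them. The wrapper name days_between and the fallback to the core Time::Local module (used only when Date::Calc cannot be loaded) are additions made for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Days from date 1 to date 2 (negative if date 2 is earlier).
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;

    # Preferred: Date::Calc, as suggested above.
    if (eval { require Date::Calc; 1 }) {
        return Date::Calc::Delta_Days($y1, $m1, $d1, $y2, $m2, $d2);
    }

    # Fallback (assumption for this sketch): core modules only.
    # Working in UTC avoids daylight-saving jumps.
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return int(($t2 - $t1) / 86400);
}

print days_between(2007, 10, 31, 2007, 11, 2), "\n";   # prints 2
```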

Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.
>
>


From smarkel at accelrys.com  Sat Nov  3 02:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlon	windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <OFD3D05334.F9E235EF-ON88257388.00209BED-88257388.00211BD7@accelrys.com>

I set multiple environment variables in my code.

    $ENV{EMBOSS_ROOT}    = $embossPath;
    $ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd"); 
    $ENV{EMBOSS_DB_DIR}  = File::Spec->catdir($embossPath, "test");
    $ENV{EMBOSS_DATA}    = File::Spec->catdir($embossPath, "data"); 
    $ENV{PATH}           = $embossPath; 

I found it necessary to set both PATH and EMBOSS_ROOT.
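
For anyone who wants to keep the rest of PATH, here is a hedged variant of the same idea. The function name emboss_env is made up for this sketch, and prepending to PATH (rather than replacing it, as in the lines above) is a choice made here, not something tested against EMBOSS.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Build the EMBOSS-related settings for an install rooted at $root.
# $current_path is the existing PATH value (may be undef or empty).
sub emboss_env {
    my ($root, $current_path) = @_;
    my $sep  = $Config{path_sep};    # ';' on Windows, ':' on Unix
    my $path = (defined $current_path && length $current_path)
             ? join($sep, $root, $current_path)
             : $root;
    return (
        EMBOSS_ROOT    => $root,
        EMBOSS_ACDROOT => File::Spec->catdir($root, 'acd'),
        EMBOSS_DB_DIR  => File::Spec->catdir($root, 'test'),
        EMBOSS_DATA    => File::Spec->catdir($root, 'data'),
        PATH           => $path,
    );
}

# Typical use (the install directory is only an example):
#   my %settings = emboss_env('C:/mEMBOSS', $ENV{PATH});
#   $ENV{$_} = $settings{$_} for keys %settings;
```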

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com


bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24:

> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> 
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
> 
> >
> >
> > Thanks for all the suggestions... but I unfortunately still cannot run
> > emboss. I am running the latest version of embosswin  (2.10.0- 
> > Win-0.8),
> > and the
> > path is set correctly. I printed $ENV{$PATH} and this contains
> > C:\EMBOSSwin which is the correct location.
> > I also tried setting the path directly but I'm not sure how to do 
> > this,
> > so I tried this...
> >
> > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
> >
> > this also did not work.
> >
> > Also tried printing...
> > $fuzznuc->executable()
> >
> > gave the following error again
> > -------------------- WARNING ---------------------
> > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> > ---------------------------------------------------
> >
> > Any more ideas ?
> >
> > thanks !
> > Rohit
> >
> >
> > here's the code...
> >
> > use strict;
> > use Bio::Factory::EMBOSS;
> > use Data::Dumper;
> >
> > #
> > # print "PATH=$ENV{PATH}\n";
> > # path contains C:\EMBOSSwin which is the correct location
> > # embossversion is 2.10.0-Win-0.8
> >
> >  my $f = Bio::Factory::EMBOSS->new();
> >  # get an EMBOSS application  object from the factory
> >  print Dumper ($f);
> >  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried 
> > fuzznuc.exe
> > as well,
> >  print Dump ($fuzznuc);
> >
> >  #dump of fuzznuc
> >  #$VAR1 = bless( {
> >  #                '_programgroup' => {},
> >  #                '_programs' => {},
> >  #                '_groups' => {}
> >  #              }, 'Bio::Factory::EMBOSS' );
> >
> >  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
> >
> >  my $infile = "temp.fasta";
> >  my $motif  = "ATGTCGATC";
> >  my $outfile = "test.out";
> >
> >
> >  $fuzznuc->run(
> >                   { -sequence  => $infile,
> >                     -pattern   => $motif,
> >                     -outfile   => $outfile
> >               });
> >
> > Here's the error again....
> >
> > #-------------------- WARNING ---------------------
> > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> > #---------------------------------------------------
> >
> >
> >
> >
> > Jason Stajich wrote:
> >> Presumably the PATH is not getting set properly - you should play
> >> around printing the $ENV{PATH} variable in a perl script to see if
> >> actually contains the directory where the emboss programs are
> >> installed.  Bioperl can only guess so much as to where to find an
> >> application.  It is also possible that we aren't creating the proper
> >> path to the executable - you can print the executable path with
> >> print $fuzznuc->executable
> >> I believe unless it is throwing an error at the program() line.
> >>
> >> It looks like the code in the Factory object is a little fragile
> >> assuming that the programs HAVE to be in your $PATH.  I don't know if
> >> windows+perl is special in any way that it run things so I can't
> >> really tell if there is specific things you have to do here. You may
> >> have to run this through cygwin in case PATH and such are just not
> >> available properly to windowsPerl.
> >>
> >> -jason
> >> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
> >>
> >>> Dear all,
> >>>
> >>> I have emboss installed on a windows machine. (Embosswin). I can run
> >>> this from the dos command line and the path is present. However, 
> >>> when I
> >>> try to call
> >>> an emboss application from bioperl I get a "Application not found 
> >>> error"
> >>>
> >>>
> >>>   my $f = Bio::Factory::EMBOSS->new();
> >>>   # get an EMBOSS application  object from the factory
> >>>   my $fuzznuc = $f->program('fuzznuc');
> >>>     $fuzznuc->run(
> >>>                   { -sequence  => $infile,
> >>>                         -pattern   => $motif,
> >>>                        -outfile   => $outfile
> >>>               });
> >>>  gives the following error
> >>>
> >>> -------------------- WARNING ---------------------
> >>> MSG: Application [fuzznuc] is not available!
> >>> ---------------------------------------------------
> >>> Can't call method "run" on an undefined value at 
> >>> searchPatterns.pl line
> >>> 102.
> >>>
> >>> Can somebody help me fix this ?
> >>>
> >>> best regards
> >>> Rohit
> >>>
> >>> -- 
> >>>
> >>> Dr. Rohit Ghai
> >>> Institute of Medical Microbiology
> >>> Faculty of Medicine
> >>> Justus-Liebig University
> >>> Frankfurter Strasse 107
> >>> 35392 - Giessen
> >>> GERMANY
> >>>
> >>> Tel  : 0049 (0)641-9946413
> >>> Fax  : 0049 (0)641-9946409
> >>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org <mailto:jason at bioperl.org>
> >>
> >
> > -- 
> >
> > Dr. Rohit Ghai
> > Institute of Medical Microbiology
> > Faculty of Medicine
> > Justus-Liebig University
> > Frankfurter Strasse 107
> > 35392 - Giessen
> > GERMANY
> >
> > Tel  :   0049 (0)641-9946413
> > Fax  :   0049 (0)641-9946409
> > Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason at bioperl.org
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Sat Nov  3 10:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic, I was able 
to run emboss applications on windows (vista), but with the following 
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I
also tried, but these were already available to the program, so this
was not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't
have any path issues. What is also curious is that $f->version returns
the correct version of emboss running (no path problems here), and it
looks like it runs the command "embossversion -auto" to get this
information. If it can get at this command, it's a bit peculiar that it
cannot get the other programs. Or am I missing something here ?


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile = "test.fasta";
my $motif  = "AGGAGG";
my $outfile = "test.out";


     my $f = Bio::Factory::EMBOSS->new();
     # get an EMBOSS application  object from the factory
    print Dumper $f;  
   
    print "location=",$f->location,"\n";   #returns local
    # this returns the correct version 5.0 (uses embossversion -auto
    # internally, and seems to know where it is)
    print "version=", $f->version,"\n";
    print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing
    print "list=",$f->_program_list,"\n";  #returns nothing
   
    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\
    # or with exe suffix doesn't work
    # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
    # the problem is that it does not return a
    # Bio::Tools::Run::EMBOSSApplication object.
    #
    # however, creating an EMBOSSApplication object directly makes it
    # possible to run the program
    #
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');   
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile                      
               });   
    print "Done\n";
   
    exit;
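
The workaround can be wrapped up so the factory is tried first and the direct object is used only when the factory returns nothing. This is a rough sketch using only the calls shown in the script above (program, new, name); the function name emboss_app and the optional third argument are made up for the example, and it has not been tried against a real EMBOSS install.

```perl
use strict;
use warnings;

# Ask the factory for $name; if it returns undef, build the application
# object directly, as in the workaround above.
# $class is an assumption of this sketch so that another class can be
# substituted; it defaults to Bio::Tools::Run::EMBOSSApplication.
sub emboss_app {
    my ($factory, $name, $class) = @_;
    $class ||= 'Bio::Tools::Run::EMBOSSApplication';

    my $app = $factory->program($name);
    return $app if defined $app;

    # Load the class only if it is not already available.
    unless ($class->can('new')) {
        (my $file = "$class.pm") =~ s{::}{/}g;
        require $file;
    }
    $app = $class->new();
    $app->name($name);
    return $app;
}

# Typical use:
#   my $f       = Bio::Factory::EMBOSS->new();
#   my $fuzznuc = emboss_app($f, 'fuzznuc');
#   $fuzznuc->run({ -sequence => $infile,
#                   -pattern  => $motif,
#                   -outfile  => $outfile });
```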


Chris Fields wrote:
> I did a little investigating using my old PC and was able to get 
> fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a 
> hoop or two but I managed to get it working.
>
> First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.  
> You need to remove EMBOSSWin and install the one I linked to 
> previously (this is an actual EMBOSS beta release).  It's possible 
> older EMBOSSWin can be configured, but I don't plan on checking it out 
> myself.
>
> Next, you need to ensure the binaries are in your PATH env. variable 
> (test by running 'wossname' on the command line), then set EMBOSS_DATA 
> to point at the EMBOSS data directory using a UNIX-like path (i.e. 
> 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP 
> recognizes the UNIX'y form as a valid path.  If you don't know how to 
> set env. variables go here:
>
> http://vlaurie.com/computers2/Articles/environment.htm
>
> Once that is set up you should be able to run the script using the 
> latest (greatest?) EMBOSS.
>
> chris
>
> On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:
>
>> Hi Jason
>>
>> I tried this as well. This also gives the same error message.
>>
>> -Rohit
>>
>> Jason Stajich wrote:
>>> You could try this - can't test it though so not sure.
>>> my $fuzznuc = $f->program('fuzznuc');
>>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>>
>>> -jason
>>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>>>
>>>>
>>>>
>>>> Thanks for all the suggestions... but I unfortunately still cannot run
>>>> emboss. I am running the latest version of embosswin  
>>>> (2.10.0-Win-0.8),
>>>> and the
>>>> path is set correctly. I printed $ENV{$PATH} and this contains
>>>> C:\EMBOSSwin which is the correct location.
>>>> I also tried setting the path directly but I'm not sure how to do 
>>>> this,
>>>> so I tried this...
>>>>
>>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>>>
>>>> this also did not work.
>>>>
>>>> Also tried printing...
>>>> $fuzznuc->executable()
>>>>
>>>> gave the following error again
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>>
>>>> Any more ideas ?
>>>>
>>>> thanks !
>>>> Rohit
>>>>
>>>>
>>>> here's the code...
>>>>
>>>> use strict;
>>>> use Bio::Factory::EMBOSS;
>>>> use Data::Dumper;
>>>>
>>>> #
>>>> # print "PATH=$ENV{PATH}\n";
>>>> # path contains C:\EMBOSSwin which is the correct location
>>>> # embossversion is 2.10.0-Win-0.8
>>>>
>>>>  my $f = Bio::Factory::EMBOSS->new();
>>>>  # get an EMBOSS application  object from the factory
>>>>  print Dumper ($f);
>>>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried 
>>>> fuzznuc.exe
>>>> as well,
>>>>  print Dump ($fuzznuc);
>>>>
>>>>  #dump of fuzznuc
>>>>  #$VAR1 = bless( {
>>>>  #                '_programgroup' => {},
>>>>  #                '_programs' => {},
>>>>  #                '_groups' => {}
>>>>  #              }, 'Bio::Factory::EMBOSS' );
>>>>
>>>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>>>
>>>>  my $infile = "temp.fasta";
>>>>  my $motif  = "ATGTCGATC";
>>>>  my $outfile = "test.out";
>>>>
>>>>
>>>>  $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                     -pattern   => $motif,
>>>>                     -outfile   => $outfile
>>>>               });
>>>>
>>>> Here's the error again....
>>>>
>>>> #-------------------- WARNING ---------------------
>>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> #---------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Jason Stajich wrote:
>>>>> Presumably the PATH is not getting set properly - you should play
>>>>> around printing the $ENV{PATH} variable in a perl script to see if
>>>>> actually contains the directory where the emboss programs are
>>>>> installed.  Bioperl can only guess so much as to where to find an
>>>>> application.  It is also possible that we aren't creating the proper
>>>>> path to the executable - you can print the executable path with
>>>>> print $fuzznuc->executable
>>>>> I believe unless it is throwing an error at the program() line.
>>>>>
>>>>> It looks like the code in the Factory object is a little fragile
>>>>> assuming that the programs HAVE to be in your $PATH.  I don't know if
>>>>> windows+perl is special in any way that it run things so I can't
>>>>> really tell if there is specific things you have to do here. You may
>>>>> have to run this through cygwin in case PATH and such are just not
>>>>> available properly to windowsPerl.
>>>>>
>>>>> -jason
>>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>>>>> this from the dos command line and the path is present. However,
>>>>>> when I
>>>>>> try to call
>>>>>> an emboss application from bioperl I get a "Application not found
>>>>>> error"
>>>>>>
>>>>>>
>>>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>>>   # get an EMBOSS application  object from the factory
>>>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>>>     $fuzznuc->run(
>>>>>>                   { -sequence  => $infile,
>>>>>>                         -pattern   => $motif,
>>>>>>                        -outfile   => $outfile
>>>>>>               });
>>>>>>  gives the following error
>>>>>>
>>>>>> -------------------- WARNING ---------------------
>>>>>> MSG: Application [fuzznuc] is not available!
>>>>>> ---------------------------------------------------
>>>>>> Can't call method "run" on an undefined value at searchPatterns.pl
>>>>>> line
>>>>>> 102.
>>>>>>
>>>>>> Can somebody help me fix this ?
>>>>>>
>>>>>> best regards
>>>>>> Rohit
>>>>>>
>>>>>> -- 
>>>>>>
>
>


From hlapp at gmx.net  Sun Nov  4 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 4 Nov 2007 12:42:13 -0500
Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript
In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net>

Hi Stefanie,

sorry for taking so long to respond - your email got buried in a pile  
while I was away on travel. The Bio::SeqFeature::Gene::* modules were  
written mostly with the motivation to have a model that can represent  
the results of gene predictors.

GenBank AFAIK doesn't annotate introns explicitly, though they should  
be implicit from cDNA (or mRNA? or gene, as you say) features on  
genomic sequence. The Bioperl SeqIO parsers won't transform those  
into a Bio::SeqFeature::Gene-based model, but instead will yield just  
plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent  
processing to build these into more hierarchical models.

I'm not sure whether someone's done this already for GenBank-type  
feature tables. There is a Unflattener that at least attempts to  
build a feature hierarchy from the flat array that's compliant with  
the Sequence Ontology (or so I recall).
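
To make the "implicit introns" point concrete, here is a small sketch that derives intron coordinates from the exon segments of one split location. It is plain Perl on [start, end] pairs; introns_from_exons is a made-up name, and the sketch assumes a linear sequence with all segments on one strand (it sorts by start, so it is not suitable for features that span the origin of a circular genome).

```perl
use strict;
use warnings;

# Given exon segments as [start, end] pairs, return the gaps between
# consecutive exons as [start, end] pairs.
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my $start = $exons[$i - 1][1] + 1;
        my $end   = $exons[$i][0] - 1;
        push @introns, [$start, $end] if $start <= $end;   # skip abutting exons
    }
    return @introns;
}

# With BioPerl the pairs could come from a CDS or mRNA feature, e.g.
#   my @exons = map { [$_->start, $_->end] } $feat->location->each_Location;
# and each intron sequence from $seq->subseq($start, $end).

for my $intron (introns_from_exons([100, 200], [301, 400], [501, 600])) {
    print "$intron->[0]..$intron->[1]\n";   # prints 201..300 then 401..500
}
```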

I'm copying the list in case others have additional suggestions.

	-hilmar

On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote:

>
>
> Hello Hilmar,
>
> I have a question about your bioperl module  
> Bio::SeqFeature::Gene::Transcript:
>
> I can't figure out how to generate the $gene object for use in this  
> line:
> @introns = $gene->introns();
>
> The data I'm working with is a local file in genbank format, and  
> I'm interested in extracting intron sequences (and maybe flanking  
> exons) for certain genes. I have been trying to get the introns via  
> the sequence features ('CDS' or 'gene'), but this has not been  
> working. Which approach will I have to take?
> I'd be very grateful if you could point me into the right direction!
>
> Hope things are going well in Durham! And thank you in advance!
>
> Stefanie
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From downloadondemand at gmail.com  Sun Nov  4 13:39:42 2007
From: downloadondemand at gmail.com (download on demand)
Date: Sun, 4 Nov 2007 20:39:42 +0200
Subject: [Bioperl-l] Help with Bio::SeqIO
Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>

Hi to all.

I have a problem with a simplest script:


         use Bio::SeqIO;
         # get command-line arguments, or die with a usage statement
         my $usage = "x2y.pl infile infileformat outfile outfileformat\n";
         my $infile = shift or die $usage;
         my $infileformat = shift or die $usage;
#         my $outfile = shift or die $usage;
         my $outfileformat = shift or die $usage;

         # create one SeqIO object to read in,and another to write out
         my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                      '-format' => $infileformat);
         my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                                       '-format' => $outfileformat);

         # write each entry in the input file to the output file
         while (my $inseq = $seq_in->next_seq) {

#            $seq_out->write_seq($inseq); # Whole sequence not needed

             for my $feat_object ($inseq->get_SeqFeatures) {
                 if ($feat_object->primary_tag eq "CDS") {
                     print $feat_object->get_tag_values('product'), "\n";
                     print $feat_object->location->start, "..",
                           $feat_object->location->end, "\n";
                     print $feat_object->spliced_seq->seq, "\n\n";
                 }
             }
         }   # end of the while loop over input sequences


The result seems OK to me, but in case of first CDS of NC_005213.gbk from
here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated
sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated
sequence....................................TAA


This CDS has an unusual location string:
CDS             complement(join(490883..490885,1..879))
but shouldn't spliced_seq handle such cases?

Please help me!
Best regards, N.


From cjfields at uiuc.edu  Sun Nov  4 19:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for  
instance typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the  
split location is a 'join'?  In other words, should spliced_seq() be  
modified to take into account the split location type when returning  
sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly  
indicates the order of the sequences is important when joined  
together; the current behavior is more like that for 'order'.
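
A toy illustration of why the order matters for a feature that wraps around the origin. This is plain Perl and does not reproduce the internals of spliced_seq; the 12-base "genome" and the helper names revcomp and join_segments are invented so that the location complement(join(10..12,1..3)) mimics the NC_005213 case.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Concatenate 1-based [start, end] segments of $genome, either in the
# order given or sorted by start coordinate.
sub join_segments {
    my ($genome, $segments, $sort_by_start) = @_;
    my @ordered = @$segments;
    @ordered = sort { $a->[0] <=> $b->[0] } @ordered if $sort_by_start;
    return join '',
           map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @ordered;
}

my $genome   = 'CATGGGCCCTTA';       # circular, 12 bases
my @segments = ([10, 12], [1, 3]);   # join(10..12,1..3), then complement

print "listed order: ", revcomp(join_segments($genome, \@segments, 0)), "\n";  # ATGTAA
print "sorted order: ", revcomp(join_segments($genome, \@segments, 1)), "\n";  # TAAATG
```

Keeping the listed order gives a start codon first and the stop codon last; sorting by start moves the stop codon to the front, which is the same symptom as in the report quoted below.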

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
>
> I have a problem with a simplest script:
>
>
>
>          use Bio::SeqIO;
>          # get command-line arguments, or die with a usage statement
>          my $usage = "x2y.pl infile infileformat outfile  
> outfileformat\n";
>          my $infile = shift or die $usage;
>          my $infileformat = shift or die $usage;
> #         my $outfile = shift or die $usage;
>          my $outfileformat = shift or die $usage;
>
>          # create one SeqIO object to read in,and another to write out
>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => $infileformat);
>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>                                        '-format' => $outfileformat);
>
>          # write each entry in the input file to the output file
>          while (my $inseq = $seq_in->next_seq) {
>
> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>
> for my $feat_object ($inseq->get_SeqFeatures)
>     {
>     if ($feat_object->primary_tag eq "CDS")
>         {
>         print $feat_object->get_tag_values('product'),"\n";
>         print
> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>         print $feat_object->spliced_seq->seq,"\n\n";
>         }
>     }
>
>
>
> The result seems OK to me, but in case of first CDS of  
> NC_005213.gbk from
> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/ 
> > the
> output is wrong:
>
> It is:
> hypothetical protein
> 1..490885
> TAAATGCGATTGCTATTAGAA..................................Truncated
> sequence...................................
>
> Should be:
> hypothetical protein
> 879..490883
> ATGCGATTGCTATTAGAA...................................Truncated
> sequence....................................TAA
>
>
>
> This CDS have an unnatural location string:
> CDS             complement(join(490883..490885,1..879)), but  
> spliced_seq
> should handle these things?
>
> Please help me!
> Best regards, N.
> _______________________________________________
>


From jean-luc.jany at univ-brest.fr  Mon Nov  5 03:26:52 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Mon, 05 Nov 2007 09:26:52 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to
	blastall
Message-ID: <472ED3CC.2050305@univ-brest.fr>

Dear Bioperl and Mac users,

I am a Mac user and would like to run a script I wrote using Bio::Tools::Run::StandAloneBlast. Unfortunately, I have not managed to tell Bioperl the path to blastall and the other executables.

I read carefully the following link http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the path to Blast, but I guess the way to proceed is slightly different in Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?)

My blast directory is in the myname directory and contains bin and data subdirectories. blastall and the other executables are in myname/blast/bin.

Thank you in anticipation for your help.

Jean-Luc


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Mon Nov  5 06:36:16 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Mon, 05 Nov 2007 12:36:16 +0100
Subject: [Bioperl-l] bioperl and emboss on windows
Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic, I was able 
to run emboss applications on windows (vista), but with the following 
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I
also tried, but these were already available to the program, so this
was not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't
have any path issues. What is also curious is that $f->version returns
the correct version of emboss running (no path problems here), and it
looks like it runs the command "embossversion -auto" to get this
information. If it can get at this command, it's a bit peculiar that it
cannot get the other programs. Or am I missing something here ?


Please take a look at the code, I have commented within this...


-Rohit


use strict;
use warnings;
use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile  = "test.fasta";
my $motif   = "AGGAGG";
my $outfile = "test.out";


my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper $f;

print "location=", $f->location, "\n";                # returns local
print "version=", $f->version, "\n";                  # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is)
print "info=", $f->program_info('fuzznuc'), "\n";     # returns nothing
print "list=", $f->_program_list, "\n";               # returns nothing

#
# however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work
# $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
# the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object.
#
# however, creating an EMBOSSApplication object directly makes it possible to run the program
#

my $application = Bio::Tools::Run::EMBOSSApplication->new();
$application->name('fuzznuc');
print Dumper $application;
$application->run(
    { -sequence => $infile,
      -pattern  => $motif,
      -outfile  => $outfile,
    });
print "Done\n";

exit;


From neetisomaiya at gmail.com  Mon Nov  5 07:20:04 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 5 Nov 2007 17:50:04 +0530
Subject: [Bioperl-l] perl question
Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>

Again a perl question, and maybe a very trivial one.
How do I terminate a number like 3.1232010098 to only 3 decimal places in
perl?

-- 
-Neeti
Even my blood says, B positive


From biology0046 at hotmail.com  Mon Nov  5 07:16:13 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID: <BLU108-F34DC66B7BB1B9063DA2BC8B4880@phx.gbl>

Dear all:

I have a poplar genome GFF file like this:
LG_I	src	exon	2598	3280	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	2598	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I	src	start_codon	3278	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	stop_codon	2598	2600	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	exon	3544	3918	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	3544	3918	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I	src	exon	4258	4740	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	4258	4740	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I	src	exon	5344	6388	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	5344	6388	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I	src	exon	8259	8528	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8259	8528	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I	src	stop_codon	8259	8261	.	-	0	name "fgenesh1_pg.C_LG_I000002"
LG_I	src	exon	8897	8987	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8897	8987	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I	src	exon	9831	9892	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	9831	9892	.	-	1	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I	src	start_codon	9890	9892	.	-	0	name "fgenesh1_pg.C_LG_I000002"

I tried to use Bio::DB::GFF, but this module only works with the feature 
types (methods) given in the GFF file.
What I want to get is "intron, 5utr, 3utr", but this information is not 
contained in this GFF file.

How can I get this information through BioPerl?
If I consider the gaps between exons as introns, and the non-CDS parts of the 
first and last exon as UTRs, how can I extract them from this GFF file?
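
The "gaps between exons" idea can be sketched in plain Perl, without BioPerl. This is only a minimal sketch under stated assumptions: the lines are tab-separated with nine columns, the last column carries name "..." as in the sample above, and introns_from_gff is a made-up helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Report introns as the gaps between consecutive exons of each gene model.
# Returns a list of [seqid, name, start, end, strand] array references.
sub introns_from_gff {
    my @lines = @_;
    my %exons;    # gene name => [ [start, end, strand, seqid], ... ]
    for my $line (@lines) {
        chomp $line;
        my @col = split /\t/, $line, 9;
        next unless @col == 9 && $col[2] eq 'exon';
        my ($name) = $col[8] =~ /name "([^"]+)"/ or next;
        push @{ $exons{$name} }, [ @col[3, 4, 6, 0] ];
    }
    my @introns;
    for my $name (sort keys %exons) {
        # Order the exons by start coordinate, then walk the gaps.
        my @sorted = sort { $a->[0] <=> $b->[0] } @{ $exons{$name} };
        for my $i (1 .. $#sorted) {
            my $start = $sorted[ $i - 1 ][1] + 1;    # first base after the previous exon
            my $end   = $sorted[$i][0] - 1;          # last base before the next exon
            next if $start > $end;                   # adjacent or overlapping exons
            push @introns, [ $sorted[$i][3], $name, $start, $end, $sorted[$i][2] ];
        }
    }
    return @introns;
}

# Three exons of the first gene model from the sample above.
my $attr = 'name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649';
my @gff  = map { join "\t", 'LG_I', 'src', 'exon', @$_, '.', '-', '.', $attr }
           ( [ 2598, 3280 ], [ 3544, 3918 ], [ 4258, 4740 ] );

print join("\t", @$_), "\n" for introns_from_gff(@gff);
```

UTRs could be derived in a similar way, by subtracting the CDS span from the first and last exon of each model; that part is not shown. In the sample above the exon and CDS coordinates coincide, so no UTR would be reported for these two models.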

Thanks~~

Wenkai



From spiros at lokku.com  Mon Nov  5 07:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <bba689ec0711050436r6016ae57le78db531f9eab55b@mail.gmail.com>

Hey,

Use the `sprintf` function. More information can be found at
http://perldoc.perl.org/functions/sprintf.html.

For more proper rounding, you could use the Math::Round module,
http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya <neetisomaiya at gmail.com> wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive


From ak at ebi.ac.uk  Mon Nov  5 07:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

  printf( "The number is %.3f\n", $number );

When making a string:

  my $string = sprintf( "%.3f", $number );


BTW, this is cutting, not rounding.


Cheers,
Andreas


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------


From t.nugent at cs.ucl.ac.uk  Mon Nov  5 07:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>   

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From bix at sendu.me.uk  Mon Nov  5 07:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:

http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/


$rounded = sprintf("%.3f", $number);


From Marc.Logghe at DEVGEN.com  Mon Nov  5 07:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,
Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:
my $f = 3.1232010098;
printf "%0.3f", $f;


HTH,
Marc
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
> 
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 
> decimal places in perl?
> 
> --
> -Neeti
> Even my blood says, B positive


From bix at sendu.me.uk  Mon Nov  5 08:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> 
> When displaying:
> 
>   printf( "The number is %.3f\n", $number );
> 
> When making a string:
> 
>   my $string = sprintf( "%.3f", $number );
> 
> 
> BTW, this is cutting, not rounding.

(s)printf rounds (ie. doesn't simply truncate), though for critical 
applications you should use your own rounding algorithm.
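
The difference is easy to check with core Perl. The following is a small sketch, not a recommendation of a particular rounding scheme; round_to and truncate_to are illustrative names, and the values other than 3.1232010098 are made up for the demonstration.

```perl
use strict;
use warnings;

# sprintf rounds to the requested number of decimal places.
sub round_to {
    my ($number, $places) = @_;
    return sprintf('%.*f', $places, $number);
}

# Cutting (truncation) has to be done explicitly, here via int().
sub truncate_to {
    my ($number, $places) = @_;
    my $factor = 10**$places;
    return int($number * $factor) / $factor;
}

print round_to(3.1232010098, 3), "\n";    # 3.123
print round_to(3.1236, 3), "\n";          # 3.124 (rounded up)
print truncate_to(3.1236, 3), "\n";       # 3.123 (cut off)

# Why "critical applications" need care: 1.005 is stored as a binary
# fraction slightly below 1.005, so it rounds down.
print round_to(1.005, 2), "\n";           # 1.00
```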


From ak at ebi.ac.uk  Mon Nov  5 08:56:24 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 13:56:24 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <472F1989.90105@sendu.me.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk>
Message-ID: <20071105135624.GD4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote:
> Andreas Kahari wrote:
> > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> >> Again a perl question, and maybe a very trivial one.
> >> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> >> perl?
> > 
> > When displaying:
> > 
> >   printf( "The number is %.3f\n", $number );
> > 
> > When making a string:
> > 
> >   my $string = sprintf( "%.3f", $number );
> > 
> > 
> > BTW, this is cutting, not rounding.
> 
> (s)printf rounds (ie. doesn't simply truncate), though for critical 
> applications you should use your own rounding algorithm.

They do indeed.  Mea culpa.


Andreas

-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------


From jay at jays.net  Mon Nov  5 10:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value to the sorting algorithm. All tests invoke
-nosort => 1 (except a phase test where nosort doesn't matter anyway).
In my limited experience the sorting only serves to break real-world
splicing.

If there is no valid use then we can remove ~20 lines from  
SeqFeatureI.pm circa line 505. If there is a valid use and someone  
would be so kind as to educate me I'd be happy to add tests which  
demonstrate them.  :)

P.S.  CSHL is neato. I plan on understanding some of this stuff some  
day.  :)

j
http://www.bioperl.org/wiki/User:Jhannah


From hlapp at duke.edu  Mon Nov  5 11:03:16 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 11:03:16 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>

I agree that there should be a meaningful default that results in
"doing the right thing" in most cases if the user doesn't intervene.
I'm not sure I understand all the details, but it sounds like sorting or
not sorting should depend on the split location type unless the user
overrides it by argument. That's what you're suggesting, right?

	-hilmar

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:

> Pass in (-nosort => 1) to spliced_seq:
>
> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>
> This ensures no sorting of sublocations occurs, if you want for  
> instance typical GenBank/EMBL 'join' behavior.
>
> To the other devs: shouldn't -nosort be the default behavior when  
> the split location is a 'join'?  In other words, should spliced_seq 
> () be modified to take into account the split location type when  
> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'  
> explicitly indicates the order of the sequences is important when  
> joined together; the current behavior is more like that for 'order'.
>
> chris
>
> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>
>> Hi to all.
>>
>> I have a problem with a simplest script:
>>
>>
>>
>>          use Bio::SeqIO;
>>          # get command-line arguments, or die with a usage statement
>>          my $usage = "x2y.pl infile infileformat outfile  
>> outfileformat\n";
>>          my $infile = shift or die $usage;
>>          my $infileformat = shift or die $usage;
>> #         my $outfile = shift or die $usage;
>>          my $outfileformat = shift or die $usage;
>>
>>          # create one SeqIO object to read in,and another to write  
>> out
>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>                                       '-format' => $infileformat);
>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>                                        '-format' => $outfileformat);
>>
>>          # write each entry in the input file to the output file
>>          while (my $inseq = $seq_in->next_seq) {
>>
>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>
>> for my $feat_object ($inseq->get_SeqFeatures)
>>     {
>>     if ($feat_object->primary_tag eq "CDS")
>>         {
>>         print $feat_object->get_tag_values('product'),"\n";
>>         print
>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>         print $feat_object->spliced_seq->seq,"\n\n";
>>         }
>>     }
>>
>>
>>
>> The result seems OK to me, but in the case of the first CDS of NC_005213.gbk from
>> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
>> output is wrong:
>>
>> It is:
>> hypothetical protein
>> 1..490885
>> TAAATGCGATTGCTATTAGAA..................................Truncated
>> sequence...................................
>>
>> Should be:
>> hypothetical protein
>> 879..490883
>> ATGCGATTGCTATTAGAA...................................Truncated
>> sequence....................................TAA
>>
>>
>>
>> This CDS have an unnatural location string:
>> CDS             complement(join(490883..490885,1..879)), but  
>> spliced_seq
>> should handle these things?
>>
>> Please help me!
>> Best regards, N.
>> _______________________________________________
>>
>
>
>


From bernd.web at gmail.com  Mon Nov  5 11:53:01 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 5 Nov 2007 17:53:01 +0100
Subject: [Bioperl-l] PSI-BLAST
Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com>

Hi,

Is it possible with SearchIO to select a specific iteration (Results
from round i) part of the PSI-blast report, when parsing this with
SearchIO::blast?
It seems the parser parses the complete report. If this is not implemented I
could of course extract the specific part of the PSI-BLAST report and
then give it to SearchIO (e.g. with IO::String), but maybe I am
missing a built-in option?
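
For the fallback route (cutting the report text by hand), something along these lines might do. It is a rough sketch that assumes each round begins with a line starting "Results from round N" at column 0; extract_round is a made-up helper, not a SearchIO method.

```perl
use strict;
use warnings;

# Return the text of one PSI-BLAST round: from its "Results from round N"
# line up to (not including) the next such line, or the end of the report.
sub extract_round {
    my ($report, $round) = @_;
    # Split in front of every round header; the first chunk is the report header.
    my @chunks = split /^(?=Results from round \d+)/m, $report;
    for my $chunk (@chunks) {
        return $chunk if $chunk =~ /\AResults from round (\d+)/ && $1 == $round;
    }
    return;
}

# Mock report text, only to show the splitting; not real BLAST output.
my $report = "BLASTP header\n"
           . "Results from round 1\nhit A\n"
           . "Results from round 2\nhit B\nhit C\n";

print extract_round($report, 2);
```

The chunk returned this way lacks the report header, so the header may have to be put back in front before a parser accepts the text.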


Regards,
Bernd


From jay at jays.net  Mon Nov  5 11:54:13 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 11:54:13 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>

On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?

If someone knows why spliced_seq() should ever sort then I'm  
suggesting we add a test demonstrating a useful example of that.

If no one has a useful example of when you would want spliced_seq()  
to sort then I'm suggesting we remove the sorting altogether and  
nosort goes away.

I can provide/add many examples where sorting is bad. I do not know  
of a case where sorting is good.
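
One toy illustration of the "sorting is bad" case, in plain Perl and independent of BioPerl. It assumes a made-up 20-base circular sequence and a forward-strand feature join(18..20,1..6) that wraps around the origin; splice_pieces is an illustrative helper, not the spliced_seq() implementation.

```perl
use strict;
use warnings;

# Concatenate the pieces of a split location, optionally sorting them by
# start coordinate first. Coordinates are 1-based and inclusive.
sub splice_pieces {
    my ($sequence, $sort, @pieces) = @_;
    @pieces = sort { $a->[0] <=> $b->[0] } @pieces if $sort;
    return join '',
        map { substr($sequence, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @pieces;
}

my $genome = 'GCGTAAAACCCCGGGGTATG';        # toy sequence, 20 bases
my @join   = ( [ 18, 20 ], [ 1, 6 ] );      # join(18..20,1..6)

print splice_pieces($genome, 0, @join), "\n";    # ATGGCGTAA (listed order)
print splice_pieces($genome, 1, @join), "\n";    # GCGTAAATG (sorted by start)
```

In listed order the result starts with ATG and ends with TAA; after sorting, the start codon ends up at the end, which mirrors the symptom in the original report.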

j
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Mon Nov  5 12:07:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Nov 2007 12:07:10 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>


At one point the location order was not respected/saved I believe. I  
guess we will just assume the user will build up a SplitLocation in  
order (i.e. add_SubLocation).  I'll try and remember if there were  
any other particular reasons.


-jason
On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar
>
> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>
>> Pass in (-nosort => 1) to spliced_seq:
>>
>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>
>> This ensures no sorting of sublocations occurs, if you want for
>> instance typical GenBank/EMBL 'join' behavior.
>>
>> To the other devs: shouldn't -nosort be the default behavior when
>> the split location is a 'join'?  In other words, should spliced_seq
>> () be modified to take into account the split location type when
>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>> explicitly indicates the order of the sequences is important when
>> joined together; the current behavior is more like that for 'order'.
>>
>> chris
>>
>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>
>>> Hi to all.
>>>
>>> I have a problem with a simplest script:
>>>
>>>
>>>
>>>          use Bio::SeqIO;
>>>          # get command-line arguments, or die with a usage statement
>>>          my $usage = "x2y.pl infile infileformat outfile
>>> outfileformat\n";
>>>          my $infile = shift or die $usage;
>>>          my $infileformat = shift or die $usage;
>>> #         my $outfile = shift or die $usage;
>>>          my $outfileformat = shift or die $usage;
>>>
>>>          # create one SeqIO object to read in,and another to write
>>> out
>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>                                       '-format' => $infileformat);
>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>                                        '-format' => $outfileformat);
>>>
>>>          # write each entry in the input file to the output file
>>>          while (my $inseq = $seq_in->next_seq) {
>>>
>>> #            $seq_out->write_seq($inseq); # Whole sequence not  
>>> needed
>>>
>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>     {
>>>     if ($feat_object->primary_tag eq "CDS")
>>>         {
>>>         print $feat_object->get_tag_values('product'),"\n";
>>>         print
>>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>         }
>>>     }
>>>
>>>
>>>
>>> The result seems OK to me, but in the case of the first CDS of NC_005213.gbk from
>>> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
>>> output is wrong:
>>> It is:
>>> hypothetical protein
>>> 1..490885
>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>> sequence...................................
>>>
>>> Should be:
>>> hypothetical protein
>>> 879..490883
>>> ATGCGATTGCTATTAGAA...................................Truncated
>>> sequence....................................TAA
>>>
>>>
>>>
>>> This CDS have an unnatural location string:
>>> CDS             complement(join(490883..490885,1..879)), but
>>> spliced_seq
>>> should handle these things?
>>>
>>> Please help me!
>>> Best regards, N.
>>> _______________________________________________
>>>
>>
>>
>>
>
>
>
>

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Mon Nov  5 12:16:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:16:10 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>

Yes, we would sort based on the splittype() and default to a  
particular behavior ('join') if one isn't designated, maybe with a  
warning indicating the splittype() isn't defined.  Using an 'order'  
or other defined types could also delineate a default sort/nosort  
behavior (probably the previous as it would replicate prior behavior).
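
Roughly, the decision could look like the sketch below. should_sort is a hypothetical helper and not existing BioPerl code, and treating every type other than 'join' as "sort as before" is only one reading of the proposal.

```perl
use strict;
use warnings;

# Decide whether sublocations should be sorted before splicing.
#   $split_type : 'join', 'order', ... or undef if not designated
#   $nosort     : explicit user choice (true/false), or undef if not given
sub should_sort {
    my ($split_type, $nosort) = @_;

    # An explicit argument from the user always wins.
    return $nosort ? 0 : 1 if defined $nosort;

    if (!defined $split_type) {
        warn "split type is not defined, assuming 'join'\n";
        $split_type = 'join';
    }

    # 'join' keeps the listed order; other types sort, as before.
    return lc($split_type) eq 'join' ? 0 : 1;
}

print "join:  ", should_sort('join'),  "\n";    # 0
print "order: ", should_sort('order'), "\n";    # 1
```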

chris

On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar


From cjfields at uiuc.edu  Mon Nov  5 12:20:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:20:35 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu>


On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote:

> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>
> If someone knows why spliced_seq() should ever sort then I'm
> suggesting we add a test demonstrating a useful example of that.
>
> If no one has a useful example of when you would want spliced_seq()
> to sort then I'm suggesting we remove the sorting altogether and
> nosort goes away.
>
> I can provide/add many examples where sorting is bad. I do not know
> of a case where sorting is good.
>
> j
> http://www.bioperl.org/wiki/User:Jhannah

The behavior would be based on the current use of 'join', 'order',  
and 'bond' (the latter in GenPept records).  I documented some cases  
here a while back:

http://www.bioperl.org/wiki/BioPerl_Locations#Split

chris


From hlapp at duke.edu  Mon Nov  5 12:32:24 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 12:32:24 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu>

Sounds good to me. -hilmar

On Nov 5, 2007, at 12:16 PM, Chris Fields wrote:

> Yes, we would sort based on the splittype() and default to a  
> particular behavior ('join') if one isn't designated, maybe with a  
> warning indicating the splittype() isn't defined.  Using an 'order'  
> or other defined types could also delineate a default sort/nosort  
> behavior (probably the previous as it would replicate prior behavior).
>
> chris
>
> On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 12:41:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:41:27 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
Message-ID: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>

It may have something to do with remote locations or setting strand()  
in sublocations.  This may have popped up in relation to a LocationI  
code audit I proposed a while back on the list which I never got  
around to.  Oh well...

I at least managed getting a wiki page started in case we decided to  
make changes, with the intention of making it a HOWTO at some point:

http://www.bioperl.org/wiki/BioPerl_Locations

If we go through with the changes to spliced_seq(), should it be  
implemented for inclusion in v1.6 or wait until v1.7?

chris

On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote:

>
> At one point the location order was not respected/saved I believe.  
> I guess we will just assume the user will build up a SplitLocation  
> in order (i.e. add_SubLocation).  I'll try and remember if there  
> were any other particular reasons.
>
>
> -jason
> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>>
>> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>>
>>> Pass in (-nosort => 1) to spliced_seq:
>>>
>>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>>
>>> This ensures no sorting of sublocations occurs, if you want for
>>> instance typical GenBank/EMBL 'join' behavior.
>>>
>>> To the other devs: shouldn't -nosort be the default behavior when
>>> the split location is a 'join'?  In other words, should spliced_seq
>>> () be modified to take into account the split location type when
>>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>>> explicitly indicates the order of the sequences is important when
>>> joined together; the current behavior is more like that for 'order'.
>>>
>>> chris
>>>
>>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>>
>>>> Hi to all.
>>>>
>>>> I have a problem with a simplest script:
>>>>
>>>>
>>>>
>>>>          use Bio::SeqIO;
>>>>          # get command-line arguments, or die with a usage  
>>>> statement
>>>>          my $usage = "x2y.pl infile infileformat outfile
>>>> outfileformat\n";
>>>>          my $infile = shift or die $usage;
>>>>          my $infileformat = shift or die $usage;
>>>> #         my $outfile = shift or die $usage;
>>>>          my $outfileformat = shift or die $usage;
>>>>
>>>>          # create one SeqIO object to read in,and another to write
>>>> out
>>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>>                                       '-format' => $infileformat);
>>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>>                                        '-format' =>  
>>>> $outfileformat);
>>>>
>>>>          # write each entry in the input file to the output file
>>>>          while (my $inseq = $seq_in->next_seq) {
>>>>
>>>> #            $seq_out->write_seq($inseq); # Whole sequence not  
>>>> needed
>>>>
>>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>>     {
>>>>     if ($feat_object->primary_tag eq "CDS")
>>>>         {
>>>>         print $feat_object->get_tag_values('product'),"\n";
>>>>         print
>>>> $feat_object->location->start,"..",$feat_object->location- 
>>>> >end,"\n";
>>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>>         }
>>>>     }
>>>>
>>>>
>>>>
>>>> The result seems OK to me, but in the case of the first CDS of NC_005213.gbk from
>>>> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
>>>> output is wrong:
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>>
>>>>
>>>> This CDS have an unnatural location string:
>>>> CDS             complement(join(490883..490885,1..879)), but
>>>> spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Mon Nov  5 11:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID: <C354B795.10231%bosborne11@verizon.net>

Jean-luc,

From what you've written it sounds like you're using bash and not some other
shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
in your home directory, as well as a .ncbirc file. This should work.

I'm no Unix expert but I've always configured tcsh on the Mac the same
way I'd configure it on Linux machines. Similarly, if you're using bash
then it will read its .bashrc file, regardless of what flavor of Unix you
use (and the same thing holds true for zsh or csh or ...).

Brian O.


On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:

> Dear Bioperl and Mac users,
> 
> I am a Mac user and would like to run a script I made using
> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
> to Bioperl the pathway to Blastall and other executables.
> 
> I read carefully the following link
> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
> path to Blast, but I guess the way to proceed is slightly different in Mac and
> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
> .profile file instead of .bashrc?)
> 
> Actually, my blast file is in myname directory and comprises a /bin and  a
> /data file. I have got my blastall and other executables in
> myname/blast/bin/blastall.
> 
> Thank you in anticipation for your help.
> 
> Jean-Luc


From arareko at campus.iztacala.unam.mx  Mon Nov  5 13:35:56 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 05 Nov 2007 12:35:56 -0600
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <C354B795.10231%bosborne11@verizon.net>
References: <C354B795.10231%bosborne11@verizon.net>
Message-ID: <472F628C.2000506@campus.iztacala.unam.mx>

If the ~/.bashrc file doesn't work for you, try renaming it to 
~/.bash_profile and logging in again; that might work better.

~/.bashrc works as an individual per-interactive-shell startup file, 
whereas ~/.bash_profile is a personal initialization file, executed for 
login shells.
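
As a minimal sketch (not taken from the HOWTO), a ~/.bash_profile for the
layout Jean-Luc describes could look like the lines below. It assumes BLAST
was unpacked to $HOME/blast with the executables in its bin subdirectory;
adjust BLASTDIR if yours lives elsewhere. The .ncbirc contents shown in the
comments use /Users/myname as a placeholder home directory.

```shell
# ~/.bash_profile: read by bash login shells.
# Assumption: BLAST lives in $HOME/blast, with blastall etc. in bin/.
BLASTDIR="$HOME/blast"

# Put the BLAST executables on the PATH, but only once.
case ":$PATH:" in
    *":$BLASTDIR/bin:"*) ;;                  # already there, nothing to do
    *) PATH="$BLASTDIR/bin:$PATH" ;;
esac
export PATH

# ~/.ncbirc is a separate plain-text file. For this layout it would read
# (placeholder home directory, replace with your own):
#   [NCBI]
#   Data=/Users/myname/blast/data
```

Terminal on Mac OS X starts bash as a login shell, which is why
~/.bash_profile gets read there when ~/.bashrc does not. After logging in
again, "which blastall" should print the path to the executable.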

Hope this helps.

Regards,
Mauricio.


Brian Osborne wrote:
> Jean-luc,
> 
> From what you've written it sounds like you're using bash and not some other
> shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
> in your home directory, as well as a .ncbirc file. This should work.
> 
> I'm no Unix expert but I've always configured tcsh on the Mac in the same
> ways I'd configure it on Linux machines. Similarly, if you're using bash
> then it will read its .bashrc file, regardless of what flavor of Unix you
> use (and the same thing holds true for zsh or csh or ...).
> 
> Brian O.
> 
> 
> On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:
> 
>> Dear Bioperl and Mac users,
>>
>> I am a Mac user and would like to run a script I made using
>> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
>> to Bioperl the pathway to Blastall and other executables.
>>
>> I read carefully the following link
>> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
>> path to Blast, but I guess the way to proceed is slightly different in Mac and
>> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
>> .profile file instead of .bashrc?)
>>
>> Actually, my blast file is in myname directory and comprises a /bin and  a
>> /data file. I have got my blastall and other executables in
>> myname/blast/bin/blastall.
>>
>> Thank you in anticipation for your help.
>>
>> Jean-Luc

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at duke.edu  Mon Nov  5 16:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
Message-ID: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>


On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be  
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should  
not change behavior for those for which the current default behavior  
was already broken (and who therefore pass in --no_sort), and 2) fix  
the behavior for those who erroneously assumed that the code was  
going to do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 17:12:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 16:12:23 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
	<EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu>


On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote:

>
> On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:
>
>> If we go through with the changes to spliced_seq(), should it be  
>> implemented for inclusion in v1.6 or wait until v1.7?
>
> I would say they should be implemented ASAP because they 1) should  
> not change behavior for those for which the current default  
> behavior was already broken (and who therefore pass in --no_sort),  
> and 2) fix the behavior for those who erroneously assumed that the  
> code was going to do the right thing by default.
>
> I.e., it sounds mostly like a bugfix to me. Am I overlooking  
> something?
>
> 	-hilmar
> -- 

Okay; I'll try to get this in soon.

chris


From jean-luc.jany at univ-brest.fr  Tue Nov  6 04:00:07 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Tue, 06 Nov 2007 10:00:07 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
Message-ID: <47302D17.2030500@univ-brest.fr>

Thanks Brian. Yes, I use bash. I will follow your advice as soon 
as possible (for some reason I am currently unable to run BioPerl) and 
get back to you to let you know whether it works.
Jean-Luc


From jason at bioperl.org  Tue Nov  6 16:18:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 16:18:35 -0500
Subject: [Bioperl-l] lightweight sequence features
Message-ID: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>

I started a branch for implementing and playing with a lightweight  
feature object. The branch is called 'lightweight_feature_branch'.

Right now it is about 70% faster just in object creation based on  
parsing features using Bio::Tools::GFF and swapping the types of  
features that are created.  It uses arrays instead of hashes under  
the hood.

So the objects don't have locations under the hood.  My hope is that,  
if this works okay, we could use it for creating objects where we KNOW  
the underlying features have simple locations, such as when parsing  
GFF data.

-jason
--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Tue Nov  6 16:57:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Nov 2007 15:57:17 -0600
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>

Bravo!  I once benchmarked Location instance creation and found  
it contributed quite a bit of overhead, so the speedup from that and  
the use of arrays makes quite a bit of sense to me.

You mention only simple locations; I'm guessing this doesn't handle  
'fuzzy' ends?  If it did I could see layering the feature data from  
the get-go, so it could be used just about anywhere in the place of  
SF::Generic.  Maybe something to test out in 1.7?

chris

On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:

> I started a branch for implementing and playing with lightweight
> feature object. The branch is called 'lightweight_feature_branch'.
>
> Right now it is about 70% faster just in object creation based on
> parsing features using Bio::Tools::GFF and swapping the types of
> features that are created.  It uses arrays instead of hashes under
> the hood.
>
> So the objects don't have locations under the hood.  My hope is if
> this works okay we could use it for creating objects where we KNOW
> the underlying features have simple locations so such as parsing in
> GFF data.
>
> -jason
> --
> Jason Stajich
> jason at bioperl.org


From jason at bioperl.org  Tue Nov  6 23:14:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 23:14:55 -0500
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
	<5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
Message-ID: <A021EE94-8DF8-467E-8303-E80127E3AEE2@bioperl.org>

Right - only for simple locations.  I've got a bunch more tests and  
fixes to put in.

I am hoping this can be a fast replacement in the case where we're  
dealing with this "unflattened" data (i.e. GFF in FeatureIO &  
Gbrowse).  This is sort of a playground until I can really get it  
tested a bit more.  I'll give an all clear when the dust settles in  
terms of the design if anyone wants to play/help.

-jason
On Nov 6, 2007, at 4:57 PM, Chris Fields wrote:

> Bravo!  I once benchmarked Location instance creation once and  
> found it contributed quite a bit of overhead so the speedup with  
> that and the use of arrays makes quite a bit of sense to me.
>
> You mention only simple locations; I'm guessing this doesn't handle  
> 'fuzzy' ends?  If it did I could see layering the feature data from  
> the get-go, so it could be used just about anywhere in the place of  
> SF::Generic.  Maybe something to test out in 1.7?
>
> chris
>
> On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:
>
>> I started a branch for implementing and playing with lightweight
>> feature object. The branch is called 'lightweight_feature_branch'.
>>
>> Right now it is about 70% faster just in object creation based on
>> parsing features using Bio::Tools::GFF and swapping the types of
>> features that are created.  It uses arrays instead of hashes under
>> the hood.
>>
>> So the objects don't have locations under the hood.  My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org


From heikki at sanbi.ac.za  Wed Nov  7 05:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems 
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise
* Sequences are modified in place. That is really bad, because that 
  means that the user has to know to create a copy before 
  running Mdust on it.
* The docs say that you have to set MDUSTDIR envvar to tell the program 
  where to find the binary. That is actually optional if the 
  binary is on your path.
* The tests do not cover any of the options to the program
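
To illustrate the third point, here is a small shell sketch of the two ways
the binary can be located. It assumes the executable is named "mdust", and
$HOME/opt/mdust is a purely hypothetical install directory used only for
illustration.

```shell
# Decide whether MDUSTDIR needs to be set at all.
# Assumption: the executable is called "mdust".
if command -v mdust >/dev/null 2>&1; then
    echo "mdust found on PATH; MDUSTDIR is not needed"
else
    # Hypothetical location; point this at the directory holding the binary.
    MDUSTDIR="$HOME/opt/mdust"
    export MDUSTDIR
    echo "mdust not on PATH; using MDUSTDIR=$MDUSTDIR"
fi
```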


As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects:
  sequence string is not masked but seqfeatures to that effect are added
* Modify run() to return the new masked sequence object when 
  the target is a Bio::PrimarySeqI.
* fix the documentation


After that it will be possible to simply write:

use Bio::Tools::Run::Mdust;
my $mdust = Bio::Tools::Run::Mdust->new();
my $seq_dusted = $mdust->run($seq); # $seq->isa('Bio::PrimarySeqI')


Are you happy for me to do this or do you want to do it yourself?


Yours,
	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    
    _/_/_/_/_/  heikki at_sanbi _ac _za    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Kevin.M.Brown at asu.edu  Wed Nov  7 13:04:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 7 Nov 2007 11:04:50 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>

I installed bioperl-ext from CVS, but can't figure out what else is
missing to utilize Bio::Tools::pSW.  The error I get from the example
script in the wiki is:

The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align)
has not been installed.
 Please read the install the bioperl-ext package

BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
Compilation failed in require at ./align_test.pl line 3.
BEGIN failed--compilation aborted at ./align_test.pl line 3.

In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
Align, but no Align.pm file.

I followed the directions in the wiki to install 1.5.2_102 (think I had
_100 installed previously).  Any thoughts on what I'm missing?


From jason at bioperl.org  Wed Nov  7 14:52:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 14:52:16 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>

The array-based Bio::SeqFeature::Slim is only about 7% faster than  
Bio::Graphics::Feature so I suspect most of the speedup comes from  
removing location objects.

Generic     6.75        --      -37%      -41%
GraphicsF   4.26       58%        --       -7%
Slim        3.98       70%        7%        --
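
The percentages can be reproduced from the first column. This sketch assumes
that column holds seconds per iteration (the table header is not shown here),
so a smaller number means a faster class; pct_faster is just an illustrative
helper name.

```shell
# pct_faster SLOW FAST
# Percent speedup of FAST over SLOW, both given in seconds per iteration.
pct_faster() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.0f\n", (slow / fast - 1) * 100 }'
}

pct_faster 6.75 4.26   # GraphicsF relative to Generic -> 58
pct_faster 6.75 3.98   # Slim relative to Generic      -> 70
pct_faster 4.26 3.98   # Slim relative to GraphicsF    -> 7
```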

This is using code on the lightweight_feature_branch, which can be checked
out with:
cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r \
    lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason


From lstein at cshl.edu  Wed Nov  7 15:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally,
then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic     6.75        --      -37%      -41%
> GraphicsF   4.26       58%        --       -7%
> Slim        3.98       70%        7%        --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason at bioperl.org  Wed Nov  7 15:09:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 15:09:35 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>

It uses hashes there so technically it is not entirely array based.

-jason
On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:

> I wonder if it is worth moving to the array-based version more  
> generally,
> then.
>
> How does the array based feature object deal with tags?
>
> Lincoln
>
> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>
>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>> Bio::Graphics::Feature so I suspect most of the speedup comes from  
>> removing
>> location objects.
>>
>> Generic     6.75        --      -37%      -41%
>> GraphicsF   4.26       58%        --       -7%
>> Slim        3.98       70%        7%        --
>>
>> this is using code on the lightweight_feature_branch so
>> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
>> lightweight_feature_branch -d core_lwf bioperl-live
>>
>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>> and the GFF3 file I used to parse
>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>
>> -jason
>>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Nov  7 16:12:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 15:12:35 -0600
Subject: [Bioperl-l] (no subject)
In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>

I can see preferring a lightweight simple SF over SF::Generic in the  
next BioPerl dev cycle.  I guess we would just layer split locations  
as simple sub-features/segments, typing when necessary?  That  
shouldn't be much more overhead than creating a layered Location::Split.

chris

On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:

> It uses hashes there so technically it is not entirely array based.
>
> -jason
> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>
>> I wonder if it is worth moving to the array-based version more
>> generally,
>> then.
>>
>> How does the array based feature object deal with tags?
>>
>> Lincoln
>>
>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>
>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>> removing
>>> location objects.
>>>
>>> Generic     6.75        --      -37%      -41%
>>> GraphicsF   4.26       58%        --       -7%
>>> Slim        3.98       70%        7%        --
>>>
>>> this is using code on the lightweight_feature_branch so
>>> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>
>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>> and the GFF3 file I used to parse
>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>
>>> -jason
>>>
>>
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Wed Nov  7 18:19:15 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Nov 2007 18:19:15 -0500
Subject: [Bioperl-l] lightweight features
In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
Message-ID: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>

It seems to me that there are applications where you're dealing with  
a huge number of features (such as GFF) and where therefore a  
lightweight object makes tremendous sense. But when you parse a  
genbank file, I'm not sure that's the bottleneck, unless maybe it's a  
large contig with lots of feature annotations.

I guess we'll ultimately want a way to control the type of feature  
being instantiated by a parser, e.g. using a factory.

	-hilmar

On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:

> I can see preferring a lightweight simple SF over SF::Generic in the
> next BioPerl dev cycle.  I guess we would just layer split locations
> as simple sub-features/segments, typing when necessary?  That
> shouldn't be much more overhead than creating a layered  
> Location::Split.
>
> chris
>
> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:
>
>> It uses hashes there so technically it is not entirely array based.
>>
>> -jason
>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>>
>>> I wonder if it is worth moving to the array-based version more
>>> generally,
>>> then.
>>>
>>> How does the array based feature object deal with tags?
>>>
>>> Lincoln
>>>
>>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>
>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>>> removing
>>>> location objects.
>>>>
>>>> Generic     6.75        --      -37%      -41%
>>>> GraphicsF   4.26       58%        --       -7%
>>>> Slim        3.98       70%        7%        --
>>>>
>>>> this is using code on the lightweight_feature_branch so
>>>> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
>>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>>
>>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>>> and the GFF3 file I used to parse
>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>>
>>>> -jason
>>>>
>>>
>>>
>>>
>>> -- 
>>> Lincoln D. Stein
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724
>>> (516) 367-8380 (voice)
>>> (516) 367-8389 (fax)
>>> FOR URGENT MESSAGES & SCHEDULING,
>>> PLEASE CONTACT MY ASSISTANT,
>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Nov  7 20:04:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 19:04:05 -0600
Subject: [Bioperl-l] lightweight features
In-Reply-To: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
	<D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
Message-ID: <E541C60D-6741-4923-A71D-E14CE6FE176D@uiuc.edu>

I'm also thinking a factory is a good possibility; maybe something to  
take the place of FTHelper.

chris

On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote:

> It seems to me that there are applications where you're dealing with
> a huge number of features (such as GFF) and where therefore a
> lightweight object makes tremendous sense. But when you parse a
> genbank file, I'm not sure that's the bottleneck, unless maybe it's a
> large contig with lots of feature annotations.
>
> I guess we'll ultimately want a way to control the type of feature
> being instantiated by a parser, e.g. using a factory.
>
> 	-hilmar
>
> On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:
>
>> I can see preferring a lightweight simple SF over SF::Generic in the
>> next BioPerl dev cycle.  I guess we would just layer split locations
>> as simple sub-features/segments, typing when necessary?  That
>> shouldn't be much more overhead than creating a layered
>> Location::Split.
>>
>> chris
>>
>> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:
>>
>>> It uses hashes there so technically it is not entirely array based.
>>>
>>> -jason
>>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>>>
>>>> I wonder if it is worth moving to the array-based version more
>>>> generally,
>>>> then.
>>>>
>>>> How does the array based feature object deal with tags?
>>>>
>>>> Lincoln
>>>>
>>>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>>
>>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>>>> removing
>>>>> location objects.
>>>>>
>>>>> Generic     6.75        --      -37%      -41%
>>>>> GraphicsF   4.26       58%        --       -7%
>>>>> Slim        3.98       70%        7%        --
>>>>>
>>>>> this is using code on the lightweight_feature_branch so
>>>>> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
>>>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>>>
>>>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>>>> and the GFF3 file I used to parse
>>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>>>
>>>>> -jason
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Lincoln D. Stein
>>>> Cold Spring Harbor Laboratory
>>>> 1 Bungtown Road
>>>> Cold Spring Harbor, NY 11724
>>>> (516) 367-8380 (voice)
>>>> (516) 367-8389 (fax)
>>>> FOR URGENT MESSAGES & SCHEDULING,
>>>> PLEASE CONTACT MY ASSISTANT,
>>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov  7 23:45:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 22:45:26 -0600
Subject: [Bioperl-l] test please ignore
Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>


From cjfields at uiuc.edu  Thu Nov  8 10:50:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Nov 2007 09:50:02 -0600
Subject: [Bioperl-l] test please ignore
In-Reply-To: <47332534.5090205@bms.com>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
	<47332534.5090205@bms.com>
Message-ID: <D0ADF51D-92BE-4645-BB1C-564536732368@uiuc.edu>

And respond back!  Just checking the mail list; the open-bio wiki  
pages were down last night.

chris

On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote:

> Chris Fields wrote:
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> This is the best way to make everyone open this e-mail ;-)
> Stefan


From stefan.kirov at bms.com  Thu Nov  8 10:03:16 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 08 Nov 2007 10:03:16 -0500
Subject: [Bioperl-l] test please ignore
In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
Message-ID: <47332534.5090205@bms.com>

Chris Fields wrote:
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
This is the best way to make everyone open this e-mail ;-)
Stefan


From Kevin.M.Brown at asu.edu  Thu Nov  8 17:30:24 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Nov 2007 15:30:24 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org>
References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>
	<20071108003638.GA5892@eniac.jgi-psf.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu>

OK, found the issue.  For whatever reason the Align.pm file is inside
the Align folder and so the package name and path don't match up once it
is installed.  This would cause it to have a name of
"Bio::Ext::Align::Align" instead of "Bio::Ext::Align".  Not sure why
this wasn't caught when I did "perl Makefile.PL && make && make test &&
make install".
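
To illustrate the mismatch: Perl turns a package name into a relative
path by replacing "::" with "/" and appending ".pm", then looks for that
path under each @INC directory.  A minimal sketch of that lookup follows;
the helper name module_to_path is made up for illustration and is not
part of Bioperl.

```perl
use strict;
use warnings;

# Hypothetical helper: the relative path that "require" searches for.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Bio::Ext::Align must live at Bio/Ext/Align.pm under some @INC directory.
# A file at Bio/Ext/Align/Align.pm is only found as Bio::Ext::Align::Align.
for my $module (qw(Bio::Ext::Align Bio::Ext::Align::Align)) {
    my $rel = module_to_path($module);
    # @INC may hold code references; only plain directories are checked here.
    my ($found) = grep { -e "$_/$rel" } grep { !ref } @INC;
    printf "%-24s => %-26s %s\n", $module, $rel,
        defined $found ? "found in $found" : "not found in \@INC";
}
```

Running this on the affected machine should show which of the two names
actually resolves to an installed file.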

> -----Original Message-----
> From: Joel Martin [mailto:j_martin at lbl.gov] 
> Sent: Wednesday, November 07, 2007 5:37 PM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Bio::Ext::Align?
> 
> Hello,
>     Might be a side effect of fixing the other bioperl-ext package, 
> what steps exactly did this entail:
> 
> > I installed bioperl-ext from CVS, 
> 
> ?
> 
> you can probably bypass it at the moment by doing this after 
> unpacking the
> bioperl-ext package 
> 
> cd Bio/Ext/Align
> perl Makefile.PL
> make
> make test
> make install
> 
> and
> 
> cd Bio/Ext/HMM
> perl Makefile.PL
> make 
> make test
> make install
> 
> Joel
> 
> but can't figure out what else is
> > missing to utilize Bio::Tools::pSW.  The error I get from 
> the example
> > script in the wiki is:
> > 
> > The C-compiled engine for Smith Waterman alignments 
> (Bio::Ext::Align)
> > has not been installed.
> >  Please read the install the bioperl-ext package
> > 
> > BEGIN failed--compilation aborted at
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
> > Compilation failed in require at ./align_test.pl line 3.
> > BEGIN failed--compilation aborted at ./align_test.pl line 3.
> > 
> > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
> > Align, but no Align.pm file.
> > 
> > I followed the directions in the wiki to install 1.5.2_102 
> (think I had
> > _100 installed previously).  Any thoughts on what I'm missing?


From akarger at CGR.Harvard.edu  Fri Nov  9 09:53:02 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 9 Nov 2007 09:53:02 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
Message-ID: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>

When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:


 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885


 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575


As you can see from Sbjct lines, these two hits are basically
contiguous.
I was surprised to see that the bit scores and identities and alignment
lengths here are totally different but the expectation values are
identical. 

After a bit of grepping in the BLAST source, I found reference to "sum
segments" and "a collection [of] multiple distinct alignments with
asymmetric gaps between the alignments" and decided it was time to cry
for help. When does BLAST decide that two or more alignments belong
"together" and how does that affect the evalue? Is the evalue really
showing how good those two alignments combined are, despite the frame
shift? (It so happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
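
Until the parser question is settled, the N can be pulled straight from
the HSP header line of the text report.  The following is only a sketch
that assumes the header layout shown in the excerpt above; the function
name parse_hsp_header is made up and is not a Bioperl method.

```perl
use strict;
use warnings;

# Parse one HSP header line from a text BLAST report.
# Returns (bit score, N, E-value); N is undef when there is no "(N)".
sub parse_hsp_header {
    my ($line) = @_;
    return unless $line =~
        /Score\s*=\s*([\d.]+)\s+bits.*?Expect(?:\((\d+)\))?\s*=\s*([\d.eE+-]+)/;
    return ($1, $2, $3);
}

for my $line (' Score =  192 bits (487), Expect(2) = 5e-64',
              ' Score = 75.1 bits (183), Expect(2) = 5e-64') {
    my ($bits, $n, $evalue) = parse_hsp_header($line);
    printf "bits=%s n=%s evalue=%s\n", $bits, defined $n ? $n : '-', $evalue;
}
```

Both example lines report n=2 with the same E-value, which is the
situation described above.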

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University


From cjfields at uiuc.edu  Fri Nov  9 12:58:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Nov 2007 11:58:16 -0600
Subject: [Bioperl-l] GFF3loader and indexing
Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu>

Quick question: shouldn't the new Index attribute be passed on to  
seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping  
purposes (for instance, properly reloading dumped gff3 data)?  I'm  
testing out a feature editor using volvox.gff3 data in GBrowse and  
the mRNA features appear to drop this attribute once loaded:

Original data:

ctgA	example	gene	1050	9000	.	+	.	ID=EDEN;Name=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=EDEN.1

partial gff3_string(1) output:

ctgA	example	gene	1050	9000	.	+	.	Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=51;ID=52
...
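
A quick way to check which attributes survive the round trip is to parse
column 9 of both lines and compare the tags.  This is a minimal sketch in
plain Perl, independent of GFF3Loader; parse_gff3_attributes is a made-up
helper, and it skips the URL-unescaping a full GFF3 parser would do.

```perl
use strict;
use warnings;

# Split a GFF3 column-9 string into a hash of tag => [values].
sub parse_gff3_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $tag && defined $value;
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# Column 9 of the mRNA line before loading and after gff3_string(1).
my $before = parse_gff3_attributes(
    'ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1');
my $after = parse_gff3_attributes(
    'Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1');

# Report tags present in the original line but missing after the round trip.
my @lost = grep { !exists $after->{$_} } sort keys %$before;
print "lost after round trip: @lost\n";
```

For the mRNA line above the only tag reported as lost is Index.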

chris


From David.Messina at sbc.su.se  Sat Nov 10 06:04:25 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 10 Nov 2007 12:04:25 +0100
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>

Hi Amir,

I don't have my BLAST book handy, and my memory is a little fuzzy, but I
think the Expect(2) you're seeing is the E-value based on both HSPs
combined. And I think this is why you see the same Expect value for both --
because it is shared between them (which sounds like what you wanted).

Again, this is just from memory, but I think this is an option that has to
be turned on rather than something which Blast decides to do on its own.


I don't know whether BioPerl reports this or not. Would you mind e-mailing
me an entire BLAST report as a sample? When I have some time I'd like to play
around with this a bit.

Thanks,
Dave


From sac at bioperl.org  Sat Nov 10 17:59:28 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Nov 2007 14:59:28 -0800
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>

The Bioperl blast parser should extract that value and you can obtain
it from an HSP object, via the HSPI::n() method, documented here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

Dave's basically correct in his explanation. It's a result of the
application of sum statistics by the blast algorithm. You can read all
about it in Korf et al's BLAST book. Here's the relevant section:

http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1

Steve

On Nov 10, 2007 3:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Amir,
>
> I don't have my BLAST book handy, and my memory is a little fuzzy, but I
> think the Expect(2) you're seeing is the E-value based on both HSPs
> combined. And I think this is why you see the same Expect value for both --
> because it is shared between them (which sounds like what you wanted).
>
> Again, this is just from memory, but I think this is an option that has to
> be turned on rather than something which Blast decides to do on its own.
>
>
> I don't know whether BioPerl reports this or not. Would you mind e-mailing
> me a entire BLAST report as a sample? When I have some time I'd like to play
> around with this a bit.
>
> Thanks,
> Dave


From bernd.web at gmail.com  Tue Nov 13 06:57:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 13 Nov 2007 12:57:04 +0100
Subject: [Bioperl-l] Panel link
Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com>

Hi,

Is it possible with Panel to provide javascript event handlers?
With -link we can provide hrefs as:
  -link => 'http://www.google.com/search?q=$description'
or use a coderef that returns a href.

However, I'd like to set-up links as:
<area .... href="#id" onmouseover="function()" onmouseout="function()">

Is this possible by default with Panel?
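
One possible fallback, not confirmed anywhere in this thread, would be to
write the image map by hand from the feature bounding boxes instead of
relying on -link.  The sketch below uses stand-in box data; in a real
script the coordinates would have to come from the panel (the
Bio::Graphics::Panel documentation describes a boxes() method, which is
worth verifying against the installed version).  The JavaScript function
names highlight and unhighlight are placeholders.

```perl
use strict;
use warnings;

# Build one <area> tag with JavaScript event handlers from a bounding box.
# highlight() and unhighlight() are placeholder JavaScript function names.
# Note: $id is not HTML-escaped in this sketch.
sub area_tag {
    my ($id, $x1, $y1, $x2, $y2) = @_;
    return sprintf
        q{<area shape="rect" coords="%d,%d,%d,%d" href="#%s" }
      . q{onmouseover="highlight('%s')" onmouseout="unhighlight('%s')">},
        $x1, $y1, $x2, $y2, $id, $id, $id;
}

# Stand-in data: [id, x1, y1, x2, y2] per feature.
my @boxes = (['geneA', 10, 20, 110, 30], ['geneB', 150, 20, 300, 30]);
print qq{<map name="panel_map">\n};
print '  ', area_tag(@$_), "\n" for @boxes;
print "</map>\n";
```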

Regards,
Bernd


From akarger at CGR.Harvard.edu  Tue Nov 13 12:12:32 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:12:32 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A0165@huls5.nucleus.harvard.edu>

Thanks for the reply. I'm curious as to how BLAST decides to do this,
but not curious enough to buy the BLAST book.

If you want to see this, you could just tblastn the ENSP00000349467
sequence:
 
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG
NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE
EVDEMIREADIDGDGQVNYEEFVQMMTAK

against the human genome at NCBI or locally.
 
I've attached the tblastn report for that protein, which includes the
results I quoted. (It was done as part of a blast of 150 proteins vs.
the genome.)
 
-Amir


________________________________

	From: dave at davemessina.com [mailto:dave at davemessina.com] On
Behalf Of Dave Messina
	Sent: Saturday, November 10, 2007 6:04 AM
	To: Amir Karger
	Cc: bioperl-l
	Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast
result?
	
	
	Hi Amir,

	I don't have my BLAST book handy, and my memory is a little
fuzzy, but I think the Expect(2) you're seeing is the E-value based on
both HSPs combined. And I think this is why you see the same Expect
value for both -- because it is shared between them (which sounds like
what you wanted). 

	Again, this is just from memory, but I think this is an option
that has to be turned on rather than something which Blast decides to do
on its own.

	 
	I don't know whether BioPerl reports this or not. Would you mind
e-mailing me a entire BLAST report as a sample? When I have some time
I'd like to play around with this a bit.

	Thanks,
	Dave


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ENSP00000349467_tblastn.txt.gz
Type: application/x-gzip
Size: 9755 bytes
Desc: ENSP00000349467_tblastn.txt.gz
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071113/f8853e76/attachment-0003.gz>

From akarger at CGR.Harvard.edu  Tue Nov 13 12:30:52 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:30:52 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>

> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf 
> Of Steve Chervitz
> 
> The Bioperl blast parser should extract that value and you can obtain
> it from an HSP object, via the HSPI::n() method, documented here:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

As I mentioned in my email:

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

my $blast_out = $ARGV[0];
my $in = new Bio::SearchIO(-format => 'blast',
                            -file   => $blast_out,
                            -report_type => 'tblastn');

print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
Evalue)), "\n";
while(my $query = $in->next_result) {
    while(my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,
                $hsp->evalue,
            ),"\n";
        }
    }
}
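
As a stopgap while n() comes back empty, HSPs that were probably scored
together can be spotted by grouping the HSPs of one hit on their printed
E-value.  This is only a heuristic sketch under that assumption (two
unrelated HSPs can share an E-value by chance); the hashes stand in for
values one would collect from the HSP objects in the loop above, and
group_by_evalue is a made-up name.

```perl
use strict;
use warnings;

# Heuristic: HSPs of one hit that print the same E-value were probably
# scored together by sum statistics.  Returns only groups of two or more.
sub group_by_evalue {
    my (@hsps) = @_;
    my %groups;
    push @{ $groups{ $_->{evalue} } }, $_ for @hsps;
    return grep { @$_ > 1 } map { $groups{$_} } sort keys %groups;
}

# Values copied from the two chr10 HSPs in the report excerpt.
my @hsps = (
    { bits => 192,  evalue => '5e-64', frame => '+2', hit_start => 71593562 },
    { bits => 75.1, evalue => '5e-64', frame => '+1', hit_start => 71593447 },
);
for my $group (group_by_evalue(@hsps)) {
    printf "E=%s shared by %d HSPs (frames %s)\n", $group->[0]{evalue},
        scalar @$group, join(',', map { $_->{frame} } @$group);
}
```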

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir


From cjfields at uiuc.edu  Tue Nov 13 12:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
	<B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug?  Dave mentioned he would look into it but  
I think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report.   
BTW, I wonder how this appears in XML output?

chris

On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:

>> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
>> Of Steve Chervitz
>>
>> The Bioperl blast parser should extract that value and you can obtain
>> it from an HSP object, via the HSPI::n() method, documented here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
>
> As I mentioned in my email:
>
> And does anyone know off-hand if Bioperl will tell me when situations
> like this happen? I thought the Bio::Search::HSP::BlastHSP::n  
> subroutine
> would help, but I just get a bunch of empty strings for that,  
> whether or
> not there's a (2) in the Expect string. (hsp->n is empty, hsp-> 
> {"_n"} is
> undef.)
>
> And the docs for n() actually say, "This value is not defined with  
> NCBI
> Blast2 with gapping" although they don't say why. Which may explain  
> why,
> when I ran the following code on the blast result I included in my  
> last
> email, I got empty values for all of the n's. (Why is n() undefined  
> for
> gapped blast if I'm getting n's in my results from that blast?)
>
> use warnings;
> use strict;
> use Bio::SearchIO;
>
> my $blast_out = $ARGV[0];
> my $in = new Bio::SearchIO(-format => 'blast',
>                             -file   => $blast_out,
>                             -report_type => 'tblastn');
>
> print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> Evalue)), "\n";
> while(my $query = $in->next_result) {
>     while(my $subject = $query->next_hit) {
>         while (my $hsp = $subject->next_hsp) {
>             print join("\t",
>                 $query->query_name,
>                 $hsp->start("query"),
>                 $hsp->end("query"),
>                 $hsp->strand("hit"),
>                 $subject->name,
>                 $hsp->start("hit"),
>                 $hsp->end("hit"),
>                 $subject->frame,
>                 $hsp->n,
>                 $hsp->evalue,
>             ),"\n";
>         }
>     }
> }
>
>> Dave's basically correct in his explanation. It's a result of the
>> application of sum statistics by the blast algorithm. You can read  
>> all
>> about it in Korf et al's BLAST book. Here's the relevant section:
>
> [snip]
>
> Thanks,
>
> -Amir

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lskatz at gatech.edu  Tue Nov 13 20:27:45 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Tue, 13 Nov 2007 20:27:45 -0500
Subject: [Bioperl-l] chromatogram
Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>

Hi,
I would like to know how to draw a chromatogram file.  Does anyone
have any sample code where you read in an scf file and create a jpeg
or other image file?
For that matter, I want to be able to customize these images with base
calls if possible.  I really appreciate the help, so thanks!

-- 
Lee Katz


From mvrmakam at yahoo.com  Wed Nov 14 04:52:13 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on Windows XP
Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com>

Hi,

I am encountering a problem while installing Bioperl on Windows XP.  I have installed ActivePerl version 5.8.8.822.  I am using Perl Package Manager GUI.  Also, I am following the instructions outlined for installing Bioperl on Windows.  I am getting an error.  The error is as follows:

 Downloading ActiveState Package Repository packlist ... failed 500 Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com')

I do not know how to overcome this problem.  The other issue is when I type bioperl in the search box I do not see any packages of bioperl.  I do not know what the problem is.  If anyone of you could guide me through the installation process I would appreciate it.

Thanks,

Roshan




From cjfields at uiuc.edu  Wed Nov 14 09:02:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Nov 2007 08:02:05 -0600
Subject: [Bioperl-l] Installing Bioperl on Windows XP
In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com>
References: <235423.72586.qm@web33703.mail.mud.yahoo.com>
Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu>

The instructions are pretty specific:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Note the section on adding new repositories.  As for the PPM  
connection error, it's more than likely an error with the default  
address but it isn't bioperl-related; maybe answers lie here:

http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl-faq2.html#ppm_repositories

chris

On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote:

> Hi,
>
> I am encountering a problem while installing Bioperl on Windows  
> XP.  I have installed ActivePerl version 5.8.8.822.  I am using  
> Perl Package Manager GUI.  Also, I am following the instructions  
> outlined for installing Bioperl on Windows.  I am getting an  
> error.  The error is as follows:
>
>  Downloading ActiveState Package Repository packlist ... failed 500  
> Can't connect to ppm4.activestate.com:80 (Bad hostname  
> 'ppm4.activestate.com')
>
> I do not know how to overcome this problem.  The other issue is  
> when I type bioperl in the search box I do not see any packages of  
> bioperl.  I do not know what the problem is.  If anyone of you  
> could guide me through the installation process I would appreciate it.
>
> Thanks,
>
> Roshan


From reshetovdenis at gmail.com  Wed Nov 14 12:28:40 2007
From: reshetovdenis at gmail.com (Denis Reshetov)
Date: Wed, 14 Nov 2007 20:28:40 +0300
Subject: [Bioperl-l] how to load all genomes
Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>

Dear BioPerl-db Creators,

I'm trying to load all genomes from the NCBI ftp site
into my BioSQL database using the common script load_seqdatabase.pl

But it seems very slow. Is there a better way to do it?

Thank you very much,

Denis.


From barry.moore at genetics.utah.edu  Wed Nov 14 14:18:29 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 14 Nov 2007 12:18:29 -0700
Subject: [Bioperl-l] how to load all genomes
In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu>

Denis,

You might be interested in this thread from a couple years ago.  I  
was having a similar problem, that I eventually resolved.   
Unfortunately the reason for the problem and the solution weren't  
entirely clear, but you may be able to glean some ideas from it.   
Also, you may have already done this, but I suggest searching the  
archives from this list because it seems like this comes up every now  
and then, so there may be other postings similar to the one I'm  
sending you that could help you.

http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html

Finally, if you are still having problems, you'll want to include a  
few more details about your situation.  What DB are you using?  Have  
you preloaded taxonomy data?  How fast/slow are your sequences  
loading?

Barry

On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote:

> Dear BioPerl-db Creators,
>
> I`m trying to load all genomes from NCBI ftp site
> to my BioSql database using common script load_seqdatabase.pl
>
> But it seems very slow. Let me know what is the better way to do it?
>
> Thank you very much,
>
> Denis.


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 14:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development  :-)


Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz


------------------------------------------------------------------------
------------------

#!perl -w
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;


use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s' => \$FILE,
            'o|out=s' => \$OUTFILE,
            'l|left=s' => \$LEFT_SEQ,
            'r|right=s' => \$RIGHT_SEQ,
            's|size=i' => \$SIZE,
            ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

Options:
--height <pixels> Set height of image (${\HEIGHT} pixels default)
--file <trace file-name> Filename for the ABI trace file
--out <output file-name> Filename for the generated .png image
--left <left end sequence>
--right <right end sequence>
--size <size of clipped fasta sequence>

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
    or
    http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height = $HEIGHT || HEIGHT;
my $file = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);

my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence =$abi->get_sequence();
@bp = split(//, $sequence);


# iterate over array
$size = $abi->get_trace_length();
for ($i=0,$count = 0; $i<$size; $i++) {
     if(grep(/\b$i\b/, @base_calls)){
       $bases[$i] = $bp[$count];
       $count++;
     }else{
       $bases[$i] = ' ';
     }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
   $graph->set(
   title => $abi->get_sample_name(),
#	y_max_value => $abi->get_max_trace() + 50,
	x_max_value => $abi->get_trace_length(),
	t_margin => 5,
    b_margin => 5,
    l_margin => 5,
    r_margin => 5,
    x_ticks => 0,
    text_space => 0,
	line_width 	=> 1,
	transparent	=> 0,
	b_margin => 30,
	t_margin => 35,
	x_plot_values => 0,
	interlaced => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black = $gd->colorAllocate(0,0,0);       # G
$blue = $gd->colorAllocate(0,0,255);      # C
$red = $gd->colorAllocate(255,0,0);       # T
$green = $gd->colorAllocate(0,255,0);     # A
$magenta =$gd->colorAllocate(255,0,255);  # N
$white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
$gray = $gd->colorAllocate(210,210,210);
%colors = ("A", $green, "C", $blue, "G", $black, "T", $red,
           "N", $magenta, " ", $white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base = find_match($sequence,$LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
$end_base = find_match($sequence,$RIGHT_SEQ, 1);
if($end_base){
 $end_base += length($RIGHT_SEQ);
}


# get the coords of the features on the image
@coords = $graph->get_hotspot(1);
$size = @coords;
$printed_num = 1;
$basecount = 0;
$numstoprint = $basecount - $start_base;

# draw the colored bases and scale at top and bottom of image
for ($i=0,$count = 0; $i<$size; $i++) {
  $c = $coords[$i];
  (undef, $xs, undef, undef, undef, undef) = @$c;
  $base = $bases[$i];
  if($base =~ /[ACGTN]/){
   if($start_base - 1 == $basecount){$start_base_coord = $xs;}
   if($end_base - 1 == $basecount){$end_base_coord = $xs;}
   if(defined($SIZE) && $start_base+$SIZE -2 ==
$basecount){$end_base_coord_by_size = $xs;}
   $basecount++;
   $numstoprint++;
   $printed_num = 0;
  }
  # print the bases top and bottom
  $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
  $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});

  # print scale
  if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
    if($LEFT_SEQ){
      $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
      $gd->string(GD::Font->Small(),$xs,$height -
15,$numstoprint,$black);
      $printed_num = 1;
    }else{
      $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
      $gd->string(GD::Font->Small(),$xs,$height -
15,$numstoprint,$black);
      $printed_num = 1;
    }
  }
  $top_right_corner = $xs;
}


# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
# draw the clipped regions as gray
  #if LEFT_SEQ supplied and a match found
  if($LEFT_SEQ && $start_base > 0){
     $gd->filledRectangle(38,35,$start_base_coord - 1,$height -
33,$red);
     $clipped = 1;
  }
 #if RIGHT_SEQ supplied and a match found
 if($RIGHT_SEQ && $end_base > 0){
   print join("\t", ($end_base)),"\n";
   $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height -
33,$gray);
   $clipped = 1;
 }
 #if no RIGHT_SEQ supplied or no match found, use left match + seq length
 if(!$RIGHT_SEQ || $end_base < 0){
  $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
  $clipped = 1;
 }
 

# set height based on max trace within clipped region
   $graph->set(	y_max_value => 3000);#$abi->get_max_trace() + 50);

  # need to re-plot the data over the grayed out area
  $graph->plot(\@data) if $clipped;
  $gd->filledRectangle(0,0,$top_right_corner,33,$white);

#}

#print the graph
open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
binmode OUT;
print OUT $gd->png;
close OUT;


sub find_match{
  my ($sequence,$query,$last) = @_;
  return -1 if length($query) < 6;
  my($odds, $evens, $ones, $twos, $threes, $match_pos);
    # try exact match
    $match_pos = do_regex($query, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # try matching every second base, first keeping the odd positions and
    # then the even ones, e.g. it will be .C.T.C.G. etc
    map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
    $match_pos = do_regex($odds, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # try matching every third base starting from the first base,
    # e.g. it will be C..T..G..T etc
    map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"}
        ($query =~m/(\w\w\w)/g);
    $match_pos = do_regex($ones, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence,$last);
    return $match_pos if $match_pos > 0;

     # not found
     return -1;
}

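sub do_regex {
    # no "()" prototype here: the sub takes three arguments
    my ($query,$sequence,$last)= @_;
    #print "trying $query \n";
    my $result = -1;
    if($last){
      # greedy prefix: position of the last match
      $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*($query)/ig);
    }else{
      # non-greedy prefix: position of the first match
      $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
    }
    return $result;
}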

------------------------------------------------------------------------
------------------
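
The fuzzy matching in find_match can also be written as one small
pattern builder.  This is a simplified sketch of the same idea rather
than a drop-in replacement: it masks positions by index instead of
working on whole pairs or triplets, and the names masked_pattern and
first_match are made up for illustration.

```perl
use strict;
use warnings;

# Build a pattern that keeps every $step-th base of $query, starting at
# index $offset, and replaces the other positions with the wildcard ".".
sub masked_pattern {
    my ($query, $step, $offset) = @_;
    my @bases = split //, $query;
    return join '', map { $_ % $step == $offset ? $bases[$_] : '.' } 0 .. $#bases;
}

# Return the 1-based position of the first match, or -1 if there is none.
sub first_match {
    my ($sequence, $pattern) = @_;
    return $sequence =~ /$pattern/i ? $-[0] + 1 : -1;
}

my $query    = 'ACTACGTA';
my $sequence = 'ttggAGTTCCTAcc';   # query with every second base miscalled

# Try the exact query first, then the two "every second base" masks.
for my $try ([1, 0], [2, 0], [2, 1]) {
    my $pattern = masked_pattern($query, @$try);
    printf "%-10s => %d\n", $pattern, first_match($sequence, $pattern);
}
```

Here the exact query is not found, while the mask A.T.C.T. matches
starting at base 5 of the example sequence.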

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From mbasu at mail.nih.gov  Wed Nov 14 15:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need a chromatogram from an SCF file; I can't help with that.
ABI.pm is not in the BioPerl distribution. But to set the record straight,
you can draw a chromatogram in SVG from an ABI file in one step using my
BioSVG module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay
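
For readers who want to see what SVG output of a trace looks like, here
is a minimal hand-rolled sketch. It does not use BioSVG or reproduce its
API: traces_to_svg is a hypothetical helper, the intensity values are
made up, and the colours simply follow the A/C/G/T scheme of the
GD::Graph script quoted below.

```perl
use strict;
use warnings;

# Colour per base, following the scheme used in the quoted script.
my %colour = (A => 'green', C => 'blue', G => 'black', T => 'red');

# Build an SVG document with one <polyline> per base.
# $traces is a hash ref: { A => [intensities...], C => [...], ... }.
# x is the sample index; y is flipped so larger values are drawn higher.
sub traces_to_svg {
    my ($height, $traces) = @_;
    my $width = 0;
    for my $t (values %$traces) {
        $width = @$t if @$t > $width;
    }
    my @lines;
    for my $base (sort keys %$traces) {
        my $v = $traces->{$base};
        my $points = join ' ',
            map { sprintf '%s,%s', $_, $height - $v->[$_] } 0 .. $#$v;
        push @lines,
            qq{  <polyline fill="none" stroke="$colour{$base}" points="$points"/>};
    }
    return join "\n",
        qq{<svg xmlns="http://www.w3.org/2000/svg" width="$width" height="$height">},
        @lines, '</svg>', '';
}

# Made-up intensities for two channels, just to produce some output.
print traces_to_svg(50, { A => [0, 10, 40, 10, 0], C => [0, 0, 5, 30, 5] });
```

Real trace data would come from something like the get_trace calls in
the quoted script, and would usually need scaling to fit the chosen
height.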


Smithies, Russell wrote:
> Here's my trace viewer.
> Please excuse my dodgy Perl and debugging code as it's still under
> development  :-)


-- 
Malay K Basu
www.malaybasu.net


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 15:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>


We try to avoid SVG at all costs, as installing plugins and viewers in a
locked-down corporate environment can be more trouble than it's worth,
whereas .png images work in any browser with no extras required.
We actually call this trace-drawing code from Python, which then
generates web pages with the embedded image.
It also means we don't need to license, install and maintain a trace
viewer like Chromas.
:-)

Russell
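
As an illustration of the "web page with the embedded image" step, here
is a minimal sketch. The setup described above does this from Python; the
sketch stays in Perl to match the trace script, png_to_html is a
hypothetical helper, and inlining the image as a base64 data: URI is an
assumption (a plain <img> tag pointing at the .png file would serve the
same purpose, and older browsers may require it).

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Wrap raw PNG bytes in a minimal HTML page. The image is inlined as a
# data: URI, so the page needs no separate image file.
# Note: $title is inserted as-is and is not HTML-escaped.
sub png_to_html {
    my ($png_bytes, $title) = @_;
    my $b64 = encode_base64($png_bytes, '');   # '' suppresses line breaks
    return <<"HTML";
<html>
<head><title>$title</title></head>
<body>
<img src="data:image/png;base64,$b64" alt="$title">
</body>
</html>
HTML
}

# With the trace script, the bytes would come from $gd->png.
# Here only the 8-byte PNG signature is used as stand-in data.
my $fake_png = "\x89PNG\r\n\x1a\n";
print png_to_html($fake_png, 'demo trace');
```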

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
> 
> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
> not in Bioperl distribution. But to make the record straight, you can
> use one step chromatogram drawing in SVG from ABI file using my BioSVG
> module, available at:
> 
> http://www.bioinformatics.org/~malay/biosvg/
> 
> Malay
> 
> --
> Malay K Basu
> www.malaybasu.net



From mbasu at mail.nih.gov  Wed Nov 14 16:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need a plugin. Firefox can display most SVG files natively.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image. 
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
> 
> Russell
> 
>> -----Original Message-----
>> From: Malay [mailto:mbasu at mail.nih.gov]
>> Sent: Thursday, 15 November 2007 9:47 a.m.
>> To: Smithies, Russell
>> Cc: Lee Katz; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] chromatogram
>>
>> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
>> not in Bioperl distribution. But to make the record straight, you can
>> use one step chromatogram drawing in SVG from ABI file using my BioSVG
>> module, available at:
>>
>> http://www.bioinformatics.org/~malay/biosvg/
>>
>> Malay
>>
>>
>>
>>
>> Smithies, Russell wrote:
>>> Here's my trace viewer.
>>> Please excuse my dodgy Perl and debugging code as it's still under
>>> development  :-)
>>>
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E  russell.smithies at agresearch.co.nz
>>>
>>> Invermay  Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T  +64 3 489 3809
>>> F  +64 3 489 9174
>>> www.agresearch.co.nz
>>>
>>>
>>>
> ------------------------------------------------------------------------
>>> ------------------
>>>
>>> #!perl -w
>>> use ABI;
>>>
>>> use GD::Graph::lines;
>>> use GD::Graph::colour;
>>> use GD::Graph::Data;
>>>
>>> use Data::Dumper;
>>>
>>>
>>> use Getopt::Long;
>>>
>>> use constant HEIGHT => 300;
>>>
>>> GetOptions ('h|height=i' => \$HEIGHT,
>>>             'f|file=s' => \$FILE,
>>>             'o|out=s' => \$OUTFILE,
>>>             'l|left=s' => \$LEFT_SEQ,
>>>             'r|right=s' => \$RIGHT_SEQ,
>>>             's|size=i' => \$SIZE,
>>>             ) || die <<USAGE;
>>> Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o
>>> test2.png -l actacgtacgta -r atgatcgtacgtac
>>> or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1
>>> --out test2.png --left actacgtacgta --right atgatcgtacgtac
>>>
>>> Options:
>>> --height <pixels> Set height of image (${\HEIGHT} pixels default)
>>> --file <trace file-name> Filename for the ABI trace file
>>> --out <output file-name> Filename for the generated .png image
>>> --left <left end sequence>
>>> --right <right end sequence>
>>> --size <size of clipped fasta sequence>
>>>
>>> Parse an ABI trace file and render a PNG image.
>>> See http://search.cpan.org/dist/ABI/ABI.pm
>>>     or
>>>     http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
>>> USAGE
>>>
>>> my $height = $HEIGHT || HEIGHT;
>>> my $file = $FILE;
>>> my $outfile = $OUTFILE;
>>>
>>> my $abi = ABI->new(-file=> $file);
>>>
>>> my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
>>> my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
>>> my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
>>> my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
>>>
>>> my @base_calls = $abi->get_base_calls(); # Get the base calls
>>> my $sequence =$abi->get_sequence();
>>> @bp = split(//, $sequence);
>>>
>>>
>>>
>>> # iterate over array
>>> $size = $abi->get_trace_length();
>>> for ($i=0,$count = 0; $i<$size; $i++) {
>>>      if(grep(/\b$i\b/, @base_calls)){
>>>        $bases[$i] = $bp[$count];
>>>        $count++;
>>>      }else{
>>>        $bases[$i] = ' ';
>>>      }
>>> }
>>>
>>> # create the data. see GD::Graph::Data for details of the format
>>> my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );
>>>
>>> my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
>>>    $graph->set(
>>>    title => $abi->get_sample_name(),
>>> #	y_max_value => $abi->get_max_trace() + 50,
>>> 	x_max_value => $abi->get_trace_length(),
>>> 	t_margin => 5,
>>>     b_margin => 5,
>>>     l_margin => 5,
>>>     r_margin => 5,
>>>     x_ticks => 0,
>>>     text_space => 0,
>>> 	line_width 	=> 1,
>>> 	transparent	=> 0,
>>> 	b_margin => 30,
>>> 	t_margin => 35,
>>> 	x_plot_values => 0,
>>> 	interlaced => 1,
>>> );
>>>
>>> # allocate some colors for drawing the bases
>>> #use colors same as Chromas
>>> $graph->set( dclrs => [ qw( green blue black red pink) ] );
>>>
>>> #plot the data
>>> my $gd = $graph->plot(\@data);
>>>
>>> $black = $gd->colorAllocate(0,0,0);       # A
>>> $blue = $gd->colorAllocate(0,0,255);      # C
>>> $red = $gd->colorAllocate(255,0,0);       # G
>>> $green = $gd->colorAllocate(0,255,0);     # T
>>> $magenta =$gd->colorAllocate(255,0,255);  # N
>>> $white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
>>> $gray = $gd->colorAllocate(210,210,210);
>>> %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N",
>>> $magenta, " ",$white);
>>>
>>> #$start_base = index(lc($sequence),lc($LEFT_SEQ));
>>> $start_base = find_match($sequence,$LEFT_SEQ);
>>>
>>> #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
>>> $end_base = find_match($sequence,$RIGHT_SEQ, 1);
>>> if($end_base){
>>>  $end_base += length($RIGHT_SEQ);
>>> }
>>>
>>>
>>> # get the coords of the features on the image
>>> @coords = $graph->get_hotspot(1);
>>> $size = @coords;
>>> $printed_num = 1;
>>> $basecount = 0;
>>> $numstoprint = $basecount - $start_base;
>>>
>>> # draw the colored bases and scale at top and bottom of image
>>> for ($i=0,$count = 0; $i<$size; $i++) {
>>>   $c = $coords[$i];
>>>   (undef, $xs, undef, undef, undef, undef) = @$c;
>>>   $base = $bases[$i];
>>>   if($base =~ /[ACGTN]/){
>>>    if($start_base - 1 == $basecount){$start_base_coord = $xs;}
>>>    if($end_base - 1 == $basecount){$end_base_coord = $xs;}
>>>    if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;}
>>>    $basecount++;
>>>    $numstoprint++;
>>>    $printed_num = 0;
>>>   }
>>>   # print the bases top and bottom
>>>   $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
>>>   $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});
>>>   # print scale
>>>   if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
>>>     if($LEFT_SEQ){
>>>       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
>>>       $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
>>>       $printed_num = 1;
>>>     }else{
>>>       $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
>>>       $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
>>>       $printed_num = 1;
>>>     }
>>>   }
>>>   $top_right_corner = $xs;
>>> }
>>>
>>>
>>>
>>> # only draw the clipped region if the calculated size is + or - 6bp
>>> #if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
>>> # draw the clipped regions as gray
>>>   #if LEFT_SEQ supplied and a match found
>>>   if($LEFT_SEQ && $start_base > 0){
>>>      $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
>>>      $clipped = 1;
>>>   }
>>>  #if RIGHT_SEQ supplied and a match found
>>>  if($RIGHT_SEQ && $end_base > 0){
>>>    print join("\t", ($end_base)),"\n";
>>>    $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
>>>    $clipped = 1;
>>>  }
>>>  #if no RIGHT_SEQ supplied or no match found, use left match + seq length
>>>  if(!$RIGHT_SEQ || $end_base < 0){
>>>   $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
>>>   $clipped = 1;
>>>  }
>>>
>>>
>>>
>>> # set height based on max trace within clipped region
>>>    $graph->set(	y_max_value => 3000);#$abi->get_max_trace() + 50);
>>>   # need to re-plot the data over the grayed out area
>>>   $graph->plot(\@data) if $clipped;
>>>   $gd->filledRectangle(0,0,$top_right_corner,33,$white);
>>>
>>> #}
>>>
>>> #print the graph
>>> open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
>>> binmode OUT;
>>> print OUT $gd->png;
>>> close OUT;
>>>
>>>
>>> sub find_match{
>>>   my ($sequence,$query,$last) = @_;
>>>   return -1 if length($query) < 6;
>>>   my($odds, $evens, $ones, $twos, $threes, $match_pos);
>>>     # try exact match
>>>     $match_pos = do_regex($query, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>
>>>     # try matching every second base starting from the second base
>>>     # e.g. it will be .C.T.C.G.etc
>>>     map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
>>>     $match_pos = do_regex($odds, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>     $match_pos = do_regex($evens, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>
>>>     # try matching every third base starting from the first base
>>>     # e.g. it will be C..T..G..T etc
>>>     map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
>>>     $match_pos = do_regex($ones, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>     $match_pos = do_regex($twos, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>     $match_pos = do_regex($threes, $sequence,$last);
>>>     return $match_pos if $match_pos > 0;
>>>
>>>      # not found
>>>      return -1;
>>> }
>>>
>>> # no empty "()" prototype here: the sub is called with arguments
>>> sub do_regex{
>>> 	my ($query,$sequence,$last)= @_;
>>>     #print "trying $query \n";
>>>     my $result = -1;
>>>       $result = pos($sequence)-length($query)+1 if $last && ($sequence =~ m/.*($query)/ig);
>>>       $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
>>>     return $result;
>>> }
>>>
>>>
>>> ------------------------------------------------------------------------------------------
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
>>>> Sent: Wednesday, 14 November 2007 2:28 p.m.
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] chromatogram
>>>>
>>>> Hi,
>>>> I would like to know how to draw a chromatogram file.  Does anyone
>>>> have any sample code where you read in an scf file and create a jpeg
>>>> or other image file?
>>>> For that matter, I want to be able to customize these images with base
>>>> calls if possible.  I really appreciate the help, so thanks!
>>>>
>>>> --
>>>> Lee Katz
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> =======================================================================
>>> Attention: The information contained in this message and/or attachments
>>> from AgResearch Limited is intended only for the persons or entities
>>> to which it is addressed and may contain confidential and/or privileged
>>> material. Any review, retransmission, dissemination or other use of, or
>>> taking of any action in reliance upon, this information by persons or
>>> entities other than the intended recipients is prohibited by AgResearch
>>> Limited. If you have received this message in error, please notify the
>>> sender immediately.
>>> =======================================================================
>>
>> --
>> Malay K Basu
>> www.malaybasu.net
> 


-- 
Malay K Basu
www.malaybasu.net


From tomboy at cs.huji.ac.il  Wed Nov 14 21:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>

hi
when I try to install bioperl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.
Can you please help? I have tried reinstalling the build command, but that
does not seem to help either.

many thanks
--Tomer

-- 
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA

Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313               Fax: (425) 936-7329
--------------------------------------------------------------------------------


From lskatz at gatech.edu  Thu Nov 15 08:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image
though?  I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent.  This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works?  Thanks.

-- 
Lee Katz
http://www.lskatz.com


From cain.cshl at gmail.com  Thu Nov 15 09:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel.  Bio::SCF is
not part of bioperl, so you have to get it from CPAN and it depends on
the Staden io-lib package, so you'll need that too.  You can get GBrowse
from http://www.gmod.org/gbrowse , and you can look at the tutorial for
more information on configuring the trace glyph.

Scott


On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure in that there is no way to go from an scf to an image
> though?  I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent.  This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
> 
> If this is not possible, do you know if drawing an scf is in the works?  Thanks.
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From bosborne11 at verizon.net  Thu Nov 15 09:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>
Message-ID: <C361BF4D.103D8%bosborne11@verizon.net>

Tomer,

Interesting. When I used Cygwin I always worked entirely within the C:
drive, it looks like you're executing the script from the E: drive. Is
Cygwin installed in C:/cygwin? You can see what I'm getting at, it's
possible that you need to set $PERL5LIB to something like
/cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say?
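
A quick way to check this, as a minimal sketch: the script below prints PERL5LIB and @INC, then reports which @INC directories actually contain Module/Build.pm. The helper name find_module_file is made up for this illustration, and the script uses only core Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Return every directory in @dirs that contains the file for $module.
# (find_module_file is a hypothetical helper, not part of any library.)
sub find_module_file {
    my ($module, @dirs) = @_;
    (my $rel = $module) =~ s{::}{/}g;   # Module::Build -> Module/Build
    $rel .= '.pm';
    # skip code references, which may also appear in @INC
    return grep { !ref($_) && -e File::Spec->catfile($_, $rel) } @dirs;
}

print "PERL5LIB = ", (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "\@INC:\n";
print "  $_\n" for @INC;

my @hits = find_module_file('Module::Build', @INC);
if (@hits) {
    print "Module::Build found in: @hits\n";
} else {
    print "Module::Build not found in any \@INC directory\n";
}
```

If the module is not found, adding the directory that holds it to PERL5LIB and running the script again should show it.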

Brian O.


On 11/14/07 9:43 PM, "Tomer Hertz" <tomboy at cs.huji.ac.il> wrote:

> hi
> when I try to install bioperl I get the following error message:
> 
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
> 
> many thanks
> --Tomer


From bernd.web at gmail.com  Thu Nov 15 10:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has someone been able to access '$description' for the production of
imagemaps with Graphics::Panel?
The map below does not print the "title" tag at all, '$description'
seems not available, although for the tracks ($panel->add_track) it is
available.
$map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the titletag does work, if
I use the code below
my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };


I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }


Regards,
Bernd


From luciap at sas.upenn.edu  Thu Nov 15 10:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,
I was asked this question recently,
and it occurred to me that I must be doing things inefficiently.
To produce a GFF file I was using SeqIO to parse the required fields, then
just printing out whatever was required, tab-delimited, according to the
conventions, which is easy.

But generating a GenBank file by extracting features from a GFF file
and a plain FASTA file was more complicated.
Is there support for GFF in BioPerl now?
Can anyone suggest a smart way to go from/to GFF, GenBank and EMBL?
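
For context, the hand-rolled approach described above looks roughly like the sketch below. It is only an illustration in core Perl: the helper names gff3_escape and gff3_line and the hash layout are hypothetical, the values are made up, and the SeqIO parsing step is left out.

```perl
use strict;
use warnings;

# Percent-encode the characters that are reserved in GFF3 attribute values.
sub gff3_escape {
    my ($text) = @_;
    $text =~ s/([%;=&,\t\r\n])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build one nine-column GFF3 line from a plain hash.
# The hash layout is hypothetical; undefined columns become '.'.
sub gff3_line {
    my (%f) = @_;
    my $attr = $f{attributes} || {};
    my @pairs;
    for my $key (sort keys %$attr) {
        push @pairs, $key . '=' . gff3_escape($attr->{$key});
    }
    my @cols = (@f{qw(seqid source type start end score strand phase)},
                join(';', @pairs));
    for my $col (@cols) {
        $col = '.' unless defined($col) && length($col);
    }
    return join("\t", @cols) . "\n";
}

# Illustrative values only, not taken from a real record.
print "##gff-version 3\n";
print gff3_line(
    seqid  => 'seq1', source => 'example', type   => 'gene',
    start  => 100,    end    => 200,       strand => '+',
    attributes => { ID => 'gene1', Name => 'demo;1' },
);
```

The escaping step is the part that is easy to forget when printing columns by hand: a semicolon or equals sign inside an attribute value would otherwise break column 9.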

thanks very much

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From lstein at cshl.edu  Thu Nov 15 12:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".
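
As a rough sketch of how a title callback could cope with either tag name: the coderef below tries "description" first and falls back to "note". It assumes the feature object supports has_tag() and get_tag_values(), as Bio::SeqFeatureI implementations do; My::MockFeature is a hypothetical stand-in class so the example runs without BioPerl.

```perl
use strict;
use warnings;

# Title callback: try 'description' first, then fall back to 'note'.
# Assumes the feature supports has_tag() and get_tag_values().
my $titlerule = sub {
    my $feature = shift;
    for my $tag (qw(description note)) {
        next unless $feature->has_tag($tag);
        my ($value) = $feature->get_tag_values($tag);
        return $value if defined $value;
    }
    return '';
};

# Hypothetical stand-in feature class, for illustration only.
{
    package My::MockFeature;
    sub new {
        my ($class, %tags) = @_;
        return bless({ tags => { %tags } }, $class);
    }
    sub has_tag {
        my ($self, $tag) = @_;
        return exists $self->{tags}{$tag};
    }
    sub get_tag_values {
        my ($self, $tag) = @_;
        return @{ $self->{tags}{$tag} || [] };
    }
}

my $with_note = My::MockFeature->new(note => ['kinase domain']);
print $titlerule->($with_note), "\n";
```

With real features, the coderef would be passed in place of '$description', as in $panel->create_web_map($mapname, $linkrule, $titlerule).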

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web <bernd.web at gmail.com> wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bernd.web at gmail.com  Thu Nov 15 13:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
	<6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein <lstein at cshl.edu> wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
>
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> >
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> >
>
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Thu Nov 15 13:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are  
consistent (particularly re: GFF3).  We are aiming for more  
consistent, compliant GFF/GTF output in the next developer series  
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the  
scripts directory); these are probably the most common way when  
working directly from a seq record.  Bio::Tools::GFF is the most  
commonly used class, though I'm unsure of its status for GFF3  
output.  From within a Bio::SeqI you can call write_gff() (currently  
not very flexible) or from the SeqFeature itself gff_string().   
Bio::Graphics::Feature has the additional method gff3_string().   
Bio::FeatureIO is also an option, though I would consider it very  
experimental (it will likely undergo significant revision in the next  
bioperl dev series).

Any others anyone can think of, maybe non-BioPerl related as well?

chris

On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:

> Hi
> I was asked this question recently
> and it occurred to me I must be doing things inefficiently
> To produce gff file I was using SeqIO to parse the required fields,  
> then
> according to the conventions just printing out whatever was  
> required tab
> delimited, which is easy
>
> but if I wanted to generate a genbank file, extracting features  
> from a gff file
> and a plain fasta file it was more complicated
> is there support for gff in bioperl now?
> anyone can contribute with  smart way to go from/to gff, genebank  
> and embl?
>
> thanks very much
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania


From bosborne11 at verizon.net  Thu Nov 15 14:19:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 14:19:41 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>
Message-ID: <C36205FD.103EA%bosborne11@verizon.net>

Chris,

There's also a genbank2gff3.PLS script in the GMOD package (
http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
However, it has not been modified for a couple of years, so it may not be
the "preferred" script.

See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
on using Bioperl's bp_genbank2gff3 script.

Brian O.


On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> There are currently many ways to get what you want, but not all are
> consistent (particularly re: GFF3).  We are aiming for more
> consistent, compliant GFF/GTF output in the next developer series
> (1.7) of Bioperl.
> 
> You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> scripts directory); these are probably the most common way when
> working directly from a seq record.  Bio::Tools::GFF is the most
> commonly used class though I'm unsure of it's status for GFF3
> output.  From within a Bio::SeqI you can call write_gff() (currently
> not very flexible) or from the SeqFeature itself gff_string().
> Bio::Graphics::Feature has the additional method gff3_string().
> Bio::FeatureIO is also an option, though I would consider it very
> experimental (it will likely undergo significant revision in the next
> bioperl dev series).
> 
> Any others anyone can think of, maybe non-BioPerl related as well?
> 
> chris
> 
> On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> 
>> Hi
>> I was asked this question recently
>> and it occurred to me I must be doing things inefficiently
>> To produce gff file I was using SeqIO to parse the required fields,
>> then
>> according to the conventions just printing out whatever was
>> required tab
>> delimited, which is easy
>> 
>> but if I wanted to generate a genbank file, extracting features
>> from a gff file
>> and a plain fasta file it was more complicated
>> is there support for gff in bioperl now?
>> anyone can contribute with  smart way to go from/to gff, genebank
>> and embl?
>> 
>> thanks very much
>> 
>> Lucia Peixoto
>> Department of Biology,SAS
>> University of Pennsylvania


From Russell.Smithies at agresearch.co.nz  Thu Nov 15 17:31:28 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 16 Nov 2007 11:31:28 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>

Just to add to this, does anyone have any code for reading .sff 'traces'
from 454 sequences?

Thanx,

Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz


From torsten.seemann at infotech.monash.edu.au  Thu Nov 15 20:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
Message-ID: <a79f6a4b0711151713g26905bc6g5b19202b992f4e08@mail.gmail.com>

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools which 454
distribute with their result data. eg. "sffinfo 454AllContigs.sff"
will list all the reads with the original flowgram values etc.
However, the SFF tools are i386.Linux binaries, so not really a
portable solution.

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From mvrmakam at yahoo.com  Thu Nov 15 22:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager ver 5.8.8.822 on Windows XP.  I have
included all the repositories outlined in Installing BioPerl for Windows and
have selected all Packages in the View.  However, I am not able to see any
packages in the view box.  Can anyone help me with this?

Roshan




From David.Messina at sbc.su.se  Fri Nov 16 03:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works?  Thanks.
>


One non-BioPerl solution is 4peaks:
http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers journal
article PDF library program.


Dave


From neetisomaiya at gmail.com  Mon Nov 19 01:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

foreach my $key ( $ac->get_all_annotation_keys() )
{
    if($key eq "dblink")
    {
        my %values = $ac->get_Annotations($key);
        foreach my $value ( keys(%values) )
        {
            print "\n*****VALUE $value*****\n";
        }
    }
}

Here not all dblinks present in the actual file get parsed. For eg, in the
data below,
ENTRY       116064            CDS       H.sapiens
NAME        LRRC58
DEFINITION  leucine rich repeat containing 58
POSITION    3q13.33
MOTIF       Pfam: SdiA-regulated LRR_1
            PROSITE: LEU_RICH
DBLINKS     NCBI-GI: 153792305
            NCBI-GeneID: 116064
            HGNC: 26968
            Ensembl: ENSG00000163428
            UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE,
but doesn't give me HGNC and UniProt. For other entries it gives me other
combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
-- 
-Neeti
Even my blood says, B positive


From johnston at biochem.ucl.ac.uk  Mon Nov 19 06:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>

Hello,

Is there a list of the possible database names for -data =>
$dbname in RemoteBlast somewhere?

Cheers,
Cass


From cjfields at uiuc.edu  Mon Nov 19 08:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
Message-ID: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somwhere?
>
> Cheers,
> Cass
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Nov 19 09:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID: <F81EBCF4-20AD-486C-A9EC-301FE9475504@uiuc.edu>

It makes sense in the light that you're (erroneously) using a hash:

    my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink; you don't see an  
error b/c the number of links happens to be even (I get 8) but you  
would if the number of links returned is odd (missing value for key  
error or something along those lines).  So when you call:

    foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.  You should use an array:

    my @values = $ac->get_Annotations($key);
    foreach my $value (@values) {
       print $value->as_text,"\n";
    }

Note the loop change; Bio::Annotation objects are no longer operator-
overloaded, so your print statement wouldn't work in a bioperl 1.6 world.
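
To see the effect without BioPerl, here is a minimal sketch that uses plain strings as stand-ins for the DBLink objects (an assumption for illustration; the real method returns annotation objects, and the six names are just examples).

```perl
use strict;
use warnings;

# Stand-in for the list returned by $ac->get_Annotations('dblink').
# Plain strings are used here instead of annotation objects.
my @links = qw(NCBI-GI NCBI-GeneID HGNC Ensembl UniProt Pfam);

# Buggy: a list assigned to a hash pairs up as key => value,
# so every second element becomes a value and is lost from keys().
my %as_hash   = @links;
my @from_hash = sort keys %as_hash;

# Correct: keep the returned list in an array.
my @as_array = @links;

printf "hash keys:   %d (%s)\n", scalar(@from_hash), join(', ', @from_hash);
printf "array items: %d (%s)\n", scalar(@as_array),  join(', ', @as_array);
```

With an odd number of elements and warnings enabled, the hash assignment would also emit an "Odd number of elements in hash assignment" warning.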

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
> foreach my $key ( $ac->get_all_annotation_keys() )
>                                 {
>                                         if($key eq "dblink")
>                                         {
>                                                 my %values =
> $ac->get_Annotations($key);
>                                                 foreach my $value (
> keys(%values ))
>                                                 {
>                                                         print  
> "\n*****VALUE
> $value*****\n";
>                                                 }
>                                         }
>                                  }
>
> Here not all dblinks present in the actual file get parsed. For eg,  
> in the
> data below,
> ENTRY       116064            CDS       H.sapiens
> NAME        LRRC58
> DEFINITION  leucine rich repeat containing 58
> POSITION    3q13.33
> MOTIF       Pfam: SdiA-regulated LRR_1
>             PROSITE: LEU_RICH
> DBLINKS     NCBI-GI: 153792305
>             NCBI-GeneID: 116064
>             HGNC: 26968
>             Ensembl: ENSG00000163428
>             UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and  
> PROSITE,
> but doesnt give me HGNC and UniProt. For other entries it gives me  
> other
> combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> -- 
> -Neeti
> Even my blood says, B positive


From akarger at CGR.Harvard.edu  Mon Nov 19 10:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
References: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
	<3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E0747C64A@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> Sent: Tuesday, November 13, 2007 12:42 PM
> To: Amir Karger
> Cc: Steve Chervitz; Dave Messina; bioperl-l
> Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result?
> 
> Amir,
> 
> Can you file this as a bug?  

Done.

http://bugzilla.open-bio.org/show_bug.cgi?id=2399

> Dave mentioned he would look 
> into it but  
> I think it warrants tracking to make sure it gets fixed:
> 
> http://www.bioperl.org/wiki/Bugs
> 
> Attach the example BLAST report from your last post to the report.   
> BTW, I wonder how this appears in XML output?
> 
> chris
> 
> On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:
> 
> >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
> >> Of Steve Chervitz
> >>
> >> The Bioperl blast parser should extract that value and you can obtain
> >> it from an HSP object, via the HSPI::n() method, documented here:
> >>
> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
> >
> > As I mentioned in my email:
> >
> > And does anyone know off-hand if Bioperl will tell me when situations
> > like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
> > would help, but I just get a bunch of empty strings for that, whether or
> > not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
> > undef.)
> >
> > And the docs for n() actually say, "This value is not defined with NCBI
> > Blast2 with gapping" although they don't say why. Which may explain why,
> > when I ran the following code on the blast result I included in my last
> > email, I got empty values for all of the n's. (Why is n() undefined for
> > gapped blast if I'm getting n's in my results from that blast?)
> >
> > use warnings;
> > use strict;
> > use Bio::SearchIO;
> >
> > my $blast_out = $ARGV[0];
> > my $in = new Bio::SearchIO(-format => 'blast',
> >                             -file   => $blast_out,
> >                             -report_type => 'tblastn');
> >
> > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> >                     Evalue)), "\n";
> > while(my $query = $in->next_result) {
> >     while(my $subject = $query->next_hit) {
> >         while (my $hsp = $subject->next_hsp) {
> >             print join("\t",
> >                 $query->query_name,
> >                 $hsp->start("query"),
> >                 $hsp->end("query"),
> >                 $hsp->strand("hit"),
> >                 $subject->name,
> >                 $hsp->start("hit"),
> >                 $hsp->end("hit"),
> >                 $subject->frame,
> >                 $hsp->n,
> >                 $hsp->evalue,
> >             ),"\n";
> >         }
> >     }
> > }
> >
> >> Dave's basically correct in his explanation. It's a result of the
> >> application of sum statistics by the blast algorithm. You can read all
> >> about it in Korf et al's BLAST book. Here's the relevant section:
> >
> > [snip]
> >
> > Thanks,
> >
> > -Amir
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
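
Until the parser issue tracked in the bug above is resolved, the count in
"Expect(2)" can be read straight from the report text. The following is a
minimal sketch, not BioPerl code: parse_expect is a hypothetical helper,
the second sample line is made up for illustration, and treating a missing
"(n)" as n = 1 is an assumption.

```perl
use strict;
use warnings;

# Return (n, evalue) from a BLAST HSP header line.
# n defaults to 1 when "Expect" carries no "(n)" suffix (assumed convention).
sub parse_expect {
    my ($line) = @_;
    return unless $line =~ /Expect(?:\((\d+)\))?\s*=\s*([^\s,]+)/;
    my ($n, $evalue) = ($1, $2);
    $n = 1 unless defined $n;
    return ($n, $evalue);
}

for my $line (
    ' Score = 60.0 bits (30), Expect = 1e-06',
    ' Score = 36.2 bits (18), Expect(2) = 3e-05',   # illustrative line
) {
    my ($n, $evalue) = parse_expect($line);
    print "n=$n evalue=$evalue\n";
}
```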


From aaron.j.mackey at gsk.com  Mon Nov 19 11:50:53 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 19 Nov 2007 11:50:53 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <C36205FD.103EA%bosborne11@verizon.net>
Message-ID: <OF0C0B3E21.611ACEBE-ON85257398.005C01A8-85257398.005C8D95@gsk.com>

While Lucia's subject line asked for genbank2gff, her message actually 
asked the reverse (gff + fasta -> genbank).

e.g. pretend you had to prepare a genome annotation for submission to 
GenBank ...

and no, I don't know of any generalized gff2genbank script out there ...

Lucia, the SeqIO::genbank module will write GenBank format, but you have 
to get all the bits and bobs together in the right way, i.e. construct the 
various AnnotationCollections and SeqFeatures (with SplitLocations for 
exons, CDS, etc.) that a GenBank record expects.  One way to do this is to 
start with a template GenBank file that you'd like to mimic, strip it down 
to only two gene models, use SeqIO::genbank to read it into memory, and 
then step through the object with the Perl debugger to see how it is 
composed.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM:

> Chris,
> 
> There's also a genbank2gff3.PLS script in the GMOD package
> (http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
> However, it has not been modified for a couple of years; it may not be
> the "preferred" script.
> 
> See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
> http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
> on using Bioperl's bp_genbank2gff3 script.
> 
> Brian O.
> 
> 
> On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > There are currently many ways to get what you want, but not all are
> > consistent (particularly re: GFF3).  We are aiming for more
> > consistent, compliant GFF/GTF output in the next developer series
> > (1.7) of Bioperl.
> > 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> > scripts directory); these are probably the most common way when
> > working directly from a seq record.  Bio::Tools::GFF is the most
> > commonly used class though I'm unsure of it's status for GFF3
> > output.  From within a Bio::SeqI you can call write_gff() (currently
> > not very flexible) or from the SeqFeature itself gff_string().
> > Bio::Graphics::Feature has the additional method gff3_string().
> > Bio::FeatureIO is also an option, though I would consider it very
> > experimental (it will likely undergo significant revision in the next
> > bioperl dev series).
> > 
> > Any others anyone can think of, maybe non-BioPerl related as well?
> > 
> > chris
> > 
> > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> > 
> >> Hi
> >> I was asked this question recently
> >> and it occurred to me I must be doing things inefficiently
> >> To produce gff file I was using SeqIO to parse the required fields,
> >> then
> >> according to the conventions just printing out whatever was
> >> required tab
> >> delimited, which is easy
> >> 
> >> but if I wanted to generate a genbank file, extracting features
> >> from a gff file
> >> and a plain fasta file it was more complicated
> >> is there support for gff in bioperl now?
> >> anyone can contribute with  smart way to go from/to gff, genebank
> >> and embl?
> >> 
> >> thanks very much
> >> 
> >> Lucia Peixoto
> >> Department of Biology,SAS
> >> University of Pennsylvania
> 
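
For the direction Lucia describes doing by hand (printing tab-delimited
fields after parsing a record), a minimal sketch of a GFF3 line writer is
shown here. It uses only core Perl; gff3_line and its hash keys are
hypothetical names, the feature values are made up, and escaping of
reserved characters in attribute values is deliberately left out.

```perl
use strict;
use warnings;

# Build one nine-column GFF3 line from a hash of fields.
# Undefined optional columns are written as "." per the GFF3 convention.
sub gff3_line {
    my (%f) = @_;
    my $dot   = sub { defined $_[0] ? $_[0] : '.' };
    my $attrs = join ';',
        map { "$_=$f{attributes}{$_}" } sort keys %{ $f{attributes} || {} };
    return join "\t",
        $f{seqid}, $dot->($f{source}), $f{type}, $f{start}, $f{end},
        $dot->($f{score}), $dot->($f{strand}), $dot->($f{phase}),
        (length($attrs) ? $attrs : '.');
}

print gff3_line(
    seqid      => 'chr1',
    type       => 'exon',
    start      => 10,
    end        => 20,
    strand     => '+',
    attributes => { ID => 'exon1', Parent => 'mRNA1' },
), "\n";
```

Going the other way (GFF plus FASTA to GenBank) still needs the feature
and location objects Aaron describes; this sketch does not cover that.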


From johnston at biochem.ucl.ac.uk  Mon Nov 19 09:46:03 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT)
Subject: [Bioperl-l] blast database names
In-Reply-To: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
	<B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0711191441010.3141@localhost.localdomain>

On Mon, 19 Nov 2007, Chris Fields wrote:

> Here's a recent list (don't know if it's up-to-date):
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

Thanks. Perhaps I missed something in the docs, but I don't think I've
quite understood how this is supposed to work. I'm trying to blast primer
sequences against the ref genome sequence. Should I be using ref_contig?
How can I limit the blast to a single species?

cheers,
Cass.


From Kevin.M.Brown at asu.edu  Mon Nov 19 13:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed, just had to copy the
Align.pm file up one directory from where it was being put by the
installer.  Now I have a technician trying to use pSW (Bio::Tools::pSW)
and it appears to have been last updated back in '99 and seems to lack
certain methods to get things out of the alignment like the score.  The
test.pl script that Bio::Ext comes with actually uses
Bio::Tools::dpAlign.  Is dpAlign the replacement for pSW?


From bernd.web at gmail.com  Wed Nov 21 11:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de>
	<470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my
system, but indeed I also have these colouring problems.
I noted that the score in the bgcolor callback gets a different value!
Printing the score during hit parsing ($hit->raw_score) gives the same
score as the one shown by -description (my $score = $feature->score;).
However, printing the score in the bgcolor sub gives 2573!
All scores in the bgcolor routine are different from, and higher than,
the real scores. Were you able to solve this colouring issue?

Regards,
Bernd

> Hi all,
> I'm using a modified version of Lincoln's tutorial
> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> to give a similar image to that from NCBI but for some reason, my
> colours are coming out wrong (see attached example)
> They seem to be off by one but I can't see why.
>
> Any ideas?
>
> I can't be certain but I think it's only started doing this since our
> BLAST upgrade to 2.2.17 a few weeks ago.
>
> Here's the colouring code:
> -------------------------------------------------------------------------------
> my $track = $panel->add_track(
>     -glyph       => 'segments',
>     -label       => 1,
>     -connector   => 'dashed',
>     -bgcolor     => sub {
>         my $feature = shift;
>         my $score   = $feature->score;
>         return 'red'     if $score >= 200;
>         return 'fuchsia' if $score >= 80;
>         return 'lime'    if $score >= 50;
>         return 'blue'    if $score >= 40;
>         return 'black';
>     },
>     -font2color  => 'gray',
>     -sort_order  => 'high_score',
>     -description => sub {
>         my $feature = shift;
>         return unless $feature->has_tag('description');
>         my ($description) = $feature->each_tag_value('description');
>         my $score = $feature->score;
>         "$description, score=$score";
>     },
> );
> -------------------------------------------------------------------------------
>
>
> Thanx,
>
> Russell Smithies
>
>
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
>
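
One way to narrow this down is to pull the threshold logic out of the
-bgcolor callback into a named sub, so it can be checked on known scores
independently of whatever value the panel passes in. This is a sketch
only: color_for_score is a hypothetical name, and the thresholds are
copied from Russell's code above.

```perl
use strict;
use warnings;

# Map a score to a colour name, using the same thresholds as the
# -bgcolor callback quoted above.
sub color_for_score {
    my ($score) = @_;
    return 'red'     if $score >= 200;
    return 'fuchsia' if $score >= 80;
    return 'lime'    if $score >= 50;
    return 'blue'    if $score >= 40;
    return 'black';
}

# Check the boundaries directly with known values.
printf "%5s => %s\n", $_, color_for_score($_) for 39, 40, 50, 80, 200, 2573;
```

If the mapping behaves as expected here, the remaining question is which
score the callback actually receives, which is what the 2573 above points to.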


From bernd.web at gmail.com  Wed Nov 21 12:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk>
	<47022278.7010700@web.de> <47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I have now found that bgcolor is using a $feature->score that comes
directly from the blast report; it is not the bit score.
     -bgcolor     => sub { my $feature = shift;
                           my $score = $feature->score;
                           print "$score\n"; }
always prints the score, even if the score is not set in the
Bio::SeqFeature::Generic object.

-description callbacks are somehow using the score from the SeqFeature object.

Does anyone have an idea why?

Further, is it possible to get the raw score of a hit? $hit->raw_score
actually gets the bit score (w/o decimal point).

Bernd



From sac at bioperl.org  Wed Nov 21 13:43:54 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 21 Nov 2007 10:43:54 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>

On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> [snip]
>
> Further is is possible to get the raw_score of a hit. $hit->raw_score
> actually gets the bitscore (w/o decimal point).

Hmmm. raw_score should not be the same as bit score. So given an
example blast hit line such as:

       Score = 60.0 bits (30), Expect = 1e-06

$hit->raw_score() should return 30, not 60, as you seem to be getting.
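
To make the distinction concrete, the two numbers can be pulled from the
report line itself and compared with what the parser returns. This is a
minimal sketch in core Perl; parse_score_line is a hypothetical helper,
not part of BioPerl.

```perl
use strict;
use warnings;

# Split a BLAST "Score = ..." line into its bit score and raw score.
# Returns an empty list when the line does not match.
sub parse_score_line {
    my ($line) = @_;
    return unless $line =~ /Score\s*=\s*([\d.e+-]+)\s+bits\s+\((\d+)\)/;
    return (bits => $1, raw => $2);
}

my %score = parse_score_line('       Score = 60.0 bits (30), Expect = 1e-06');
print "bits=$score{bits} raw=$score{raw}\n";
```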

Could you submit a bug report for this?  http://www.bioperl.org/wiki/Bugs

Thanks,
Steve



From binkley at genome.stanford.edu  Wed Nov 21 19:35:02 2007
From: binkley at genome.stanford.edu (Jonathan Binkley)
Date: Wed, 21 Nov 2007 16:35:02 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>

Hi,

I installed bioperl on a Mac (OS 10.4, Intel) via fink,
which put it here:

/sw/lib/perl5/5.8.6/Bio/

It seems to work fine, but I need bioperl-ext for
Smith-Waterman alignments.

So, into which directory should I download bioperl-ext and
run the Makefile?

Thanks.


From dcj at sanger.ac.uk  Thu Nov 22 09:47:09 2007
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Thu, 22 Nov 2007 14:47:09 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
Message-ID: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>

Hi all,

Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to  
be a little 'broken', at least in my hands.
First, $bml->set_parameter('runmode', 0); does not work (it sets
runmode to -2). Setting runmode to 1 is OK.
Also,  $bml->no_param_checks(1); doesn't seem to work.

The result is that the baseml.ctl file created under /tmp is not
runnable by baseml with runmode 0. The phylip file created runs OK
with baseml (using another .ctl file). My script & baseml.ctl below.

Hope it can be fixed,

cheers,

Dan


#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'test.phy');
my $aln = $alignio->next_aln;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

# set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc, $parser) = $bml->run();
# show the control file that was actually written
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;


The baseml.ctl file produced:
seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0


Regards,

Daniel Jeffares

______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Thu Nov 22 11:06:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 22 Nov 2007 17:06:16 +0100
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>

Daniel,

I don't have bioperl-run or PAML installed on my system to test it myself,
but have you tried the latest version of bioperl-run from CVS? It looks like
that code has been worked on since 1.5.2 was released.


If that still doesn't work, could you file this as a bug to make sure it
gets followed up?


Dave


You can grab the tarball here:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl


and if necessary file the bug here:
BioPerl Bugzilla tracking system <http://bugzilla.open-bio.org/>


From arareko at campus.iztacala.unam.mx  Thu Nov 22 11:37:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 22 Nov 2007 10:37:24 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
Message-ID: <4745B044.5090102@campus.iztacala.unam.mx>

Hi Peter,

In BioPerl, there's no such mapping for db_xref's that I'm aware of. 
Each parser handles db_xref records on its own. Take a look at the 
Bio::SeqIO::genbank code, inside the next_seq() method for example:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup

Regards,
Mauricio.

Peter wrote:
> Dear all,
> 
> I'm one of the Biopython developers.  I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface.  I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
> 
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
> 
>                      /db_xref="ASAP:1309"
>                      /db_xref="GI:16128366"
>                      /db_xref="ECOCYC:EG10213"
>                      /db_xref="GeneID:945313"
> 
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables.  For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
> 
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
> 
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
> 
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
> 
> Thank you,
> 
> Peter
> 
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 
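
Should such a mapping ever be wanted on the Perl side, a minimal sketch
could look like the following. It is not existing BioPerl behaviour:
%db_name holds only a subset of Peter's table, split_db_xref is a
hypothetical helper, the dbSNP accession is made up, and passing unknown
abbreviations through unchanged is an assumed policy.

```perl
use strict;
use warnings;

# Subset of the mapping quoted above, keyed by db_xref abbreviation.
my %db_name = (
    GeneID => 'Entrez',
    GI     => 'GeneIndex',
    ECOCYC => 'EcoCyc',
    ASAP   => 'ASAP',
);

# Split "DB:accession" at the first colon and translate the database part.
# Unknown abbreviations are passed through unchanged (assumed policy).
sub split_db_xref {
    my ($xref) = @_;
    my ($db, $acc) = split /:/, $xref, 2;
    my $name = exists $db_name{$db} ? $db_name{$db} : $db;
    return ($name, $acc);
}

for my $xref ('GI:16128366', 'ECOCYC:EG10213', 'dbSNP:12345') {
    my ($name, $acc) = split_db_xref($xref);
    print "$name\t$acc\n";
}
```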

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From avilella at gmail.com  Thu Nov 22 16:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only
used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
$seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

    Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
15.8402290344238 secs...

avilella at magneto:~$ perl
/home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl
-dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
3.86473417282104 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta]
6.99602198600769 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta]
7.26704716682434 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta]
8.44332504272461 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta]
12.103296995163 secs...


From cjfields at uiuc.edu  Thu Nov 22 19:30:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:30:51 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu>

How are tests affected?  It might be worth going through the revision  
history to see if there was a specific reason this was implemented,  
but if it passes tests I don't see why we need it.

chris

On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:

> Hi,
>
> Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> used if one calls the symbol_chars method?
>
> When I comment out this line:
>
> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> $seq->seq; # line 257
>
> I get a nice speed boost on loading alignments.
>
> Can I comment this line out in the CVS HEAD?
>
> Cheers,
>
>     Albert.
>
> [init] 5.96046447753906e-06 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.0022270679473877 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 2.14348912239075 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 6.91910791397095 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 15.8402290344238 secs...
>
> avilella at magneto:~$ perl
> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ 
> ancestral_alleles.pl
> -dir /home/avilella/ensembl/exoseq/test -verbose
> [init] 1.21593475341797e-05 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.00294303894042969 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 0.510555982589722 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 1.6192569732666 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 3.86473417282104 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000203717.chr1.fasta]
> 6.99602198600769 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000196188.chr1.fasta]
> 7.26704716682434 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000025800.chr1.fasta]
> 8.44332504272461 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000117475.chr1.fasta]
> 12.103296995163 secs...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 22 19:42:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:42:12 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
	<4745B044.5090102@campus.iztacala.unam.mx>
Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu>

I think SeqIO checks the name for parsing reasons only, in cases  
where the format changes based on the source (such as GenPept  
DBSOURCE data).  I don't think we go beyond that in Bioperl, probably  
b/c modifying or expanding names for data persistence would lead to  
volatile coding issues (i.e. consistency between parsers, constant  
updating to cover new crossrefs, etc).

I would definitely suggest retaining the original DB name as it appears in  
the dbxref for consistency/sanity; if expanded names are needed, return  
them through a separate method wherever a mapping is defined.

chris

On Nov 22, 2007, at 10:37 AM, Mauricio Herrera Cuadra wrote:

> Hi Peter,
>
> In BioPerl, there's no such mapping for db_xref's that I'm aware of.
> Each parser handles db_xref records on its own. Take a look at the
> Bio::SeqIO::genbank code, inside the next_seq() method for example:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ 
> Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup
>
> Regards,
> Mauricio.
>
> Peter wrote:
>> Dear all,
>>
>> I'm one of the Biopython developers.  I've recently got going with
>> BioSQL and have been getting to grips with the Biopython BioSQL
>> interface.  I'm aware that we need to try and be consistent with
>> BioPerl and BioJava, so I'd like to pose my first question related to
>> that.
>>
>> When loading GenBank records, many features have db_xref qualifiers,
>> e.g. from a random CDS feature in E. coli K12:
>>
>>                      /db_xref="ASAP:1309"
>>                      /db_xref="GI:16128366"
>>                      /db_xref="ECOCYC:EG10213"
>>                      /db_xref="GeneID:945313"
>>
>> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
>> "GeneID" before recording these entries in the seqfeature_dbxref
>> and dbxref tables.  For example, "GI" becomes "GeneIndex".
>> Biopython's current mapping is as follows:
>> Biopython's current mapping is as follows:
>>
>> # Dictionary of database types, keyed by GenBank db_xref abbreviation
>> db_dict = {'GeneID': 'Entrez',
>>            'GI': 'GeneIndex',
>>            'COG': 'COG',
>>            'CDD': 'CDD',
>>            'DDBJ': 'DNA Databank of Japan',
>>            'Entrez': 'Entrez',
>>            'GeneIndex': 'GeneIndex',
>>            'PUBMED': 'PubMed',
>>            'taxon': 'Taxon',
>>            'ATCC': 'ATCC',
>>            'ISFinder': 'ISFinder',
>>            'GOA': 'Gene Ontology Annotation',
>>            'ASAP': 'ASAP',
>>            'PSEUDO': 'PSEUDO',
>>            'InterPro': 'InterPro',
>>            'GEO': 'Gene Expression Omnibus',
>>            'EMBL': 'EMBL',
>>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>>            'ECOCYC': 'EcoCyc',
>>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>>            }
>>
>> In my testing, I've found several GenBank db_xref abbreviations for
>> which we don't have a mapping defined, such as "LocusID", "dbSNP",
>> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
>>
>> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
>> similar mapping in their BioSQL code (or GenBank parser), so that
>> Biopython can follow your example.
>>
>> Thank you,
>>
>> Peter
>>
>> P.S. See also Biopython bug 2405
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 22 19:49:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:49:15 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>

Albert,

Found it:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ 
SimpleAlign.pm.diff?r1=1.36&r2=1.37

If it slows performance that dramatically, maybe we can move this to  
a separate AlignUtils method instead.  Maybe something to ask Jason  
about?

chris

On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:

> Hi,
>
> Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> used if one calls the symbol_chars method?
>
> When I comment out this line:
>
> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> $seq->seq; # line 257
>
> I get a nice speed boost on loading alignments.
>
> Can I comment this line out in the CVS HEAD?
>
> Cheers,
>
>     Albert.
>
> [init] 5.96046447753906e-06 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.0022270679473877 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 2.14348912239075 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 6.91910791397095 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 15.8402290344238 secs...
>
> avilella at magneto:~$ perl
> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ 
> ancestral_alleles.pl
> -dir /home/avilella/ensembl/exoseq/test -verbose
> [init] 1.21593475341797e-05 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.00294303894042969 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 0.510555982589722 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 1.6192569732666 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 3.86473417282104 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000203717.chr1.fasta]
> 6.99602198600769 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000196188.chr1.fasta]
> 7.26704716682434 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000025800.chr1.fasta]
> 8.44332504272461 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000117475.chr1.fasta]
> 12.103296995163 secs...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 23 07:29:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Nov 2007 12:29:37 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
Message-ID: <4746C7B1.1010002@sendu.me.uk>

Dave Messina wrote:
> Daniel,
> 
> I don't have bioperl-run or PAML installed on my system to test it myself,
> but have you tried the latest version of bioperl-run from CVS? It looks like
> that code has been worked on since 1.5.2 was released.

Yes, I fixed it in CVS so it should at least /run/. I don't know about 
the parsing side of things, though that may also have been fixed 
recently by someone else.


From avilella at gmail.com  Fri Nov 23 08:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a
"baseml" binary. AFAIK, Jason fixed codeml to make it work for both
paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;
my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file =>
'/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
                                -file =>
'/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();


#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala <bix at sendu.me.uk> wrote:
> Dave Messina wrote:
> > Daniel,
> >
> > I don't have bioperl-run or PAML installed on my system to test it myself,
> > but have you tried the latest version of bioperl-run from CVS? It looks like
> > that code has been worked on since 1.5.2 was released.
>
> Yes, I fixed it in CVS so it should at least /run/. I don't know about
> the parsing side of things, though that may also have been fixed
> recently by someone else.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Fri Nov 23 11:24:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 10:24:59 -0600
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
	<358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu>

I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just  
'basemlg'), so it would need to work with both.
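
One way to cope with both binary names is to probe for each in turn. This is only a sketch in Python, not what the bioperl-run wrapper does: find_paml_binary and its default candidate order are assumptions, and whether 'basemlg' is an acceptable substitute for 'baseml' in a given analysis is a separate question.

```python
import shutil


def find_paml_binary(candidates=("baseml", "basemlg"), which=shutil.which):
    """Return the path of the first candidate found on PATH, or None.

    `which` is injectable so the lookup can be exercised without PAML installed.
    """
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    print(find_paml_binary() or "no baseml/basemlg binary found on PATH")
```

A wrapper could run this lookup once and still let an explicit executable path, if the user sets one, take precedence.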

Do we want to put a PAML parser/wrapper overhaul on the TODO list for  
1.6?

chris

On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote:

> Just to mention that the new paml4 has a "basemlg" instead of a
> "baseml" binary. AFAIK, Jason fixed codeml to make it work both for
> paml3.xx a paml4, but I am not sure about baseml.
...


From arvindvanam at gmail.com  Fri Nov 23 16:26:06 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST)
Subject: [Bioperl-l]  run RNAfold in perl
Message-ID: <13918981.post@talk.nabble.com>


how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job=$rnafold->run(-rnafold =>
'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an object
for the program, but it gives the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12


How can I make the RNAfold program run from Perl?
Is there any need to specify where my RNAfold program is?

Please reply soon.
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Fri Nov 23 17:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see  
Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
mfold wrappers but haven't done so yet.  The Vienna tools do have a  
Perl-based (non-BioPerl-based) module included which uses libRNA, and  
is well worth a look.  Try 'perldoc RNA' if you have installed the  
tools locally, or look here for other Perl-based tools:

http://www.tbi.univie.ac.at/~ivo/RNA/utils.html

chris

On Nov 23, 2007, at 3:26 PM, vanam wrote:

>
> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>
> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
> my $rnafold = $factory->program('rnafold');
> my $job=$rnafold->run(-rnafold =>
> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>
> I installed Vienna package and then i tried using Pise to create an  
> object
> for the program but its giving the following error
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::PiseJob::terminated
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
> STACK: Bio::Tools::Run::PiseApplication::submit
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
> STACK: Bio::Tools::Run::PiseApplication::run
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
> STACK: evaluate.pl:12
>
>
> how to make the program RNAfold run in perl...
> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>
> plz reply soon
> -- 
> View this message in context: http://www.nabble.com/run-RNAfold-in- 
> perl-tf4863835.html#a13918981
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Sat Nov 24 02:29:11 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
Message-ID: <13922740.post@talk.nabble.com>


I have read the documentation for Bio::Tools::Run::AnalysisFactory::Pise and
used it exactly as described there.

Instead of running the Perl version "RNAfold.pl", I would like to use the
functions associated with RNAfold from a Perl program without having to
call the program using the system() command.

If you can tell me how to use these wrapper modules it would be of great
help. For example, when using clustalw or clustalx we define an environment
variable for it; do we have to do the same for RNAfold or Mfold?


Chris Fields wrote:
> 
> The Pise wrappers run the programs remotely; see  
> Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
> mfold wrappers but haven't done so yet.  The Vienna tools do have a  
> Perl-based (non-BioPerl-based) module included which uses libRNA, and  
> is well worth a look.  Try 'perldoc RNA' if you have installed the  
> tools locally, or look here for other Perl-based tools:
> 
> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
> 
> chris
> 
> On Nov 23, 2007, at 3:26 PM, vanam wrote:
> 
>>
>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>
>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
>> my $rnafold = $factory->program('rnafold');
>> my $job=$rnafold->run(-rnafold =>
>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>>
>> I installed Vienna package and then i tried using Pise to create an  
>> object
>> for the program but its giving the following error
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::PiseJob::terminated
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
>> STACK: Bio::Tools::Run::PiseApplication::submit
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
>> STACK: Bio::Tools::Run::PiseApplication::run
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
>> STACK: evaluate.pl:12
>>
>>
>> how to make the program RNAfold run in perl...
>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>>
>> plz reply soon
>> -- 
>> View this message in context: http://www.nabble.com/run-RNAfold-in- 
>> perl-tf4863835.html#a13918981
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From avilella at gmail.com  Sun Nov 25 06:50:42 2007
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 25 Nov 2007 11:50:42 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>

Committed to CVS now. It is calculated anyway when calling symbol_chars, so...

On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> Found it:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>
> If it slows performance that dramatically, maybe we can move this to
> a separate AlignUtils method instead.  Maybe something to ask Jason
> about?
>
> chris
>
> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>
>
> > Hi,
> >
> > Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> > used if one calls the symbol_chars method?
> >
> > When I comment out this line:
> >
> > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> > $seq->seq; # line 257
> >
> > I get a nice speed boost on loading alignments.
> >
> > Can I comment this line out in the CVS HEAD?
> >
> > Cheers,
> >
> >     Albert.
> >
> > [init] 5.96046447753906e-06 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.0022270679473877 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 2.14348912239075 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 6.91910791397095 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 15.8402290344238 secs...
> >
> > avilella at magneto:~$ perl
> > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
> > ancestral_alleles.pl
> > -dir /home/avilella/ensembl/exoseq/test -verbose
> > [init] 1.21593475341797e-05 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.00294303894042969 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 0.510555982589722 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 1.6192569732666 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 3.86473417282104 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000203717.chr1.fasta]
> > 6.99602198600769 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000196188.chr1.fasta]
> > 7.26704716682434 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000025800.chr1.fasta]
> > 8.44332504272461 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000117475.chr1.fasta]
> > 12.103296995163 secs...
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Sun Nov 25 10:05:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:05:27 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13922740.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>

Again, these wrappers are for submitting data to a Pise server for  
the corresponding programs (run on a remote server).  There are no  
wrappers for running RNAfold on your computer (i.e. locally), with or  
w/o a set env. variable.  You can try installing Pise locally and  
setting the location() as shown in POD to localhost, however I don't  
know how stable these modules are with newer versions of Pise.  These  
haven't been updated in a few years, apart from getting tests to work.

Another option is installing EMBOSS along with the EMBASSY version of  
RNAFold; this could conceivably be run through Bio::Factory::EMBOSS.
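
Until a local wrapper exists, a stopgap is to pipe the sequence to a locally installed RNAfold and parse what it prints. Note that this is still the shell-out route, not a library binding. The sketch below is in Python with hypothetical helper names (parse_rnafold, fold); it assumes the plain RNAfold output layout of a sequence line followed by a "structure (energy)" line, and that the executable is called RNAfold and is on PATH. As far as I know RNAfold also writes a PostScript drawing into the working directory by default.

```python
import re
import shutil
import subprocess

# Matches e.g. "(((...))) ( -1.20)": dot-bracket structure, then energy in parentheses.
_FOLD_LINE = re.compile(
    r"^(?P<structure>[().]+)\s+\(\s*(?P<energy>-?\d+(?:\.\d+)?)\)\s*$"
)


def parse_rnafold(text):
    """Parse plain RNAfold output into (sequence, structure, energy)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a sequence line and a structure line")
    match = _FOLD_LINE.match(lines[1])
    if match is None:
        raise ValueError(f"unrecognised structure line: {lines[1]!r}")
    return lines[0], match.group("structure"), float(match.group("energy"))


def fold(sequence, executable="RNAfold"):
    """Run a locally installed RNAfold on one sequence supplied via stdin."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    done = subprocess.run(
        [executable],
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rnafold(done.stdout)
```

Calling fold() with the sequence from the original post would return the folded structure and its energy, provided RNAfold is installed; parse_rnafold() can be tried on saved output without it.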

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for  
> Bio::Tools::Run::AnalysisFactory::Pise and
> i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I  
> can use
> the functions associated with RNAfold with a perl program without  
> having to
> call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b  
> of gr8
> help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>
>
>
>
> Chris Fields wrote:
>>
>> The Pise wrappers run the programs remotely; see
>> Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a
>> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/
>> mfold wrappers but haven't done so yet.  The Vienna tools do have a
>> Perl-based (non-BioPerl-based) module included which uses libRNA, and
>> is well worth a look.  Try 'perldoc RNA' if you have installed the
>> tools locally, or look here for other Perl-based tools:
>>
>> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
>>
>> chris
>>
>> On Nov 23, 2007, at 3:26 PM, vanam wrote:
>>
>>>
>>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>>
>>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
>>> my $rnafold = $factory->program('rnafold');
>>> my $job=$rnafold->run(-rnafold =>
>>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>>>
>>> I installed Vienna package and then i tried using Pise to create an
>>> object
>>> for the program but its giving the following error
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>> STACK: Bio::Tools::Run::PiseJob::terminated
>>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
>>> STACK: Bio::Tools::Run::PiseApplication::submit
>>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
>>> STACK: Bio::Tools::Run::PiseApplication::run
>>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
>>> STACK: evaluate.pl:12
>>>
>>>
>>> how to make the program RNAfold run in perl...
>>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>>>
>>> plz reply soon
>>> -- 
>>> View this message in context: http://www.nabble.com/run-RNAfold-in-
>>> perl-tf4863835.html#a13918981
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/run-RNAfold-in- 
> perl-tf4863835.html#a13922740
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 10:38:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:38:40 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
Message-ID: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>

Albert,

I was getting a single AlignIO.t fail which appeared to be related to  
this:

...
ok 122 - The object isa Bio::Align::AlignI
ok 123 - consensus_string on metafasta

not ok 124 - symbol_chars() using metafasta
#   Failed test 'symbol_chars() using metafasta'
#   in t/AlignIO.t at line 346.
#          got: '0'
#     expected: '23'

It was b/c the symbol hash was initialized in the constructor (so it  
was present, just empty).  I have changed that in CVS; all tests pass  
now.
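
For illustration, a minimal sketch of the lazy approach discussed in this thread: the symbol set is built only when symbol_chars is first called, and the constructor does not create an empty '_symbols' entry. LazyAln and its internals are hypothetical stand-ins, not the actual Bio::SimpleAlign code, and the real method has additional behaviour that is not modelled here.

```perl
use strict;
use warnings;

package LazyAln;    # hypothetical stand-in, not Bio::SimpleAlign itself

sub new {
    my ($class, @seqs) = @_;
    # Deliberately no '_symbols' key here: its absence means "not computed yet".
    return bless { _seqs => [@seqs] }, $class;
}

# Count the distinct characters across all sequences, building the
# set on first use and caching it afterwards.
sub symbol_chars {
    my ($self) = @_;
    unless (exists $self->{_symbols}) {
        my %seen;
        for my $seq (@{ $self->{_seqs} }) {
            $seen{$_} = 1 for split //, $seq;
        }
        $self->{_symbols} = \%seen;
    }
    return scalar keys %{ $self->{_symbols} };
}

package main;

my $aln = LazyAln->new('ACGT-', 'AC-TN');
print $aln->symbol_chars, "\n";    # prints 6 (A, C, G, T, -, N)
```

Loading an alignment then costs nothing per character; the price is paid once, and only by callers that actually ask for the symbols.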

chris

On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:

> CVS committed now. It is calculated anyway when calling symbol_chars
> so...
>
> On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>> Albert,
>>
>> Found it:
>>
>> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SimpleAlign.pm.diff?r1=1.36&r2=1.37
>>
>> If it slows performance that dramatically, maybe we can move this to
>> a separate AlignUtils method instead.  Maybe something to ask Jason
>> about?
>>
>> chris
>>
>> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>>
>>
>>> Hi,
>>>
>>> Am I right in thinking that the '_symbols' hash in SimpleAlign is  
>>> only
>>> used if one calls the symbol_chars method?
>>>
>>> When I comment out this line:
>>>
>>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
>>> $seq->seq; # line 257
>>>
>>> I get a nice speed boost on loading alignments.
>>>
>>> Can I comment this line out in the CVS HEAD?
>>>
>>> Cheers,
>>>
>>>     Albert.
>>>
>>> [init] 5.96046447753906e-06 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162399.chr1.fasta]
>>> 0.0022270679473877 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000158022.chr1.fasta]
>>> 2.14348912239075 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162585.chr1.fasta]
>>> 6.91910791397095 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000121957.chr1.fasta]
>>> 15.8402290344238 secs...
>>>
>>> avilella at magneto:~$ perl
>>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
>>> ancestral_alleles.pl
>>> -dir /home/avilella/ensembl/exoseq/test -verbose
>>> [init] 1.21593475341797e-05 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162399.chr1.fasta]
>>> 0.00294303894042969 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000158022.chr1.fasta]
>>> 0.510555982589722 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162585.chr1.fasta]
>>> 1.6192569732666 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000121957.chr1.fasta]
>>> 3.86473417282104 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000203717.chr1.fasta]
>>> 6.99602198600769 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000196188.chr1.fasta]
>>> 7.26704716682434 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000025800.chr1.fasta]
>>> 8.44332504272461 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000117475.chr1.fasta]
>>> 12.103296995163 secs...
>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Sun Nov 25 11:13:44 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 25 Nov 2007 17:13:44 +0100
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>

Hi,

I am not sure if this is related, but I remember SimpleAlign was
adapted to cope with more gap symbols that can occur in
alignments/FastA sequences, as: . _ - =
Previous versions would throw an error on 'illegal' gap characters.

Regards,
Bernd

On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> I was getting a single AlignIO.t fail which appeared to be related to
> this:
>
> ...
> ok 122 - The object isa Bio::Align::AlignI
> ok 123 - consensus_string on metafasta
>
> not ok 124 - symbol_chars() using metafasta
> #   Failed test 'symbol_chars() using metafasta'
> #   in t/AlignIO.t at line 346.
> #          got: '0'
> #     expected: '23'
>
> It was b/c the symbol hash was initialized in the constructor (so it
> was present, just empty).  I have changed that in CVS; all tests pass
> now.
>
> chris
>
>
> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:
>
> > CVS committed now. It is calculated anyway when calling symbol_chars
> > so...
> >
> > On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> >> Albert,
> >>
> >> Found it:
> >>
> >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SimpleAlign.pm.diff?r1=1.36&r2=1.37
> >>
> >> If it slows performance that dramatically, maybe we can move this to
> >> a separate AlignUtils method instead.  Maybe something to ask Jason
> >> about?
> >>
> >> chris
> >>
> >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
> >>
> >>
> >>> Hi,
> >>>
> >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is
> >>> only
> >>> used if one calls the symbol_chars method?
> >>>
> >>> When I comment out this line:
> >>>
> >>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> >>> $seq->seq; # line 257
> >>>
> >>> I get a nice speed boost on loading alignments.
> >>>
> >>> Can I comment this line out in the CVS HEAD?
> >>>
> >>> Cheers,
> >>>
> >>>     Albert.
> >>>
> >>> [init] 5.96046447753906e-06 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162399.chr1.fasta]
> >>> 0.0022270679473877 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000158022.chr1.fasta]
> >>> 2.14348912239075 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162585.chr1.fasta]
> >>> 6.91910791397095 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000121957.chr1.fasta]
> >>> 15.8402290344238 secs...
> >>>
> >>> avilella at magneto:~$ perl
> >>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
> >>> ancestral_alleles.pl
> >>> -dir /home/avilella/ensembl/exoseq/test -verbose
> >>> [init] 1.21593475341797e-05 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162399.chr1.fasta]
> >>> 0.00294303894042969 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000158022.chr1.fasta]
> >>> 0.510555982589722 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162585.chr1.fasta]
> >>> 1.6192569732666 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000121957.chr1.fasta]
> >>> 3.86473417282104 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000203717.chr1.fasta]
> >>> 6.99602198600769 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000196188.chr1.fasta]
> >>> 7.26704716682434 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000025800.chr1.fasta]
> >>> 8.44332504272461 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000117475.chr1.fasta]
> >>> 12.103296995163 secs...
> >>
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Sun Nov 25 11:39:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 10:39:01 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
	<716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
Message-ID: <B849A608-7C12-4C87-BB93-D846959F0523@uiuc.edu>

Bernd,

That would be when generating Bio::LocatableSeq instances for  
building a Bio::SimpleAlign object.  Judging by test suite results  
that doesn't appear to be affected.

chris

On Nov 25, 2007, at 10:13 AM, Bernd Web wrote:

> Hi,
>
> I am not sure if this is related, but I remember SimpleAlign was
> adapted to cope with more gap symbols that can occur in
> alignments/FastA sequences, as: . _ - =
> Previous versions would throw an error on 'illegal' gap characters.
>
> Regards,
> Bernd
>
> On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
>> Albert,
>>
>> I was getting a single AlignIO.t fail which appeared to be related to
>> this:
>>
>> ...
>> ok 122 - The object isa Bio::Align::AlignI
>> ok 123 - consensus_string on metafasta
>>
>> not ok 124 - symbol_chars() using metafasta
>> #   Failed test 'symbol_chars() using metafasta'
>> #   in t/AlignIO.t at line 346.
>> #          got: '0'
>> #     expected: '23'
>>
>> It was b/c the symbol hash was initialized in the constructor (so it
>> was present, just empty).  I have changed that in CVS; all tests pass
>> now.
>>
>> chris
>>
>>
>> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:
>>
>>> CVS committed now. It is calculated anyway when calling symbol_chars
>>> so...
>>>
>>> On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>>>> Albert,
>>>>
>>>> Found it:
>>>>
>>>> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SimpleAlign.pm.diff?r1=1.36&r2=1.37
>>>>
>>>> If it slows performance that dramatically, maybe we can move  
>>>> this to
>>>> a separate AlignUtils method instead.  Maybe something to ask Jason
>>>> about?
>>>>
>>>> chris
>>>>
>>>> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Am I right in thinking that the '_symbols' hash in SimpleAlign is
>>>>> only
>>>>> used if one calls the symbol_chars method?
>>>>>
>>>>> When I comment out this line:
>>>>>
>>>>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
>>>>> $seq->seq; # line 257
>>>>>
>>>>> I get a nice speed boost on loading alignments.
>>>>>
>>>>> Can I comment this line out in the CVS HEAD?
>>>>>
>>>>> Cheers,
>>>>>
>>>>>     Albert.
>>>>>
>>>>> [init] 5.96046447753906e-06 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162399.chr1.fasta]
>>>>> 0.0022270679473877 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000158022.chr1.fasta]
>>>>> 2.14348912239075 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162585.chr1.fasta]
>>>>> 6.91910791397095 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000121957.chr1.fasta]
>>>>> 15.8402290344238 secs...
>>>>>
>>>>> avilella at magneto:~$ perl
>>>>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
>>>>> ancestral_alleles.pl
>>>>> -dir /home/avilella/ensembl/exoseq/test -verbose
>>>>> [init] 1.21593475341797e-05 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162399.chr1.fasta]
>>>>> 0.00294303894042969 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000158022.chr1.fasta]
>>>>> 0.510555982589722 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162585.chr1.fasta]
>>>>> 1.6192569732666 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000121957.chr1.fasta]
>>>>> 3.86473417282104 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000203717.chr1.fasta]
>>>>> 6.99602198600769 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000196188.chr1.fasta]
>>>>> 7.26704716682434 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000025800.chr1.fasta]
>>>>> 8.44332504272461 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000117475.chr1.fasta]
>>>>> 12.103296995163 secs...
>>>>
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 13:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to  
Bio::SeqIO::staden::read over the last few months which incorporate  
code from Bugzilla (bugs 2074 and 2329, very kindly donated from  
Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for  
optionally getting trace data (i.e. for drawing graphs).

The method code is now implemented in Bio::SeqIO::abi, with example
code in examples/quality/svgtrace.pl.  These changes should allow  
newer versions of Staden io_lib as well (the code is tested with  
io_lib 1.9.2), though they haven't been tested extensively as I am  
having problems compiling newer io_lib versions on my MacBook.  It's  
very likely more changes will need to be made along the way; some  
issues were found with XS compilation which appear harmless but need  
to be investigated, and trace data from other formats need to be  
evaluated.  The possibility exists that many of these changes break  
backward compatibility with older bioperl releases, though tests  
passed with bioperl 1.5.2.

Any feedback re: platform issues, test results using newer io_lib  
versions, older bioperl-versions, etc would be greatly appreciated.   
I'm hoping this will stimulate more interest in getting other bioperl- 
ext modules up-to-date with bioperl-live.

chris


From cjfields at uiuc.edu  Mon Nov 26 13:59:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 12:59:23 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>

Steve, Bernd, (and Jason, since you may have some input on this as  
well),

I am now looking into the bug Bernd submitted and it seems there is a
serious discrepancy in the way the hit raw_score, bits, and
significance are determined for Hit objects.  Unless I am mistaken
these should always come from the best HSP when they are present,  
falling back to the hit table data only when no HSP alignments are  
present.  Under the latter conditions a minimal Hit object is made  
from data in the hit table, which reports the rounded bit score, not  
the raw score, so in those cases the raw score would be undefined  
(and you probably should get a nasty warning indicating there are no  
HSPs present to get the data from).
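
As a rough sketch of the fallback described in the paragraph above: plain hashes stand in for Hit and HSP objects, hit_summary and its field names are hypothetical, and "best HSP" is assumed here to mean the one with the highest bit score. It is not the actual GenericHit code.

```perl
use strict;
use warnings;

# Hypothetical helper: choose hit-level scores from the best HSP when
# HSPs exist, otherwise fall back to the (less complete) hit table row.
sub hit_summary {
    my ($hsps, $table) = @_;
    if (@$hsps) {
        # Assumption: the best HSP is the one with the highest bit score.
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @$hsps;
        return { %$best, source => 'hsp' };
    }
    warn "No HSPs present; using hit table data, raw_score is undefined\n";
    return {
        bits         => $table->{bits},
        significance => $table->{significance},
        raw_score    => undef,    # an NCBI hit table only reports rounded bits
        source       => 'table',
    };
}

my $table = { bits => 60, significance => 1e-06 };
my $hsps  = [
    { bits => 40.1, raw_score => 20, significance => 0.002 },
    { bits => 60.0, raw_score => 30, significance => 1e-06 },
];

my $with_hsps = hit_summary($hsps, $table);
print "raw_score=$with_hsps->{raw_score} from $with_hsps->{source}\n";
```

With HSPs present the raw score is 30, as in Steve's example; with an empty HSP list only the bit score and significance survive, and a warning is issued.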

What is occurring now, though, is that the raw_score and significance are
explicitly set from the hit table in the BLAST parser for the Hit
object at all times, while the bits are correctly derived from the  
best HSP (no fallback to the hit table).  Changing to the behavior  
above results in several tests failing via SearchIO.t, with each  
failed test reporting the expected (read:correct) raw score.

I'll look through the tests just in case, but I am planning on  
committing changes to the BLAST parsers, GenericHit, and SearchIO.t  
(to reflect the correct expected data) in the next day or two unless  
there are any objections.

chris

On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:

> On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
>> [snip]
>>
>> Further, is it possible to get the raw_score of a hit? $hit->raw_score
>> actually gets the bitscore (w/o decimal point).
>
> Hmmm. raw_score should not be the same as bit score. So given an
> example blast hit line such as:
>
>        Score = 60.0 bits (30), Expect = 1e-06
>
> $hit->raw_score() should return 30, not 60, as you seem to be getting.
>
> Could you submit a bug report for this?  http://www.bioperl.org/wiki/Bugs
>
> Thanks,
> Steve
>
>>
>> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
>>> Hi Russell,
>>>
>>> I came across your question. At first I thought all was well on my
>>> system, but indeed I also have these colouring problems.
>>> I noted that score in the bgcolor callback gets a different value!
>>> Printing score during hit parsing ($hit->raw_score) gives the same
>>> score as in the -description callback (my $score = $feature->score;).
>>> However, printing score in the bgcolor sub gives 2573!
>>> All scores in the bgcolor routine are different from and higher than
>>> the real scores. Were you able to solve this colouring issue?
>>>
>>> Regards,
>>> Bernd
>>>
>>>
>>>> Hi all,
>>>> I'm using a modified version of Lincoln's tutorial
>>>> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
>>>> and I'm colouring the HSPs by setting the -bgcolor by score with  
>>>> a sub
>>>> to give a similar image to that from NCBI but for some reason, my
>>>> colours are coming out wrong (see attached example)
>>>> They seem to be off by one but I can't see why.
>>>>
>>>> Any ideas?
>>>>
>>>> I can't be certain but I think it's only started doing this  
>>>> since our
>>>> BLAST upgrade to 2.2.17 a few weeks ago.
>>>>
>>>> Here's the colouring code:
>>>> ------------------------------------------------------------------------
>>>> my $track = $panel->add_track(
>>>>                               -glyph       => 'segments',
>>>>                               -label       => 1,
>>>>                               -connector   => 'dashed',
>>>>                               -bgcolor     => sub {
>>>>                                 my $feature = shift;
>>>>                                 my $score = $feature->score;
>>>>                                 return 'red'     if $score >= 200;
>>>>                                 return 'fuchsia' if $score >= 80;
>>>>                                 return 'lime'    if $score >= 50;
>>>>                                 return 'blue'    if $score >= 40;
>>>>                                 return 'black';
>>>>                                },
>>>>                               -font2color  => 'gray',
>>>>                               -sort_order  => 'high_score',
>>>>                               -description => sub {
>>>>                                 my $feature = shift;
>>>>                                 return unless $feature->has_tag('description');
>>>>                                 my ($description) = $feature->each_tag_value('description');
>>>>                                 my $score = $feature->score;
>>>>                                 "$description, score=$score";
>>>>                                },
>>>>                              );
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> Thanx,
>>>>
>>>> Russell Smithies
>>>>
>>>>
>>>>
>>>>
>>>>
>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Mon Nov 26 14:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>


I searched for the EMBASSY version of RNAfold (I guess it's vrnafold) but I am
unable to find a downloadable version; all there is is a web interface for it.
Can you tell me where to download it?

Or can you just tell me how to set the location in PiseApplication to
localhost, and what to enter in the $email variable?
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Nov 26 15:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
	<13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>


On Nov 26, 2007, at 1:08 PM, vanam wrote:

> I searched for the EMBASSY version of RNAfold (I guess it's vrnafold)
> but I am unable to find a downloadable version; all there is is a web
> interface for it. Can you tell me where to download it?

You will need to install EMBOSS as well as the EMBASSY version of  
VIENNA (something which is documented in the docs that come along  
with the distributions and I will not go into detail on):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet.  Understand that there is no specific  
class framework for dealing with RNA secondary structure in BioPerl,  
so you will be on your own for now.

My suggestion for using Pise had the very important caveats that (1)  
it very well may not work, (2) I have no experience with Pise, let  
alone setting it up locally, therefore (3) I haven't tested it (and  
don't intend to, as I don't have the time).

> Or can you just tell me how to set the location in PiseApplication to
> localhost, and what to enter in the $email variable?

I have pointed out documentation previously which comes with the  
modules in question.  Remember perldoc is your friend; consulting it  
saves me (and everyone else) time.

 From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------

DESCRIPTION

Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
application objects, that let you submit jobs on a Pise server.

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                             -email => 'me at myhome');

The email is optional (there is default one). It can be useful, though.
Your program might enter infinite loops, or just run many jobs: the
Pise server maintainer needs a contact (s/he could of course cancel any
requests from your address...). And if you plan to run a lot of heavy
jobs, or to do a course with many students, please ask the maintainer
before.

The location parameter stands for the actual CGI location, except when
set at the factory creation step, where it is rather the root of all
CGI.  There are default values for most of Pise programs.

You can either set location at:

1 factory creation:
      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                     -location => 'http://somewhere/Pise/cgi-bin',
                                     -email => 'me at myhome');

2 program creation:
      my $program = $factory->program('water',
                               -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                                      );

3 any time before running:
      $program->location('http://somewhere/Pise/cgi-bin/water.pl');
      $job = $program->run();

4 when running:
      $job = $program->run(-location => 'http://somewhere/Pise/cgi-bin/water.pl');

You can also retrieve a previous job results by providing its url:

   $job = $factory->job($url);

You get the url of a job by:

   $job->jobid;

----------------------------------------------

chris


From sac at bioperl.org  Mon Nov 26 20:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch. You're on track here with one exception: WU blast and NCBI
blast behave differently in what they report in the hit table: WU
blast puts the raw score in the table not the bit score as NCBI blast
does (see example below for reference). WU blast also swaps their
location in the HSP header relative to how NCBI reports it. It would
be good to verify that the blast parser isn't befuddled by this. A
quick look at SearchIO::blast and it appears that data from the hit
table is always getting stored as score, not bits for WU blast. Not
sure if the HSP section data are parsed correctly. I'd recommend
looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite. The downside to this arrangement
is that code that works for NCBI blast hits will need modification to
work for WU blast hits, but that is just the nature of the data. It
shouldn't be an issue for the majority of users that stick with one
flavor of blast and don't switch back and forth, or for users that get
their HSP scoring data from HSP sections rather than relying on the
hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of
the algorithm name for WU blast reports, so it would then be available
for the contained hit/hsp objects. This could break anything dependent
on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:
                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
        Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)


Example NCBI blast table and HSP header:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:ENSG00000137397
           transcript:ENST00000357569
          Length = 425

 Score =  120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)
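
To make the difference between the two HSP headers above concrete, here is a small sketch that reads either style of "Score =" line. parse_hsp_score_line is a hypothetical helper written for this example only; it is not how Bio::SearchIO::blast parses reports, and it ignores the Expect and P fields.

```perl
use strict;
use warnings;

# Hypothetical helper: extract raw score and bit score from an HSP
# header line, whichever BLAST flavour produced it.
sub parse_hsp_score_line {
    my ($line) = @_;
    # NCBI style: "Score =  120 bits (301), Expect = 3e-27"
    if ($line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\(([\d.]+)\)/) {
        return { flavor => 'NCBI', bits => $1, raw_score => $2 };
    }
    # WU style: "Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0"
    if ($line =~ /Score\s*=\s*([\d.]+)\s+\(([\d.]+)\s+bits\)/) {
        return { flavor => 'WU', raw_score => $1, bits => $2 };
    }
    return;    # not a recognised score line
}

for my $line (
    ' Score =  120 bits (301), Expect = 3e-27',
    ' Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0',
) {
    my $s = parse_hsp_score_line($line);
    print "$s->{flavor}: raw_score=$s->{raw_score} bits=$s->{bits}\n";
}
```

For the two headers quoted above this reports raw_score=301, bits=120 for the NCBI line and raw_score=4141, bits=1462.8 for the WU line.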


On Nov 26, 2007 10:59 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy in the way the hit raw_score, bits, and
> significance are determined for Hit objects.  Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present.  Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is that the raw_score and significance are
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).  Changing to the behavior
> above results in several tests failing via SearchIO.t, with each
> failed test reporting the expected (read:correct) raw score.
>
> I'll look through the tests just in case, but I am planning on
> committing changes to the BLAST parsers, GenericHit, and SearchIO.t
> (to reflect the correct expected data) in the next day or two unless
> there are any objections.
>
> chris
>
>
> On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:
>
> > On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> >> [snip]
> >>
> >> Further, is it possible to get the raw_score of a hit? $hit->raw_score
> >> actually gets the bit score (w/o decimal point).
> >
> > Hmmm. raw_score should not be the same as bit score. So given an
> > example blast hit line such as:
> >
> >        Score = 60.0 bits (30), Expect = 1e-06
> >
> > $hit->raw_score() should return 30, not 60, as you seem to be getting.
> >
> > Could you submit a bug report for this?  http://www.bioperl.org/wiki/Bugs
> >
> > Thanks,
> > Steve
> >
> >>
> >> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> >>> Hi Russell,
> >>>
> >>> I came across your question. At first I thought all was well on my
> >>> system, but indeed I also have these colouring problems.
> >>> I noted that the score in the bgcolor callback gets a different value!
> >>> Printing the score during hit parsing ($hit->raw_score) gives the same
> >>> score as the -description callback (my $score = $feature->score;).
> >>> However, printing the score in the bgcolor sub gives 2573!
> >>> All scores in the bgcolor routine are different from and higher than the
> >>> real scores. Were you able to solve this colouring issue?
> >>>
> >>> Regards,
> >>> Bernd
> >>>
> >>>
> >>>> Hi all,
> >>>> I'm using a modified version of Lincoln's tutorial
> >>>> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> >>>> and I'm colouring the HSPs by setting the -bgcolor by score with
> >>>> a sub
> >>>> to give a similar image to that from NCBI but for some reason, my
> >>>> colours are coming out wrong (see attached example)
> >>>> They seem to be off by one but I can't see why.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> I can't be certain but I think it's only started doing this
> >>>> since our
> >>>> BLAST upgrade to 2.2.17 a few weeks ago.
> >>>>
> >>>> Here's the colouring code:
> >>>> ------------------------------------------------------------------------
> >>>> my $track = $panel->add_track(
> >>>>     -glyph       => 'segments',
> >>>>     -label       => 1,
> >>>>     -connector   => 'dashed',
> >>>>     -bgcolor     => sub {
> >>>>         my $feature = shift;
> >>>>         my $score   = $feature->score;
> >>>>         return 'red'     if $score >= 200;
> >>>>         return 'fuchsia' if $score >= 80;
> >>>>         return 'lime'    if $score >= 50;
> >>>>         return 'blue'    if $score >= 40;
> >>>>         return 'black';
> >>>>     },
> >>>>     -font2color  => 'gray',
> >>>>     -sort_order  => 'high_score',
> >>>>     -description => sub {
> >>>>         my $feature = shift;
> >>>>         return unless $feature->has_tag('description');
> >>>>         my ($description) = $feature->each_tag_value('description');
> >>>>         my $score = $feature->score;
> >>>>         "$description, score=$score";
> >>>>     },
> >>>> );
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>>
> >>>> Thanx,
> >>>>
> >>>> Russell Smithies
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From sac at bioperl.org  Mon Nov 26 22:27:09 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 19:27:09 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com>

Hi Jon,

I'd recommend downloading it into a separate location of your choosing
(~/lib/bioperl-ext for example) and running the installer as specified
in the docs that come with the download. Then you can include the
location you installed it into with a 'use lib "$ENV{HOME}/lib/bioperl-ext";'
statement at the top of your script (spell out the home directory, since
Perl does not expand "~" in use lib paths). It may be handy to install it
as a non-root user so that you don't alter the main perl installation.

This way your ext install will stay separate from your main bioperl
and perl installations.
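
As a minimal sketch of the top of such a script: the directory
lib/bioperl-ext under your home directory is just the example location
from above, and where the modules actually end up depends on how you ran
the installer, so treat the path as an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Perl does not expand "~" in paths, so build the directory from $HOME.
# "lib/bioperl-ext" is only the example location used in this message.
use lib "$ENV{HOME}/lib/bioperl-ext";

# Show which entries of the module search path point at that location.
print "$_\n" for grep { index( $_, "$ENV{HOME}/lib/bioperl-ext" ) == 0 } @INC;
```

Any "use Bio::..." lines that follow will then search that directory
before the main perl installation.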

There are some docs about the ext packages you might want to check out
at http://www.bioperl.org/wiki/Ext_package.

Steve

On Nov 21, 2007 4:35 PM, Jonathan Binkley <binkley at genome.stanford.edu> wrote:
> Hi,
>
> I installed bioperl on a Mac (OS 10.4, Intel) via fink,
> which put it here:
>
> /sw/lib/perl5/5.8.6/Bio/
>
> It seems to work fine, but I need bioperl-ext for
> Smith-Waterman alignments.
>
> So, into which directory should I download bioperl-ext and
> run the Makefile?
>
> Thanks.
>
>


From a_arya2000 at yahoo.com  Tue Nov 27 09:51:41 2007
From: a_arya2000 at yahoo.com (a_arya2000)
Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST)
Subject: [Bioperl-l] Bioperl-ext test fails
Message-ID: <615478.1036.qm@web60113.mail.yahoo.com>

Hello,
I downloaded the latest bioperl-ext from the bioperl website,
and I have io_lib v1.8.11 installed. I was trying
to install Bio::SeqIO::staden::read (part of bioperl-ext).
It compiled fine without any errors, but when I ran make
test I got the following output.


PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94# Test 7 got: "0"
(t/staden_read.t at line 110 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to
write files for abi format)
#  t/staden_read.t line 110 is:         ok(0, undef,
"We don't have the ability to write files for $format
format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2
*TODO*)
#   Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3
*TODO*)
#   Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail
#4 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail
#5 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail
#6 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail
#7 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to
write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
#  t/staden_read.t line 62 is:  ok(0, undef, "Still
missing test files for $format format") for
(1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 20 got: "0" (t/staden_read.t at line 62 fail #7
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 21 got: "0" (t/staden_read.t at line 62 fail #8
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 22 got: "0" (t/staden_read.t at line 62 fail #9
*TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 23 got: "0" (t/staden_read.t at line 62 fail
#10 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 24 got: "0" (t/staden_read.t at line 62 fail
#11 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 25 got: "0" (t/staden_read.t at line 62 fail
#12 *TODO*)
#    Expected: <UNDEF> (Still missing test files for
alf format)
# Test 31 got: "0" (t/staden_read.t at line 107
*TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
#  t/staden_read.t line 107 is:             ok(0,
undef, "Can't write valid ctf files until we have a
trace object") for 1..7;
# Test 32 got: "0" (t/staden_read.t at line 107 fail
#2 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 33 got: "0" (t/staden_read.t at line 107 fail
#3 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 34 got: "0" (t/staden_read.t at line 107 fail
#4 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 35 got: "0" (t/staden_read.t at line 107 fail
#5 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 36 got: "0" (t/staden_read.t at line 107 fail
#6 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
# Test 37 got: "0" (t/staden_read.t at line 107 fail
#7 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files
until we have a trace object)
t/staden_read....ok                                   
                      
All tests successful.
Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr + 
0.15 csys =  1.71 CPU)


Does anyone have any idea what might be going wrong here? By
the way, my OS is Linux. Thank you very much.

Arya




From bix at sendu.me.uk  Tue Nov 27 10:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded latest bioperl-ext from bioperl website,
> and I have io_lib v1.8.11 installed, and I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any error but when I run make
> test I got following output. 
[...]
> All tests successful.
> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr + 
> 0.15 csys =  1.71 CPU)
> 
> 
> Anyone has any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that 
nothing actually went wrong: 'All tests successful'. Apparently there 
are some things in the test script that are known by the author to not 
work quite right, so he marked them as 'todo'. The problems seem 
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going 
on, all is well.
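
For anyone unfamiliar with 'todo' tests, here is a minimal sketch of the
idea. It uses Test::More, which is an assumption on my part; the
ok(0, undef, ...) calls in the output above suggest t/staden_read.t is
written with a different test module, but the principle is the same. The
test names are made up for illustration.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# A normal test: a failure here would fail the whole run.
ok( 1, 'reading works' );

TODO: {
    # Everything in this block is expected to fail for now.
    local $TODO = 'writing is not implemented yet';

    # These are reported as "not ok ... # TODO" but do not fail the run.
    ok( 0, 'first write test' );
    ok( 0, 'second write test' );
}
```

Run under "make test" or prove, a file like this still ends with "All
tests successful", just like the output quoted above.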


From alison.waller at utoronto.ca  Mon Nov 26 16:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

 
It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.  

 
I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on say the top5.  I am pretty
sure I have done this wrong, any advice on how to correct my script to do
this, would be great.

 
Thanks so much,

 
Alison

 
#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
      if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for
            # Print some tab-delimited data about this hit
           {
            print $result->query_name, "\t";
            print $hit->description, "\t";
            print $hit->significance, "\t";
            print $hit->bits,"\t";
            print $hsp->evalue, "\t";
            print $hsp->percent_identity, "\t";
            print $hsp->length('total'),"\t";
            print $hsp->num_identical,"\t";
            print $hsp->gaps,"\t";
            print $hsp->strand('query'),"\t";
            print $hsp->strand('hit'), "\n";
          }
      else print "no hits\n";
   }

 
******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From bix at sendu.me.uk  Tue Nov 27 12:01:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 17:01:36 +0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <474C4D70.2010206@sendu.me.uk>

alison waller wrote:
> I am trying to write a script that will parse large blast files (usually
> blastx) I also want to be able to specify how many hits I want to report
> information on.
> 
> Most of the time I will only want information on the top hit, but I want to
> have the flexibility to obtain information on say the top5.  I am pretty
> sure I have done this wrong, any advice on how to correct my script to do
> this, would be great.

[snip]

>       if ($top_hit=$result->next_hit) # this might be wrong - I want to
> specify how many hits to print results for

I didn't really pay attention to the rest of your code, but assuming it 
all works except for only ever giving you info for the top hit, you just 
need to change this 'if' to a loop of some kind.

# ...
my $hits = 0;

while (my $hit = $result->next_hit) {
  $hits++;
  last if $hits > $tophit;
  # ...
}


From David.Messina at sbc.su.se  Tue Nov 27 12:55:44 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 27 Nov 2007 18:55:44 +0100
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <474C4D70.2010206@sendu.me.uk>
References: <005a01c83070$3a814580$d81efea9@AWALL>
	<474C4D70.2010206@sendu.me.uk>
Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>

Hi Alison,
As Sendu mentioned, the key bit is adding a condition to the hit loop to
limit the number of hits that are printed. I didn't test the below
extensively, but give it a try...


Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query

#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);

print OUT
  "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
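    my $i = 0;    # number of hits actually reported for this query
    while (  ( $i < $tophit ) && (my $hit = $result->next_hit) ) {
        $i++;     # count only real hits, so the "no hits" check below can fire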
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t";
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t";
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t";
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n";
        }
    }

    if ($i == 0) { print OUT "no hits\n"; }
}


From Russell.Smithies at agresearch.co.nz  Tue Nov 27 14:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>

Do the hits need to be sorted first or is this done automagically?
I ask this as I know Megablast doesn't provide sorted output for most of
its formats.

Russell



From cjfields at uiuc.edu  Tue Nov 27 16:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com>
	<474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;  
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read.  In general  
though if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr +
>> 0.15 csys =  1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say  
> that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant  
> going
> on, all is well.


From cjfields at uiuc.edu  Tue Nov 27 16:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also 'last'
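
Applied to the top-N loop from earlier in this thread, that could look
roughly like the sketch below. The function name top_hit_rows and the
choice of columns are my own assumptions; the method calls (next_hit,
query_name, name, hsp('best'), evalue, bits) are the usual Result/Hit/HSP
ones. Note that 'best' here simply means the first HSP listed for the hit
in the report.

```perl
use strict;
use warnings;

# Collect one tab-delimited row per hit, for at most $max_hits hits of a
# single result, using only the best HSP of each hit.
# $result is expected to behave like a Bio::Search::Result object.
sub top_hit_rows {
    my ( $result, $max_hits ) = @_;
    my @rows;

    while ( @rows < $max_hits and my $hit = $result->next_hit ) {
        my $hsp = $hit->hsp('best');    # one HSP instead of a next_hsp loop
        next unless $hsp;               # defensive: no HSP object came back
        push @rows, join "\t",
          $result->query_name, $hit->name, $hsp->evalue, $hsp->bits;
    }
    return @rows;
}
```

Inside the usual "while ( my $result = $report->next_result )" loop you
would then call something like: print OUT "$_\n" for top_hit_rows( $result,
$tophit ); which gives one line per hit no matter how many HSPs the hit has.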

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     this criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit.pm object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { my $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable
                     value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for  
> most of
> it's formats.
>
> Russell
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Tue Nov 27 20:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head against a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
  my $res = $p3->run;
  ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences then I got a "can't place
primer on sequence"  error.

After a bit of faffing about, I think the problem occurs when no primers
are found. In which case $p3 still has the primers from the previous run,
which don't come from the current sequence, so can't be placed on it. I
tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine.

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).
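
The workaround described above (a new object for every sequence) could be
sketched roughly as follows. This is only an illustration: primers_for_each
and the factory code ref are made-up names, not part of BioPerl, and the
Primer3 calls in the trailing comment assume the constructor and
add_targets() usage shown in the Bio::Tools::Run::Primer3 synopsis.

```perl
use strict;
use warnings;

# Run one primer search per sequence, building a brand-new runner each
# time so that no results from a previous sequence can be carried over.
# $make_runner is a code ref that takes a sequence and returns an object
# with a run() method (hypothetical helper, not part of BioPerl).
sub primers_for_each {
    my ($seqs, $make_runner) = @_;
    my @results;
    for my $seq (@$seqs) {
        my $p3 = $make_runner->($seq);    # fresh object every iteration
        push @results, scalar $p3->run;
    }
    return \@results;
}

# With Bio::Tools::Run::Primer3 the factory might look like this
# (the GC value is an arbitrary placeholder):
#
#   my $make_runner = sub {
#       my $seq = shift;
#       my $p3  = Bio::Tools::Run::Primer3->new(-seq => $seq);
#       $p3->add_targets('PRIMER_MIN_GC' => 40);
#       return $p3;
#   };
#   my $results = primers_for_each(\@some_bio_seqs, $make_runner);
```

Building the object inside the loop costs a little extra setup per
sequence, but it avoids relying on how the object resets itself between
runs.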

Cheers,
Cass xx


From alison.waller at utoronto.ca  Tue Nov 27 16:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked, Dave. However, after looking at the output I realized
that I only want information on the top HSP per query.  For example, for
some of the queries the top hit has two HSPs, so it returned both.

I tried to edit it further, but all 3 of my attempts fail, I think because
I am using the loops wrong.

I also have another problem: I want to retrieve the GI as well, and this
doesn't seem to be as straightforward as it should be.  I found the method
_get_seq_identifiers, but it looks awkward.  Isn't there an object for the
GI?

I've pasted my non-working script below.  If there are any suggestions on
how to get it to print out just the first HSP per hit, that would be great.

Thanks,


#!/usr/local/bin/perl -w


# Parsing BLAST reports with BioPerl's Bio::SearchIO module 
# alison waller November 2007


use strict;
use warnings;
use Bio::SearchIO;


my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }


my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query


#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";


my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);


print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\t",
          "NumIdent\tgaps\tQstrand\tHstrand\n";


# Go through BLAST reports one by one
while (my $result = $report->next_result) {
    my $i = 0;
    while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
        while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t";
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t";
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t";
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n";
        }
    }

    if ($i == 0) { print OUT "no hits\n"; }
}
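
One possible way to get just the first HSP of each hit is sketched below.
It is an untested-against-real-data illustration, not a drop-in fix: the
sub name top_hsp_rows and the three-column row layout are invented here,
and it assumes the first HSP returned by next_hsp is the one wanted. The
main point is that the counter tracks hits only, whereas the script above
increments the same $i in both the hit loop and the HSP loop.

```perl
use strict;
use warnings;

# Collect one row per hit (its first HSP only) for at most $max_hits hits.
# $result is expected to behave like a Bio::Search::Result object, i.e. to
# provide query_name() and next_hit(), with hits providing name() and
# next_hsp(). The sub name and row layout are just for illustration.
sub top_hsp_rows {
    my ($result, $max_hits) = @_;
    my @rows;
    my $hits_seen = 0;                       # counts hits only, never HSPs
    while ( ( $hits_seen < $max_hits ) && ( my $hit = $result->next_hit ) ) {
        $hits_seen++;
        my $hsp = $hit->next_hsp or next;    # first HSP of this hit, if any
        push @rows, join("\t", $result->query_name, $hit->name, $hsp->evalue);
    }
    push @rows, join("\t", $result->query_name, 'no hits') unless $hits_seen;
    return @rows;
}

# Possible use with the variables from the script above:
#
#   my $report = Bio::SearchIO->new(-file => $infile, -format => 'blast');
#   while ( my $result = $report->next_result ) {
#       print OUT "$_\n" for top_hsp_rows($result, $tophit);
#   }
```

Calling it with a limit of 1 gives the top HSP of the top hit for each
query; the other columns from the script can be added to the join in the
same way.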

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Tuesday, November 27, 2007 4:01 PM
To: Smithies, Russell
Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also 'last'

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     these criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { my $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable
                     value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
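
As a rough sketch of how two of the options listed above might be combined
when opening a report (the sub names and the 1e-5 cutoff are arbitrary
choices made for this example, not recommendations):

```perl
use strict;
use warnings;

# Hit filter: keep only hits whose alignment contains no gaps.
# Written with "my" so it also compiles under "use strict".
sub ungapped_only {
    my $hit = shift;
    return $hit->gaps == 0;
}

# Open a BLAST report with an E-value cutoff plus the filter above.
# The sub name and the 1e-5 cutoff are arbitrary choices for this sketch.
sub open_filtered_report {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded lazily so the filter can be tested alone
    return Bio::SearchIO->new(
        -file       => $file,
        -format     => 'blast',
        -signif     => 1e-5,
        -hit_filter => \&ungapped_only,
    );
}
```

Hits rejected by the filter should never be returned by next_hit, so the
rest of a parsing loop would not need to change.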

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagically?
> I ask this as I know Megablast doesn't provide sorted output for most
> of its formats.
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-
>> bio.org] On Behalf Of Dave Messina
>> Sent: Wednesday, 28 November 2007 6:56 a.m.
>> To: alison waller
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>
>> Hi Alison,
>> As Sendu mentioned, the key bit is adding a condition to the hit loop
> to
>> limit the number of hits that are printed. I didn't test the below
>> extensively, but give it a try...
>>
>>
>> Dave
>>
>>
>>
>> #!/usr/local/bin/perl -w
>>
>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>> # alison waller November 2007
>>
>> use strict;
>> use warnings;
>> use Bio::SearchIO;
>>
>> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
>> if (@ARGV != 2) { die $usage; }
>>
>> my $infile  = $ARGV[0];
>> my $outfile = $infile . '.parsed';
>> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>>                         # to report for each query
>>
>> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>>
>> my $report = new Bio::SearchIO(
>>    -file   => "$infile",
>>    -format => "blast"
>> );
>>
>> print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\t",
>>           "NumIdent\tgaps\tQstrand\tHstrand\n";
>>
>> # Go through BLAST reports one by one
>> while ( my $result = $report->next_result ) {
>>    my $i = 0;
>>    while (  ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
>>        while ( my $hsp = $hit->next_hsp ) {
>>
>>            # Print some tab-delimited data about this hit
>>            print OUT $result->query_name,     "\t";
>>            print OUT $hit->name,              "\t";
>>            print OUT $hit->significance,      "\t";
>>            print OUT $hit->bits,              "\t";
>>            print OUT $hsp->evalue,            "\t";
>>            print OUT $hsp->percent_identity,  "\t";
>>            print OUT $hsp->length('total'),   "\t";
>>            print OUT $hsp->num_identical,     "\t";
>>            print OUT $hsp->gaps,              "\t";
>>            print OUT $hsp->strand('query'),   "\t";
>>            print OUT $hsp->strand('hit'),     "\n";
>>        }
>>    }
>>
>>    if ($i == 0) { print OUT "no hits\n"; }
>> }

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dennis.prickett at bbsrc.ac.uk  Wed Nov 28 05:18:26 2007
From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C))
Date: Wed, 28 Nov 2007 10:18:26 -0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk>

Dear Alison
 
Or, if you are only interested in the top hit, you could limit the report
to that in the blast command by adding the parameter "-b 1".

This will truncate the report to alignments for 1 hit per query (or -b 5
for 5 hits, etc.); note that a hit can still contain more than one HSP.
Your blasts run faster and then you won't have to worry about how to parse
out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know. 

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,

It's the usual story: I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to finish
my thesis.

I am trying to write a script that will parse large blast files (usually
blastx).  I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want
to have the flexibility to obtain information on, say, the top 5.  I am
pretty sure I have done this wrong; any advice on how to correct my
script to do this would be great.

Thanks so much,

Alison

 
#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many
                       # hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\t",
      "NumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
      if ($top_hit=$result->next_hit) # this might be wrong - I want to
                                      # specify how many hits to print
                                      # results for
            # Print some tab-delimited data about this hit
           {
            print $result->query_name, "\t";
            print $hit->description, "\t";
            print $hit->significance, "\t";
            print $hit->bits,"\t";
            print $hsp->evalue, "\t";
            print $hsp->percent_identity, "\t";
            print $hsp->length('total'),"\t";
            print $hsp->num_identical,"\t";
            print $hsp->gaps,"\t";
            print $hsp->strand('query'),"\t";
            print $hsp->strand('hit'), "\n";
          }
      else print "no hits\n";
   }

 
******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  


From t.nugent at cs.ucl.ac.uk  Wed Nov 28 08:10:41 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Wed, 28 Nov 2007 13:10:41 +0000
Subject: [Bioperl-l] Helical Wheel module
Message-ID: <474D68D1.3080602@cs.ucl.ac.uk>

Hi everyone,

I've been drawing a lot of helical wheels recently so put all my code 
into a module. I don't think there's anything in bioperl to do this yet 
though there are a few programs written in perl and flash on the web to 
do the same thing. I was thinking this could fit into biographics. Has 
lots of options to adjust labels, colours, ttf fonts and can output to 
png & svg.

Tim

...

Here's the output, converted to jpg from svg:
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg

Module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz

Works like this:

use DrawHelicalWheel;

my $im = DrawHelicalWheel->new(-title=>$title,
                               -sequence=>$sequence,
                               -helices=>\@helices,
                               -ttf_font=>$font);
open(OUTPUT, ">$svg");
binmode OUTPUT;
print OUTPUT $im->svg;
close OUTPUT;

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From tristan.lefebure at gmail.com  Wed Nov 28 10:46:11 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:46:11 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281046.11146.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an 
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is 
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

I could recycle the '_remove_col' sub-function of 'remove_columns' to do so 
(it splits the alignment into sequence objects, removes the sites, and then 
regenerates an alignment object), but I would be surprised if there was 
nothing already doing the job...

Thanks

-Tristan


From bix at sendu.me.uk  Wed Nov 28 11:19:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Nov 2007 16:19:36 +0000
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <200711281046.11146.tnl7@cornell.edu>
References: <200711281046.11146.tnl7@cornell.edu>
Message-ID: <474D9518.7010201@sendu.me.uk>

Tristan Lefebure wrote:
> Hello!
> 
> I was wondering if there was a function to remove sites/columns of an 
> alignment. Something like: $aln->remove_sites(@sites_to_remove)
> I looked around Bio::SimpleAlign but did not find exactly that. There is 
> remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

You might want to take a second look at the docs. You can supply column 
number ranges to remove_columns(), so it does exactly what you want.
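
For a plain list of sites, one could first collapse the list into ranges
and pass those on. The helper below is a sketch under assumptions:
sites_to_ranges is a made-up name (not part of Bio::SimpleAlign), and it
assumes the sites are 0-based column indices, which is how the
remove_columns() documentation numbers columns.

```perl
use strict;
use warnings;

# Turn a list of 0-based column indices into sorted [start, end] pairs,
# merging neighbouring and duplicate sites into a single range.
# (sites_to_ranges is a hypothetical helper, not a BioPerl method.)
sub sites_to_ranges {
    my @sites = sort { $a <=> $b } @_;
    my @ranges;
    for my $s (@sites) {
        if ( @ranges && $s <= $ranges[-1][1] + 1 ) {
            $ranges[-1][1] = $s if $s > $ranges[-1][1];   # extend last range
        }
        else {
            push @ranges, [ $s, $s ];                      # start a new range
        }
    }
    return @ranges;
}

# Possible use with an existing Bio::SimpleAlign object $aln:
#
#   my $trimmed = $aln->remove_columns( sites_to_ranges(@sites_to_remove) );
```

remove_columns() returns a new alignment object rather than changing $aln
in place, so the result needs to be kept.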


From tnl7 at cornell.edu  Wed Nov 28 10:44:17 2007
From: tnl7 at cornell.edu (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:44:17 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281044.17770.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an 
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is 
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

I could recycle the '_remove_col' sub-function of 'remove_columns' to do so 
(it splits the alignment into sequence objects, removes the sites, and then 
regenerates an alignment object), but I would be surprised if there was 
nothing already doing the job...

Thanks

-Tristan


From cjfields at uiuc.edu  Wed Nov 28 08:57:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:57:27 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
Message-ID: <B3E0F9EA-9452-483E-AA17-5174B743B164@uiuc.edu>

I had some code which does this which I committed yesterday to CVS; it  
catches the GI for the query and the hits:

$result->query_gi;
$hit->ncbi_gi;

I am in the midst of fixing additional problems with WU-BLAST parsing  
but you are more than welcome to try it.
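
For installations without the new methods, a stopgap could be to pull the
GI out of the identifier string. This sketch assumes the hit names are
NCBI-style identifiers of the form "gi|number|db|accession|";
gi_from_name is a made-up helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Pull the GI number out of an NCBI-style identifier such as
# "gi|15674171|ref|NP_268346.1|". Returns undef when there is none.
# (gi_from_name is a made-up helper, not a BioPerl method.)
sub gi_from_name {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /(?:^|\|)gi\|(\d+)/ ? $1 : undef;
}

# Possible use inside a hit loop:
#
#   my $gi = gi_from_name( $hit->name );
```

If the database was formatted from FASTA headers without a "gi|" field,
this returns undef and the GI has to come from somewhere else.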

chris

On Nov 27, 2007, at 3:32 PM, alison waller wrote:

> Thanks Everyone,
>
> Your edits worked, Dave. However, after looking at the output I realized
> that I only want information on the top HSP per query.  For example, for
> some of the queries the top hit has two HSPs, so it returned both.
>
> I tried to edit it further, but all 3 of my attempts fail, I think because
> I am using the loops wrong.
>
> I also have another problem: I want to retrieve the GI as well, and this
> doesn't seem to be as straightforward as it should be.  I found the method
> _get_seq_identifiers, but it looks awkward.  Isn't there an object for the
> GI?
>
> I've pasted my non-working script below.  If there are any suggestions on
> how to get it to print out just the first HSP per hit, that would be great.
>
> Thanks,
>
>
> #!/usr/local/bin/perl -w
>
>
> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
> # alison waller November 2007
>
>
> use strict;
> use warnings;
> use Bio::SearchIO;
>
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
> if (@ARGV != 2) { die $usage; }
>
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                         # to report for each query
>
>
> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>
>
> my $report = new Bio::SearchIO(
>    -file   => "$infile",
>    -format => "blast"
> );
>
>
> print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\t",
>           "NumIdent\tgaps\tQstrand\tHstrand\n";
>
>
> # Go through BLAST reports one by one
> while (my $result = $report->next_result) {
> 	my $i=0;
> 	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
>    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
>
>
>            # Print some tab-delimited data about this hit
>            print OUT $result->query_name,     "\t";
>            print OUT $hit->name,              "\t";
>            print OUT $hit->significance,      "\t";
>            print OUT $hit->bits,              "\t";
>            print OUT $hsp->evalue,            "\t";
>            print OUT $hsp->percent_identity,  "\t";
>            print OUT $hsp->length('total'),   "\t";
>            print OUT $hsp->num_identical,     "\t";
>            print OUT $hsp->gaps,              "\t";
>            print OUT $hsp->strand('query'),   "\t";
>            print OUT $hsp->strand('hit'),     "\n";
>        }
> }
>    if ($i == 0) { print OUT "no hits\n"; }
>
> }
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 08:54:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:54:39 -0600
Subject: [Bioperl-l] Helical Wheel module
In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk>
References: <474D68D1.3080602@cs.ucl.ac.uk>
Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu>

Looks good!  We recently added in your transmembrane module, so we  
could definitely add this in.

chris

On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've been drawing a lot of helical wheels recently so put all my code
> into a module. I don't think there's anything in bioperl to do this  
> yet
> though there are a few programs written in perl and flash on the web  
> to
> do the same thing. I was thinking this could fit into biographics. Has
> lots of options to adjust labels, colours, ttf fonts and can output to
> png & svg.
>
> Tim
>
> ...
>
> Here's the output, converted to jpg from svg:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg
>
> Module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz
>
> Works like this:
>
> use DrawHelicalWheel;
>
> my $im = DrawHelicalWheel->new(-title=>$title,
>                               -sequence=>$sequence,
>                               -helices=>\@helices,
>                               -ttf_font=>$font);
> open(OUTPUT, ">$svg");
> binmode OUTPUT;
> print OUTPUT $im->svg;
> close OUTPUT;
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk
> http://www.cs.ucl.ac.uk/staff/T.Nugent
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 13:43:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 12:43:58 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
	<8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu>


On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote:

> Chris,
>
> Good catch. You're on track here with one exception: WU blast and NCBI
> blast behave differently in what they report in the hit table: WU
> blast puts the raw score in the table not the bit score as NCBI blast
> does (see example below for reference). WU blast also swaps their
> location in the HSP header relative to how NCBI reports it. It would
> be good to verify that the blast parser isn't befuddled by this. A
> quick look at SearchIO::blast and it appears that data from the hit
> table is always getting stored as score, not bits for WU blast. Not
> sure if the HSP section data are parsed correctly. I'd recommend
> looking into these things when you do your fixes.

What I have now after commits is:

GenericHit - use the best HSP when possible for bits, score/raw_score,  
significance.  When there is no HSP, construct a minimal Hit object  
using hit table data (WUBLAST maps the score to raw_score, NCBI BLAST  
maps to bits(), both map evalue/pvalue to significance).  HSP mapping  
seems to be correct.

One issue that has popped up is that GenericHit::significance
preferentially uses the best HSP.  However, GenericHSP::significance
uses evalues preferentially over pvalues; both Expect and P appear to
be parsed for WU-BLAST HSPs now (so the evalue is reported); this
apparently wasn't always the case if I read the GenericHit docs
correctly.  As NCBI BLAST doesn't report pvalues, we could change that
so it preferentially returns a pvalue if present, falling back to an
evalue.  This would match what is found in the hit table more closely
and resembles what is documented for the method (for significance(),
WU-BLAST gets pvalues, NCBI BLAST gets evalues).
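
The proposed lookup order can be illustrated with a few lines of plain
Perl. This is only a sketch of the idea: the hash keys and the sub name
are invented for the example and do not reflect how GenericHSP actually
stores its data.

```perl
use strict;
use warnings;

# Proposed lookup order: prefer the P-value when the report supplied one
# (WU-BLAST), otherwise fall back to the E-value (NCBI BLAST).
# The hash keys are invented for this sketch; real HSP objects store
# their data differently.
sub preferred_significance {
    my ($hsp) = @_;
    return defined $hsp->{pvalue} ? $hsp->{pvalue} : $hsp->{evalue};
}
```

With this order an NCBI BLAST HSP, which has no P-value, still reports its
E-value, while a WU-BLAST HSP reports the P-value shown in its hit table.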

> So in the end, WU blast HSPs that are built from the hit table should
> report a value for raw_score and punt on bits, but NCBI HSPs so
> constructed should do the opposite. The downside to this arrangement
> is that code that works for NCBI blast hits will need modification to
> work for WU blast hits, but that is just the nature of the data. It
> shouldn't be an issue for the majority of users that stick with one
> flavor of blast and don't switch back and forth, or for users that get
> their HSP scoring data from HSP sections rather than relying on the
> hit table.

In general I get my data from the HSPs, so this shouldn't be a  
significant issue (bad pun).  I did find that changing it so that Hit  
objects use HSP data pointed out issues with test data; hit table raw/ 
bit scores were rounded from the HSP score data or vice versa since  
all data came from the hit table, so tests flunked.

I think changing the way minimal hit objects report data (particularly  
for NCBI BLAST) will lead to a lot of confusion unless we clarify  
warnings when one or the other is missing (as you also indicated).   
I'm working on that now.

> Ideally, the HSP object would know whether it was NCBI or WU-based and
> issue an informative warning when attempting to access data it doesn't
> have. One solution might be for the parser to put a 'WU-' in front of
> the algorithm name for WU blast reports, so it would then be available
> for the contained hit/hsp objects. This could break anything dependent
> on algorithm name, so it would need some testing.
>
> Steve


I can probably work around that as noted above, unless you think it's  
warranted to add a 'WU' designation (the version info in the Result  
object has 'WashU' attached, so one could feasibly use that for  
distinguishing the two report types).
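
As a rough illustration of that idea, assuming the version string has already been pulled out of the Result object (for example via algorithm_version()), the check could be as small as the following. The sub name is_wublast_version and the sample version strings are only for the example.

```perl
use strict;
use warnings;

# Hypothetical check: treat a report as WU-BLAST when its version string
# carries the 'WashU' tag mentioned above.
sub is_wublast_version {
    my ($version) = @_;
    return ( defined $version && $version =~ /WashU/ ) ? 1 : 0;
}

# Illustrative version strings, one of each flavor.
for my $version ( '2.0MP-WashU [04-May-2006]', '2.2.17 [Aug-26-2007]' ) {
    printf "%-28s => %s\n", $version,
        is_wublast_version($version) ? 'WU-BLAST' : 'NCBI BLAST';
}
```

This leaves the algorithm name untouched, so nothing that depends on it would need retesting.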

Anyway, I'm committing my first batch of fixes; the significance test  
will fail for at least a day until I can look into it more.

chris


From tristan.lefebure at gmail.com  Wed Nov 28 14:03:44 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 14:03:44 -0500
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <474D9518.7010201@sendu.me.uk>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
Message-ID: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>

Whoops. I was reading the BioPerl 1.4 documentation. Actually,
http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to
http://search.cpan.org/perldoc?Bio::SimpleAlign, which turns out to be
the 1.4 documentation...

Thank you, it works great.


On Nov 28, 2007 11:19 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Tristan Lefebure wrote:
> > Hello!
> >
> > I was wondering if there was a function to remove sites/columns of an
> > alignment. Something like: $aln->remove_sites(@sites_to_remove)
> > I looked around Bio::SimpleAlign but did not find exactly that. There is
> > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch'
> criteria.
>
> You might want to take a second look at the docs. You can supply column
> number ranges to remove_columns(), so it does exactly what you want.
>
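
In case it helps someone searching the archive: since remove_columns() takes column ranges, a list of individual sites first has to be turned into [start, end] pairs. Below is a small sketch of that conversion. The helper name sites_to_ranges is made up, and the indices are assumed to already be in whatever numbering remove_columns() expects (the SimpleAlign docs, as far as I can tell, count the first column as 0).

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a list of column indices into contiguous
# [start, end] pairs, the range form that remove_columns() accepts.
sub sites_to_ranges {
    my @sites = sort { $a <=> $b } @_;
    my @ranges;
    for my $site (@sites) {
        if ( @ranges && $site <= $ranges[-1][1] + 1 ) {
            $ranges[-1][1] = $site;    # extend the current run (duplicates are absorbed)
        }
        else {
            push @ranges, [ $site, $site ];    # start a new run
        }
    }
    return @ranges;
}

my @sites_to_remove = ( 7, 2, 3, 4, 9, 8 );
my @ranges = sites_to_ranges(@sites_to_remove);
print join( ' ', map { sprintf '[%d,%d]', @$_ } @ranges ), "\n";

# With an existing Bio::SimpleAlign object $aln (not built here):
#   my $trimmed = $aln->remove_columns(@ranges);
```

Merging adjacent sites first keeps the argument list short when many neighbouring columns are dropped.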


From Russell.Smithies at agresearch.co.nz  Wed Nov 28 16:57:14 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 29 Nov 2007 10:57:14 +1300
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>

Has anyone got a good example of parsing ASN.1 with
Bio::SeqIO::entrezgene?
I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
and my Perl isn't that good  :-(

Russell
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From stefan.kirov at bms.com  Wed Nov 28 17:16:18 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time)
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
Message-ID: <Pine.WNT.4.64.0711281708590.21768@A103728.hpw.stf.bms.com>

Here is an example for GO, will send the one for KEGG later:
use Bio::SeqIO;

# $file is the path to your Entrez Gene ASN.1 file
my $eio=Bio::SeqIO->new(-file=>$file,-format=>'entrezgene',
    -service_record=>'yes');#, -locuslink=>'convert');
while (my $seq=$eio->next_seq) {
    my $gid=$seq->accession_number;
    my $ann=$seq->annotation; #Annotation collection holding the ontology terms
    foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
        next if ($ot->term->authority eq 'STS marker'); #Do not need STS markers
        my $evid=$ot->comment;
        $evid=~s/evidence: //i;
        my @ref=$ot->term->get_references; #Really there should be just one?
        my $id=$ot->identifier;
        my $fid='GO:' . sprintf("%07u",$id);
        print join("\t",$gid,$ot->ontology->name,$ot->name,$evid,$fid,@ref?$ref[0]->medline:''),"\n";
    }
}
Please note there is a bug in the parser that makes it suck a lot of RAM. 
I am fixing this and will commit probably by the week's end- you will have 
to update at that point. If you work with few records this should not 
matter.
Stefan


On Thu, 29 Nov 2007, Smithies, Russell wrote:

> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good  :-(
>
> Russell
>


From cjfields at uiuc.edu  Thu Nov 29 18:06:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 17:06:42 -0600
Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml
Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu>

For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST  
parsing in Bio::SearchIO::blastxml (though it appears to be pretty  
stable!).  Since there isn't any easy way to distinguish between  
normal BLAST and PSI-BLAST reports due to recent changes at NCBI to  
BLAST, you have to indicate how the report is to be parsed by passing  
in a '-blasttype' parameter:

$searchio = Bio::SearchIO->new('-tempfile' => 1,
        '-format' => 'blastxml',
        '-file'   => 'psiblast.xml',
        '-blasttype' => 'psiblast');

Otherwise it chunks the individual iterations out as separate BLAST  
reports and parses them as separate reports.

Tests have also been added to SearchIO.t.  I will update the HOWTO and  
blastxml docs soon.

chris


From cjfields at uiuc.edu  Thu Nov 29 21:41:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 20:41:49 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
In-Reply-To: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
References: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu>

It's probably safer to create a new instance each time but it really  
shouldn't be necessary for a wrapper module; this sounds like a bug to  
me.  Could you file it in Bugzilla?

On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote:

> Hello,
>
> I was playing around with Primer3, and I hit a problem. Not sure if  
> it's a
> bug or if I was doing something I wasn't supposed to, but if it's the
> latter, I thought it might save someone else half an hour of banging  
> their
> head off a keyboard if I mentioned it:
>
> What I was doing was roughly:
>
> # create a primer3 obj
> my $p3 = ...Primer3->new();
>
> # loop through some sequences generating primers for
> # each of them using the same primer3 obj
> while (@some_bio_seqs){
>  my $res = $p3->run;
>  ...
> }
>
> This worked fine for a while, but broke when I tried to set  
> PRIMER_MIN_GC,
> at which point it worked for a few sequences then I got a "can't place
> primer on sequence"  error.
>
> After a bit of faffing about, I think the problem occurs when no  
> primers
> are found. In which case $p3 still has the primers from the previous  
> run,
> which don't come from the current sequence, so can't be placed on  
> it. I
> tried calling $p3->cleanup in the loop, but that didn't work either.
> Creating a new $p3 every time works fine.
>
> Are you supposed to create a new Primer3 object for every sequence?
> (Apologies if I missed the relevant bit of the docs).
>
> Cheers,
> Cass xx

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From paulhengen at coh.org  Wed Nov 28 20:20:42 2007
From: paulhengen at coh.org (Paul N. Hengen)
Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST)
Subject: [Bioperl-l]  Collecting genomic DNA sequences using Entrez IDs
Message-ID: <14017289.post@talk.nabble.com>


Hi.

I have a number of gene IDs from Entrez and I want to find the
start and end locations in the human genome. This seemed simple
enough, so I started working through some of the examples for
using the EntrezGene module at www.bioperl.org  Of course this
did not work because the core installation does not include this
module. So, I think I have two choices (1) install the module (how?),
or (2) find an easier way to get the locations in the human genome.
I want to use the locations to grab sequences out of the genome.
Can anyone offer advice on this? Thanks.

-Paul.

--
Paul N. Hengen, Ph.D.
Hematopoietic Stem Cell and Leukemia Research
City of Hope National Medical Center
1500 E. Duarte Road, Duarte, CA 91010 USA
mailto:paulhengen at coh.org

-- 
View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From Viktor.Martyanov at Dartmouth.EDU  Thu Nov 29 15:20:19 2007
From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov)
Date: 29 Nov 2007 15:20:19 -0500
Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases
Message-ID: <193573097@newdonner.Dartmouth.EDU>

A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 445 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071129/a6380324/attachment-0003.bin>

From alison.waller at utoronto.ca  Thu Nov 29 11:20:59 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Thu, 29 Nov 2007 11:20:59 -0500
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from
	CVS)
Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL>

Hi all,

 
I would like to install the CVS version of bioperl  as I know of some code
changes that will be useful to me.  However, I am having problems installing
it.  

I am trying to install bioperl in my home directory on a linux cluster.  

 
I used

 
> cd bioperl-live
> perl Build.PL -install /home/awaller

 
However after the build command I got a lot of errors.  Do I have to also
have perl installed in my home directory??  There is perl installed on the
cluster in /usr/bin.  Do I need to point to this or does Build.PL
automatically look there?  I noticed a few errors about not having
permission and a few about not being able to connect. I've copied a portion
of the messages after my Build.PL command.  

 
Any help would be appreciated,

 
alison 

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/02packages.details.txt.gz

Trying to get away with old file:

3604718  584 -rw-r--r--  1 0        0          592967 Nov 12 22:53
/root/.cpan/sources/modules/02packages.details.txt.gz

Going to read /root/.cpan/sources/modules/02packages.details.txt.gz

  Database was generated on Sat, 10 Nov 2007 22:36:34 GMT

 
  There's a new CPAN.pm version (v1.9204) available!

  [Current version is v1.7601]

  You might want to try

    install Bundle::CPAN

    reload cpan

  without quitting the current session. It should be a seamless upgrade

  while we are running...

 
Warning: You are not allowed to write into directory
"/root/.cpan/sources/modules".

    I'll continue, but if you encounter problems, they may be due

    to insufficient permissions.

Fetching with LWP:

  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[Cannot write to
'/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied]

Fetching with Net::FTP:

  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from ftp.nrc.ca

Fetching with LWP:

  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[FTP close response: 500 Network seems to
have barfed - Let's all phone our ISP and go postal!

Unknown command.

]

Fetching with Net::FTP:

  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca

Fetching with LWP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'cpan.mirror.cygnal.ca']

Fetching with Net::FTP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Fetching with LWP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'mirror.isurf.ca']

Fetching with Net::FTP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

 
Trying with "/usr/bin/lynx -source" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL
ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: cpan.mirror.cygnal.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/03modlist.data.gz

Trying to get away with old file:

3604719  144 -rw-r--r--  1 0        0          141973 Nov 12 22:53
/root/.cpan/sources/modules/03modlist.data.gz

Going to read /root/.cpan/sources/modules/03modlist.data.gz

Going to write /root/.cpan/Metadata

can't create /root/.cpan/Metadata: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 3432

Running install for module Test::Harness

Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz

mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 2342

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From cjfields at uiuc.edu  Thu Nov 29 23:53:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:53:09 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <D344C28E-BC9B-4226-AD15-149EA001FCAB@uiuc.edu>

Alison,

There are directions on how to do this here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

(TinyURL link)
http://tinyurl.com/3263dd

Note the additional configuration for CPAN in that section; you'll  
need to set up CPAN so it installs everything locally.
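
The log shows CPAN trying to write under /root/.cpan, which suggests it is picking up the system-wide configuration. A per-user ~/.cpan/CPAN/MyConfig.pm along the following lines is one way to redirect it. Treat this as a sketch only: the perl5 directory and the mirror URL are placeholders, CPAN.pm may prompt for any settings left out, and older CPAN.pm releases (such as the v1.7601 in your log) may not recognise the mbuildpl_arg key.

```perl
# ~/.cpan/CPAN/MyConfig.pm (per-user CPAN settings; all values illustrative)
my $home = $ENV{HOME} || '/home/awaller';

$CPAN::Config = {
    # keep CPAN's working files inside the home directory, not /root/.cpan
    cpan_home         => "$home/.cpan",
    build_dir         => "$home/.cpan/build",
    keep_source_where => "$home/.cpan/sources",

    # install modules under a private prefix (hypothetical location)
    makepl_arg        => "PREFIX=$home/perl5",
    mbuildpl_arg      => "--install_base $home/perl5",

    # placeholder mirror; replace with one reachable from the cluster
    urllist           => ['http://www.cpan.org/'],
};

1;
```

Note also that Module::Build's option for a private install location is --install_base (as in "perl Build.PL --install_base /home/awaller"), and the resulting lib directory has to be added to PERL5LIB before the modules can be found.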

chris

On Nov 29, 2007, at 10:20 AM, alison waller wrote:

> Hi all,
>
>
>
> I would like to install the CVS version of bioperl  as I know of  
> some code
> changes that will be useful to me.  However, I am having problems  
> installing
> it.
>
> I am trying to install bioperl in my home directory on a linux cluster.
>
>
>
> I used
>
>
>
>> cd bioperl-live
>
> *       perl Build.PL -install /home/awaller
>
>
>
> However after the build command I got a lot of errors.  Do I have to  
> also
> have perl installed in my home directory??  There is perl installed  
> on the
> cluster in /usr/bin.  Do I need to point to this or does Build.PL
> automatically look there?  I noticed a few errors about not having
> permission and a few about not being able to connect. I've copied a  
> portion
> of the messages after my Build.PL command.
>
>
>
> Any help would be appreciated,
>
>
>
> alison
>
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Bad luck... Still failed!
>
> Can't access URL
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.
>
>
>
> Issuing "/usr/bin/ftp -n"
>
> ftp: mirror.isurf.ca: Unknown host
>
> Not connected.
>
> Local directory now /root/.cpan/sources/modules
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Not connected.
>
> Bad luck... Still failed!
>
> Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz 
> .
>
>
>
> Please check, if the URLs I found in your configuration file
>
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
>
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/ 
> CPAN) are
>
> valid. The urllist can be edited. E.g. with 'o conf urllist push
>
> ftp://myurl/'
>
>
>
> Could not fetch modules/03modlist.data.gz
>
> Trying to get away with old file:
>
> 3604719  144 -rw-r--r--  1 0        0          141973 Nov 12 22:53
> /root/.cpan/sources/modules/03modlist.data.gz
>
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
>
> Going to write /root/.cpan/Metadata
>
> can't create /root/.cpan/Metadata: Permission denied at
> /usr/share/perl/5.8/CPAN.pm line 3432
>
> Running install for module Test::Harness
>
> Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz
>
> mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
> /usr/share/perl/5.8/CPAN.pm line 2342
>
> ******************************************
> Alison S. Waller  M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov 29 23:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- 
core (I think they were added prior to the 1.5.1 release, but I'm not  
positive).  If possible you should try installing bioperl 1.5.2 or the  
latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

 From CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> -- 
> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 30 03:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
 from	CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl  as I know of some code
> changes that will be useful to me.  However, I am having problems installing
> it.  
> 
> I am trying to install bioperl in my home directory on a linux cluster.
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these urls are invalid as suggested (try setting the urllist to 
nothing), or your linux cluster doesn't have internet access. You can't 
do a 'proper' install of BioPerl and its dependencies without internet 
access.

However, for most purposes simply downloading the BioPerl modules (i.e.
from a different machine with internet access) and pointing your
PERL5LIB to their location is sufficient. You can download CVS modules
from the BioPerl website individually, or as a tarball of everything.
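
A minimal sketch of how PERL5LIB feeds the module search path, using only
core Perl. The temporary directory is a stand-in (an assumption for
illustration) for wherever the downloaded Bio/ directory was unpacked;
child_inc is a hypothetical helper name, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-in for the directory that holds the downloaded modules
# (hypothetical; in practice, the directory that contains Bio/).
my $module_dir = tempdir(CLEANUP => 1);

# Run a child perl with PERL5LIB set and return its module search path.
sub child_inc {
    my ($perl5lib) = @_;
    local $ENV{PERL5LIB} = $perl5lib;
    open my $child, '-|', $^X, '-e', 'print "$_\n" for @INC'
        or die "Cannot start $^X: $!";
    chomp(my @inc = <$child>);
    close $child;
    return @inc;
}

my @inc = child_inc($module_dir);
print "Child perl searches:\n";
print "  $_\n" for @inc;
```

The directory named in PERL5LIB should show up in the child's @INC ahead
of the standard library locations, which is why no 'proper' install is
needed for Perl to find the modules.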


From MEC at stowers-institute.org  Fri Nov 30 09:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>

How many, how often?

Use ensembl biomart!

First time interactively.

Then if you want to pipeline it, take the perl code it generates for you
and run it - of course you'll have to install the Ensembl Perl API....


Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
> 
> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find 
> the start and end locations in the human genome. This seemed 
> simple enough, so I started working through some of the 
> examples for using the EntrezGene module at www.bioperl.org  
> Of course this did not work because the core installation 
> does not include this module. So, I think I have two choices 
> (1) install the module (how?), or (2) find an easier way to 
> get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research City of Hope 
> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 
> USA mailto:paulhengen at coh.org
> 
> --
> View this message in context: 
> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E
> ntrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bosborne11 at verizon.net  Fri Nov 30 09:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID: <C3758AB2.10609%bosborne11@verizon.net>

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.


Brian O.


On 11/28/07 8:20 PM, "Paul N. Hengen" <paulhengen at coh.org> wrote:

> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org


From cjfields at uiuc.edu  Fri Nov 30 10:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
Mingyi Liu if he would like to include this parser with BioPerl (since  
it requires it, makes sense to me, and it avoids the circular  
dependency that has plagued these modules).

chris

On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:

> Chris Fields wrote:
> Chris,
> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
> low-level parser and is not part of bioperl. There is a circular
> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
> Paul, you can get it from CPAN and this should make
> Bio::SeqIO::entrezgene functional for you.
> Stefan

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Fri Nov 30 11:12:22 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 11:12:22 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
Message-ID: <47503666.8090004@bms.com>

Chris Fields wrote:
> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
> Mingyi Liu if he would like to include this parser with BioPerl (since  
> it requires it, makes sense to me, and it avoids the circular  
> dependency that has plagued these modules).
>   
Yes, I think this would be a good step.
Stefan
> chris


From stefan.kirov at bms.com  Fri Nov 30 10:29:57 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 10:29:57 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
Message-ID: <47502C75.60809@bms.com>

Chris Fields wrote:
Chris,
Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
low-level parser and is not part of bioperl. There is a circular
dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
Paul, you can get it from CPAN and this should make
Bio::SeqIO::entrezgene functional for you.
Stefan
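
A small sketch, using only core Perl, for checking whether the parser is
visible to your perl before running the EntrezGene examples. have_module
is a hypothetical helper; the capital-G spelling Bio::ASN1::EntrezGene is
my understanding of the name used on CPAN and is an assumption here.

```perl
use strict;
use warnings;

# Try to load a module by name; returns 1 on success, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Module names are assumptions; adjust to whatever your script needs.
for my $module (qw(Bio::SeqIO Bio::ASN1::EntrezGene)) {
    printf "%-24s %s\n", $module,
        have_module($module) ? 'found' : "missing (try: cpan $module)";
}
```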


> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- 
> core (I think they were added prior to the 1.5.1 release, but I'm not  
> positive).  If possible you should try installing bioperl 1.5.2 or the  
> latest code from CVS.
>
> For directions on installing Bioperl for most OS's go here:
>
> http://www.bioperl.org/wiki/Installing_BioPerl
>
>  From CVS:
>
> http://www.bioperl.org/wiki/Using_CVS
>
> chris


From arareko at campus.iztacala.unam.mx  Fri Nov 30 12:01:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 30 Nov 2007 11:01:29 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47503666.8090004@bms.com>
References: <14017289.post@talk.nabble.com>	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>	<47502C75.60809@bms.com>	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
	<47503666.8090004@bms.com>
Message-ID: <475041E9.8050909@campus.iztacala.unam.mx>

I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the 
past, he mentioned he doesn't track the list closely).

Mauricio.

Stefan Kirov wrote:
> Chris Fields wrote:
>> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
>> Mingyi Liu if he would like to include this parser with BioPerl (since  
>> it requires it, makes sense to me, and it avoids the circular  
>> dependency that has plagued these modules).
>>   
> Yes, I think this would be a good step.
> Stefan
>> chris

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Fri Nov 30 15:21:13 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Nov 2007 12:21:13 -0800
Subject: [Bioperl-l] Trying to find multiple homologs in multiple
	databases
In-Reply-To: <193573097@newdonner.Dartmouth.EDU>
References: <193573097@newdonner.Dartmouth.EDU>
Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org>

Viktor -
Bio::SearchIO helps you parse BLAST reports, but don't underestimate
the power of going as low-tech as possible and outputting scores with
the -m 8 option in NCBI-BLAST or -mformat 3 in WU-BLAST, which give you
tabular output that is parseable with the 'split' function in Perl.
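
A minimal sketch of the low-tech route, assuming the standard 12-column
NCBI-BLAST -m 8 layout. The field names, the parse_tabular_line helper,
and the example line are all made up for illustration.

```perl
use strict;
use warnings;

# Column order of NCBI-BLAST -m 8 output (labels chosen for this sketch).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash ref; returns nothing for
# comment lines, blank lines, or lines with the wrong number of columns.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;
    my @values = split /\t/, $line;
    return unless @values == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @values;
    return \%hit;
}

# Invented example line, not real BLAST output.
my $example = join "\t",
    qw(YAL001C fungal_prot_42 87.50 120 15 0 1 120 5 124 1e-50 210);
my $hit = parse_tabular_line($example);
printf "%s hits %s (E = %s, %s%% identity)\n",
    @{$hit}{qw(query subject evalue pct_identity)};
```

Reading a real report is then a matter of looping over the file with
while (<$fh>) and skipping lines for which the helper returns nothing.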

See the wiki http://bioperl.org/wiki for HOWTOs and examples of using  
the parsers.

You might also consider already-written tools like OrthoMCL,
InParanoid, and others that help you define relationships like
orthologs and paralogs among species. There also exist a few
published web resources that have pre-computed homologs for you; you
might take a look around first, unless the point of the project is to
learn how to run these kinds of searches.

For general Perl help consider Perlmonks.org and some of the
introductory books that are available.
-jason
--
Jason Stajich
jason at bioperl.org

On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote:

> Hello,
>
> My name is Viktor Martyanov and I am a Ph.D. student in biology at  
> Dartmouth.
>
> I need to be able to use a set of genes or FASTA sequences from S.  
> cerevisiae and retrieve a set of corresponding homologs from other  
> fungal species via BLASTP searches.
>
> I would like to find out if there are Perl scripts that approach  
> this problem. By the way, is there a Perl community or forum where  
> I could post this question?
>
> Thanks very much.


From barry.moore at genetics.utah.edu  Fri Nov 30 17:03:23 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 30 Nov 2007 15:03:23 -0700
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
References: <14017289.post@talk.nabble.com>
	<CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
Message-ID: <B839F4C3-C225-40B2-B7B0-C2940A35B964@genetics.utah.edu>

Paul,

One other alternative is to use the UCSC table browser
(http://genome.ucsc.edu/cgi-bin/hgTables?command=start).  Select your
organism and upload your ID list.  Select your output options.  You can
download the coordinates or the fasta directly.  You have options for
including or excluding various parts of the gene, and
upstream/downstream sequences.  This is similar to the solution that
Malcolm suggested, except the Ensembl option can be run repeatedly as
perl code as he pointed out.  UCSC allows you to do remote connections
to their MySQL server, so you could set up a repeated task and more
complex queries that way with the UCSC method.

Barry

On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote:

> How many, how often?
>
> Use ensembl biomart!
>
> First time interactively.
>
> Then if you to pipeline it, take the perl code it generates for you  
> and
> run it - of course you'll have to install the Ensembl Perl API....
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>


From cjfields at uiuc.edu  Fri Nov 30 23:37:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 22:37:50 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from	CVS)
In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL>
References: <000901c833bf$33d53500$0a02a8c0@AWALL>
Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu>

Make sure to keep this on the list.

ncbi_gi() is only in bioperl-live (CVS); my guess is you either  
somehow got 1.5.2 instead or the bioperl-live version is not found in  
your path.  It's very likely the latter, as perl's looking for  
whatever else is present (which appears to be an older version of  
bioperl). That should give you a hint that the problem may be with  
your lib path, which should name the directory that contains Bio/
rather than Bio/ itself.  Try changing
"use lib '/home/awaller/bioperl-live/Bio';" to:

use lib '/home/awaller/bioperl-live';
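
To illustrate why the directory matters, here is a minimal sketch using
only core Perl. It builds a throwaway tree with a made-up module
Bio::Demo (hypothetical, standing in for a bioperl-live checkout) and
tries to load it with each candidate lib directory. Perl looks for
Bio/Demo.pm underneath every @INC entry, so only the parent of Bio/
works.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Throwaway tree that mimics a checkout: <checkout>/Bio/Demo.pm
my $checkout = tempdir(CLEANUP => 1);
my $bio_dir  = File::Spec->catdir($checkout, 'Bio');
make_path($bio_dir);

my $module_file = File::Spec->catfile($bio_dir, 'Demo.pm');
open my $out, '>', $module_file or die "Cannot write $module_file: $!";
print {$out} "package Bio::Demo;\nsub hello { return 'loaded' }\n1;\n";
close $out or die "Cannot close $module_file: $!";

# Returns 1 if Bio::Demo can be loaded when @INC holds only $lib_dir.
sub can_load_demo {
    my ($lib_dir) = @_;
    local @INC = ($lib_dir);
    delete $INC{'Bio/Demo.pm'};    # forget earlier loads; search afresh
    return eval { require Bio::Demo; 1 } ? 1 : 0;
}

for my $lib_dir ($bio_dir, $checkout) {
    print "lib = $lib_dir: Bio::Demo ",
        (can_load_demo($lib_dir) ? 'found' : 'not found'), "\n";
}
```

With the lib directory set to .../Bio, Perl would need the file at
.../Bio/Bio/Demo.pm, which does not exist, so it falls back to whatever
other copy is on the path.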

chris

On Nov 30, 2007, at 8:09 PM, alison waller wrote:

> Okay so Now I'm really confused.
> I edited > #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live/Bio.
> I ran the script below with the *special hit->ncbi from Chris.  It  
> worked,
> it was great, I got the gi! No errors, no bugs that I saw in  
> checking the
> output.  Then I went back in, edited the script to retrieve further  
> info
> (specifically the strand).  Saved it, now when I try to run it I get  
> the
> same error message that I was previously getting.
>
> barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1
> Can't locate object method "ncbi_gi" via package
> "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50, <GEN1>  
> line
> 189.
>
> Thanks soo much,
>
>
> #!usr/bin/perl
>
> use strict;
> use warnings;
> use lib "/home/awaller/bioperl-live/Bio";
> use Bio::Perl;
> use Bio::SearchIO;
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits per query> \n";
> if (@ARGV != 2) { die $usage; }
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                      # to report for each query
>
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>
> my $report = Bio::SearchIO->new(
>  -file   => $infile,
>  -format => "blast"
> );
>
> print OUT join("\t",qw(
>              Query
>              HitDesc
>              HitAccess
>              HitGi
> 		HitBits
>              Evalue
>              %id
>              AlignLen
>              NumIdent
>              NumPos
>              gaps
>              Qframe
>              Qstrand
>              Hframe
> 		Hstrand))."\n";
>
> # Go through BLAST reports one by one
> while ( my $result = $report->next_result ) {
>  my $ct = 0;
>  my @tophits = grep {$ct++ < $tophit } $result->hits;
>  if (scalar(@tophits) == 0) {
>     print OUT "no hits\n";
>  }
>  for my $hit (@tophits) {
>     my $tophsp=$hit->hsp('best');
>     # Print some tab-delimited data about this hit
>     print OUT join("\t",
>                    $result->query_name,
>                    $hit->description,
>                    $hit->accession,
>                    $hit->ncbi_gi,
>                    $hit->bits,
>                    $tophsp->evalue,
>                    $tophsp->percent_identity,
>                    $tophsp->length('total'),
>                    $tophsp->num_identical,
>                    $tophsp->num_conserved,
>                    $tophsp->gaps,
>                    $tophsp->query->frame,
> 		      $tophsp->strand('query'),	
>                    $tophsp->hit->frame,
> 		      $tophsp->strand('hit'),	
>                   )."\n";
>  }
> }
>
>
>
>
> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Friday, November 30, 2007 6:24 PM
> To: alison waller
> Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live  
> tarball
> from CVS)
>
> alison waller wrote:
>> Thank you Sendu,
>>
>> So I'm trying the second option.  I have downloaded the bioperl-live
> tarball
>> from the CVS on my windows laptop, and then moved it to my home  
>> directory
> in
>> the linux cluster where I unzipped and tared it.  So I now have a
> directory
>> /home/awaller/bioperl-live.
>>
>> I edited my .bashrc file as below:
>> Export PERL5LIB='/home/awaller/bioperl-live'
>>
>> I also edited a sample script to include:
>> #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live'
>
> Does this directory contain a 'Bio' directory with all the BioPerl
> modules inside it?
>
>
>> But it still isn't working.
>> At the prompt I typed$ perl script.pl
>> It gave me the warning - can't locate object method ncbi_gi which  
>> is why
> I'm
>> trying to download the CVS version as Chris Fields added code to  
>> make the
>> ncbi-gi object.
>
> You'll have to give me the complete, unedited error message and  
> ideally
> the script itself before I can help you further.
>
>
>> Don't I have to do something similar to what the Build.PL file does?
>
> Probably not. It doesn't matter where your perl executable is, btw, as
> long as the system knows how to run perl, which it obviously does.
> <OldMoBlastxGiTest.txt.parsed><OldMoBlastxGiTest.txt>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From torsten.seemann at infotech.monash.edu.au  Thu Nov  1 01:27:29 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 1 Nov 2007 12:27:29 +1100
Subject: [Bioperl-l] BLAST output parsing
In-Reply-To: <13519112.post@talk.nabble.com>
References: <13519112.post@talk.nabble.com>
Message-ID: <a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>

Swapna,

> I am new to bioperl.  I did BLAST search of ~4000 genes and I need to parse
> it.  I did use -m 9 option to get a tabular information of the blast data.
> But it does not include the gene names or the names of the organisms of each
> hit.  Are there any parsers that can do this job ??

The -m 9 tabular output does not include gene descriptions and
organisms. It only includes the "gene id" that was present immediately
after the ">" sign in the FASTA file that was used to create the BLAST
database you specified with the -d option when you ran BLAST.

Hence, no parser will help you. You either have to re-do the BLAST
with a different -m value that includes the information you desire, or
write code to convert your gene IDs into what you want.
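
To illustrate the second option, here is a minimal sketch of pulling the
hit IDs out of -m 8 / -m 9 tabular output so they can be looked up
elsewhere. It assumes the standard twelve tab-separated columns (query
id, subject id, % identity, alignment length, mismatches, gap openings,
q. start, q. end, s. start, s. end, e-value, bit score). The field names
and parse_tabular_line() are my own labels, and the example line is made
up, not real BLAST output.

```perl
use strict;
use warnings;

# Labels for the twelve tabular columns (names chosen for this sketch).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Parse one tabular line into a hash ref. Returns nothing for blank
# lines and for the '#' comment lines that -m 9 adds.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;
    my %hit;
    @hit{@FIELDS} = split /\t/, $line;
    return \%hit;
}

# Made-up example line for illustration only.
my $example = join "\t", 'query1', 'gi|12345|ref|NP_123456.1|', '98.50',
    '200', '3', '0', '1', '200', '5', '204', '1e-100', '390';
my $hit = parse_tabular_line($example);
print "$hit->{query} hit $hit->{subject} (E = $hit->{evalue})\n";
```

The subject column is the ID you would then translate into a gene name
or organism by some other means.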

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From swapnatbhat at gmail.com  Thu Nov  1 03:49:45 2007
From: swapnatbhat at gmail.com (swapna26)
Date: Wed, 31 Oct 2007 20:49:45 -0700 (PDT)
Subject: [Bioperl-l] BLAST output parsing
In-Reply-To: <a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>
References: <13519112.post@talk.nabble.com>
	<a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>
Message-ID: <13523150.post@talk.nabble.com>


Which -m option do you think would be helpful?

swapna

Torsten Seemann wrote:
> 
> Swapna,
> 
>> I am new to bioperl.  I did BLAST search of ~4000 genes and I need to
>> parse
>> it.  I did use -m 9 option to get a tabular information of the blast
>> data.
>> But it does not include the gene names or the names of the organisms of
>> each
>> hit.  Are there any parsers that can do this job ??
> 
> The -m 9 tabular output does not include gene descriptions and
> organisms. It only includes the "gene id" that was present immediately
> after the ">" sign in the FASTA file that was used to create the BLAST
> database you specified with the -d option when you ran BLAST.
> 
> Hence, no parser will help you. You either have to re-do the BLAST
> with a different -m value that includes the information you desire, or
> write code to convert your gene IDs into what you want.
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/BLAST-output-parsing-tf4728082.html#a13523150
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From barry.moore at genetics.utah.edu  Thu Nov  1 04:03:01 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 31 Oct 2007 22:03:01 -0600
Subject: [Bioperl-l] BLAST output parsing
In-Reply-To: <a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>
References: <13519112.post@talk.nabble.com>
	<a79f6a4b0710311827m22365e5et8cfd2c9cca47f1ce@mail.gmail.com>
Message-ID: <7BDC2187-1ABE-4CA1-AB86-98D5FD5433A4@genetics.utah.edu>

Swapna-

If you are using NCBI fasta files you can use files from NCBI's Gene
database to map your gene IDs to names and organisms.  Look in
particular at the files gene2accession, gene2refseq, and gene_info.
For example, if you had RefSeq protein IDs like NP_123456, you could
use gene2refseq to map those RefSeq accessions to gene IDs and then
gene_info to map the gene IDs to organisms and gene names.
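
A rough sketch of that two-step lookup, in case it helps. The column
positions are an assumption on my part (0-based: GeneID in column 1 and
the protein accession in column 5 of gene2refseq; GeneID in column 1 and
the symbol in column 2 of gene_info, with the tax_id in column 0), so
check them against the header line of the files you download. load_map()
and accession_to_symbol() are hypothetical helper names, and the two
data lines are invented so the example runs without the real files.

```perl
use strict;
use warnings;

# Build a hash from one tab-delimited column to another, skipping
# '#' comment lines. A trailing ".N" version is stripped from the key.
sub load_map {
    my ($fh, $key_col, $val_col) = @_;
    my %map;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my @f = split /\t/, $line;
        next unless defined $f[$key_col] && defined $f[$val_col];
        (my $key = $f[$key_col]) =~ s/\.\d+$//;   # NP_123456.2 -> NP_123456
        $map{$key} = $f[$val_col];
    }
    return \%map;
}

# RefSeq protein accession -> GeneID -> gene symbol.
sub accession_to_symbol {
    my ($acc, $acc2gene, $gene2symbol) = @_;
    $acc =~ s/\.\d+$//;                           # ignore the version suffix
    my $gene_id = $acc2gene->{$acc};
    return unless defined $gene_id;
    return $gene2symbol->{$gene_id};
}

# Invented records standing in for gene2refseq and gene_info.
my $refseq_text = join("\t", 9606, 1234, 'VALIDATED', 'NM_000001.1', 111,
                       'NP_123456.2', 222) . "\n";
my $info_text   = join("\t", 9606, 1234, 'EXAMPLE1') . "\n";

open my $rfh, '<', \$refseq_text or die "cannot read refseq data: $!";
open my $ifh, '<', \$info_text   or die "cannot read gene_info data: $!";
my $acc2gene = load_map($rfh, 5, 1);
my $gene2sym = load_map($ifh, 1, 2);

print accession_to_symbol('NP_123456', $acc2gene, $gene2sym), "\n";
```

With the real files you would open them by name instead of reading from
strings; the same load_map() call with columns (1, 0) on gene_info would
give you GeneID to tax_id.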

B

On Oct 31, 2007, at 7:27 PM, Torsten Seemann wrote:

> Swapna,
>
>> I am new to bioperl.  I did BLAST search of ~4000 genes and I need  
>> to parse
>> it.  I did use -m 9 option to get a tabular information of the  
>> blast data.
>> But it does not include the gene names or the names of the  
>> organisms of each
>> hit.  Are there any parsers that can do this job ??
>
> The -m 9 tabular output does not include gene descriptions and
> organisms. It only includes the "gene id" that was present immediately
> after the ">" sign in the FASTA file that was used to create the BLAST
> database you specified with the -d option when you ran BLAST.
>
> Hence, no parser will help you. You either have to re-do the BLAST
> with a different -m value that includes the information you desire, or
> write code to convert your gene IDs into what you want.
>
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 09:45:43 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 10:45:43 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on
	windows
Message-ID: <4729A047.2060507@mikrobio.med.uni-giessen.de>

Dear all,

I have emboss installed on a windows machine (EMBOSSwin). I can run
it from the DOS command line and the path is set. However, when I
try to call an emboss application from bioperl I get an
"Application not found" error.


  my $f = Bio::Factory::EMBOSS->new();
  # get an EMBOSS application  object from the factory
  my $fuzznuc = $f->program('fuzznuc');
    $fuzznuc->run(
                  { -sequence  => $infile,
                        -pattern   => $motif,
                       -outfile   => $outfile                       
              });
 gives the following error

-------------------- WARNING ---------------------
MSG: Application [fuzznuc] is not available!
---------------------------------------------------
Can't call method "run" on an undefined value at searchPatterns.pl line 
102.

Can somebody help me fix this ?

best regards
Rohit

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 14:22:14 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:22:14 -0400
Subject: [Bioperl-l] PAML/Codeml parsing
Message-ID: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>

PAML4 breaks our PAML parser right now because the order of things in  
the result file has changed.  Now sequences precede the information  
about the version or the program run.  This means that
$result->get_seqs() fails because we don't parse the sequences.

We'll see what we can do, but as usual with supporting 3rd party  
programs it is brittle when file formats change.  Th

-jason

--
Jason Stajich
jason at bioperl.org


From jason at bioperl.org  Thu Nov  1 14:32:06 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:32:06 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>

Presumably the PATH is not getting set properly - you should try
printing the $ENV{PATH} variable in a perl script to see if it
actually contains the directory where the emboss programs are
installed.  Bioperl can only guess so much as to where to find an
application.  It is also possible that we aren't creating the proper
path to the executable - you can print the executable path with
print $fuzznuc->executable
I believe, unless it is throwing an error at the program() line.

It looks like the code in the Factory object is a little fragile in
assuming that the programs HAVE to be in your $PATH.  I don't know if
windows+perl is special in any way in how it runs things, so I can't
really tell if there are specific things you have to do here. You may
have to run this through cygwin in case PATH and such are just not
available properly to windows Perl.
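
Something along these lines would show what perl itself sees. This is
only a sketch using core modules, and it is not how the Factory object
locates programs; find_in_path() is a made-up helper, and the list of
extensions it tries is my assumption about what matters on Windows.

```perl
use strict;
use warnings;
use File::Spec;

# Look for $name in each directory (default: the PATH entries),
# trying the bare name and a few Windows-style extensions.
sub find_in_path {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    my @ext = ('', '.exe', '.bat', '.cmd');
    for my $dir (@dirs) {
        for my $ext (@ext) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate;
        }
    }
    return;
}

print "PATH entries seen by perl:\n";
print "  $_\n" for File::Spec->path;

my $hit = find_in_path('fuzznuc');
print defined $hit ? "found: $hit\n" : "fuzznuc not found on PATH\n";
```

If this finds fuzznuc but bioperl still reports it as unavailable, the
problem is more likely in how the program list is built than in PATH.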

-jason
On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:

> Dear all,
>
> I have emboss installed on a windows machine. (Embosswin). I can run
> this from the dos command line and the path is present. However,  
> when I
> try to call
> an emboss application from bioperl I get a "Application not found  
> error"
>
>
>   my $f = Bio::Factory::EMBOSS->new();
>   # get an EMBOSS application  object from the factory
>   my $fuzznuc = $f->program('fuzznuc');
>     $fuzznuc->run(
>                   { -sequence  => $infile,
>                         -pattern   => $motif,
>                        -outfile   => $outfile
>               });
>  gives the following error
>
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at searchPatterns.pl  
> line
> 102.
>
> Can somebody help me fix this ?
>
> best regards
> Rohit
>
> -- 
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Thu Nov  1 14:54:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 09:54:09 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu>

This worked for me previously when I tested with WinXP on my old  
machine using EMBOSS v5:

ftp://emboss.open-bio.org/pub/EMBOSS/windows

I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably  
better to use the latest EMBOSS version anyway so I suggest trying  
the version in the above link.  I'll test it again today and let you  
know what I find.

chris

On Nov 1, 2007, at 9:32 AM, Jason Stajich wrote:

> Presumably the PATH is not getting set properly - you should play
> around printing the $ENV{PATH} variable in a perl script to see if
> actually contains the directory where the emboss programs are
> installed.  Bioperl can only guess so much as to where to find an
> application.  It is also possible that we aren't creating the proper
> path to the executable - you can print the executable path with
> print $fuzznuc->executable
> I believe unless it is throwing an error at the program() line.
>
> It looks like the code in the Factory object is a little fragile
> assuming that the programs HAVE to be in your $PATH.  I don't know if
> windows+perl is special in any way that it run things so I can't
> really tell if there is specific things you have to do here. You may
> have to run this through cygwin in case PATH and such are just not
> available properly to windowsPerl.
>
> -jason
> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>
>> Dear all,
>>
>> I have emboss installed on a windows machine. (Embosswin). I can run
>> this from the dos command line and the path is present. However,
>> when I
>> try to call
>> an emboss application from bioperl I get a "Application not found
>> error"
>>
>>
>>   my $f = Bio::Factory::EMBOSS->new();
>>   # get an EMBOSS application  object from the factory
>>   my $fuzznuc = $f->program('fuzznuc');
>>     $fuzznuc->run(
>>                   { -sequence  => $infile,
>>                         -pattern   => $motif,
>>                        -outfile   => $outfile
>>               });
>>  gives the following error
>>
>> -------------------- WARNING ---------------------
>> MSG: Application [fuzznuc] is not available!
>> ---------------------------------------------------
>> Can't call method "run" on an undefined value at searchPatterns.pl
>> line
>> 102.
>>
>> Can somebody help me fix this ?
>>
>> best regards
>> Rohit
>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  :	0049 (0)641-9946413
>> Fax  :	0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Thu Nov  1 15:31:40 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 11:31:40 -0400
Subject: [Bioperl-l] PAML3 vs 4
Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org>

Small tweaks were needed to parse PAML4 results.

Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly  
now on both PAML 3 and 4.
You'll need to get the latest code from CVS in order to see the  
changes to Bio/Tools/Phylo/PAML.pm

I've added tests for PAML4 in the parser and the run code.

If you have scripts that use codeml please give it a try.  I have not
attempted to play with BASEML or AAML results at this point, so if you
also have code that uses those programs, please try it out and
file bug reports if we need to fix things.

-jason

--
Jason Stajich
jason at bioperl.org


From Kevin.M.Brown at asu.edu  Thu Nov  1 17:25:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 1 Nov 2007 10:25:30 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl
	onwindows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu>

Sounds like a path issue.  Try to tell bioperl the full path to the
executable rather than just the executable name. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 2:46 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl: cannot run emboss programs 
> using bioperl onwindows
> 
> Dear all,
> 
> I have emboss installed on a windows machine. (Embosswin). I can run
> this from the dos command line and the path is present. 
> However, when I 
> try to call
> an emboss application from bioperl I get a "Application not 
> found error"
> 
> 
>   my $f = Bio::Factory::EMBOSS->new();
>   # get an EMBOSS application  object from the factory
>   my $fuzznuc = $f->program('fuzznuc');
>     $fuzznuc->run(
>                   { -sequence  => $infile,
>                         -pattern   => $motif,
>                        -outfile   => $outfile                       
>               });
>  gives the following error
> 
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at 
> searchPatterns.pl line 
> 102.
> 
> Can somebody help me fix this ?
> 
> best regards
> Rohit
> 
> -- 
> 
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
> 
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 18:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>


Thanks for all the suggestions... but unfortunately I still cannot run
emboss. I am running the latest version of EMBOSSwin (2.10.0-Win-0.8),
and the path is set correctly. I printed $ENV{PATH} and it contains
C:\EMBOSSwin, which is the correct location.
I also tried setting the path directly, but I'm not sure how to do this,
so I tried this...

my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

this also did not work.

Also tried printing...
$fuzznuc->executable()

gave the following error again
-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas ?

thanks !
Rohit


here's the code...

use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

 my $f = Bio::Factory::EMBOSS->new();
 # get an EMBOSS application  object from the factory
 print Dumper ($f);
 my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
 print Dumper ($fuzznuc);
 
 # dump of the factory object ($f)
 #$VAR1 = bless( {
 #                '_programgroup' => {},
 #                '_programs' => {},
 #                '_groups' => {}
 #              }, 'Bio::Factory::EMBOSS' );
 
 #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
 
 my $infile = "temp.fasta";
 my $motif  = "ATGTCGATC";
 my $outfile = "test.out";

 
 $fuzznuc->run(
                  { -sequence  => $infile,
                    -pattern   => $motif,
                    -outfile   => $outfile                      
              });
    
Here's the error again....

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------


Jason Stajich wrote:
> Presumably the PATH is not getting set properly - you should play 
> around printing the $ENV{PATH} variable in a perl script to see if 
> actually contains the directory where the emboss programs are 
> installed.  Bioperl can only guess so much as to where to find an 
> application.  It is also possible that we aren't creating the proper 
> path to the executable - you can print the executable path with 
> print $fuzznuc->executable 
> I believe unless it is throwing an error at the program() line.  
>
> It looks like the code in the Factory object is a little fragile 
> assuming that the programs HAVE to be in your $PATH.  I don't know if 
> windows+perl is special in any way that it run things so I can't 
> really tell if there is specific things you have to do here. You may 
> have to run this through cygwin in case PATH and such are just not 
> available properly to windowsPerl.
>
> -jason
> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>
>> Dear all,
>>
>> I have emboss installed on a windows machine. (Embosswin). I can run
>> this from the dos command line and the path is present. However, when I 
>> try to call
>> an emboss application from bioperl I get a "Application not found error"
>>
>>
>>   my $f = Bio::Factory::EMBOSS->new();
>>   # get an EMBOSS application  object from the factory
>>   my $fuzznuc = $f->program('fuzznuc');
>>     $fuzznuc->run(
>>                   { -sequence  => $infile,
>>                         -pattern   => $motif,
>>                        -outfile   => $outfile                       
>>               });
>>  gives the following error
>>
>> -------------------- WARNING ---------------------
>> MSG: Application [fuzznuc] is not available!
>> ---------------------------------------------------
>> Can't call method "run" on an undefined value at searchPatterns.pl line 
>> 102.
>>
>> Can somebody help me fix this ?
>>
>> best regards
>> Rohit
>>
>> -- 
>>
>> Dr. Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel  : 0049 (0)641-9946413
>> Fax  : 0049 (0)641-9946409
>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de 
>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org <mailto:jason at bioperl.org>
>

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From jason at bioperl.org  Thu Nov  1 18:37:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 14:37:24 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>

You could try this - can't test it though so not sure.
my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\fuzznuc');

-jason
On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:

>
>
> Thanks for all the suggestions... but I unfortunately still cannot run
> emboss. I am running the latest version of embosswin  (2.10.0- 
> Win-0.8),
> and the
> path is set correctly. I printed $ENV{$PATH} and this contains
> C:\EMBOSSwin which is the correct location.
> I also tried setting the path directly but I'm not sure how to do  
> this,
> so I tried this...
>
> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>
> this also did not work.
>
> Also tried printing...
> $fuzznuc->executable()
>
> gave the following error again
> -------------------- WARNING ---------------------
> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> ---------------------------------------------------
>
> Any more ideas ?
>
> thanks !
> Rohit
>
>
> here's the code...
>
> use strict;
> use Bio::Factory::EMBOSS;
> use Data::Dumper;
>
> #
> # print "PATH=$ENV{PATH}\n";
> # path contains C:\EMBOSSwin which is the correct location
> # embossversion is 2.10.0-Win-0.8
>
>  my $f = Bio::Factory::EMBOSS->new();
>  # get an EMBOSS application  object from the factory
>  print Dumper ($f);
>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried  
> fuzznuc.exe
> as well,
>  print Dump ($fuzznuc);
>
>  #dump of fuzznuc
>  #$VAR1 = bless( {
>  #                '_programgroup' => {},
>  #                '_programs' => {},
>  #                '_groups' => {}
>  #              }, 'Bio::Factory::EMBOSS' );
>
>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>
>  my $infile = "temp.fasta";
>  my $motif  = "ATGTCGATC";
>  my $outfile = "test.out";
>
>
>  $fuzznuc->run(
>                   { -sequence  => $infile,
>                     -pattern   => $motif,
>                     -outfile   => $outfile
>               });
>
> Here's the error again....
>
> #-------------------- WARNING ---------------------
> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> #---------------------------------------------------
>
>
>
>
> Jason Stajich wrote:
>> Presumably the PATH is not getting set properly - you should play
>> around printing the $ENV{PATH} variable in a perl script to see if
>> actually contains the directory where the emboss programs are
>> installed.  Bioperl can only guess so much as to where to find an
>> application.  It is also possible that we aren't creating the proper
>> path to the executable - you can print the executable path with
>> print $fuzznuc->executable
>> I believe unless it is throwing an error at the program() line.
>>
>> It looks like the code in the Factory object is a little fragile
>> assuming that the programs HAVE to be in your $PATH.  I don't know if
>> windows+perl is special in any way that it run things so I can't
>> really tell if there is specific things you have to do here. You may
>> have to run this through cygwin in case PATH and such are just not
>> available properly to windowsPerl.
>>
>> -jason
>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>
>>> Dear all,
>>>
>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>> this from the dos command line and the path is present. However,  
>>> when I
>>> try to call
>>> an emboss application from bioperl I get a "Application not found  
>>> error"
>>>
>>>
>>>   my $f = Bio::Factory::EMBOSS->new();
>>>   # get an EMBOSS application  object from the factory
>>>   my $fuzznuc = $f->program('fuzznuc');
>>>     $fuzznuc->run(
>>>                   { -sequence  => $infile,
>>>                         -pattern   => $motif,
>>>                        -outfile   => $outfile
>>>               });
>>>  gives the following error
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: Application [fuzznuc] is not available!
>>> ---------------------------------------------------
>>> Can't call method "run" on an undefined value at  
>>> searchPatterns.pl line
>>> 102.
>>>
>>> Can somebody help me fix this ?
>>>
>>> best regards
>>> Rohit
>>>
>>> -- 
>>>
>>> Dr. Rohit Ghai
>>> Institute of Medical Microbiology
>>> Faculty of Medicine
>>> Justus-Liebig University
>>> Frankfurter Strasse 107
>>> 35392 - Giessen
>>> GERMANY
>>>
>>> Tel  : 0049 (0)641-9946413
>>> Fax  : 0049 (0)641-9946409
>>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org <mailto:jason at bioperl.org>
>>
>
> -- 
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel  :	0049 (0)641-9946413
> Fax  :	0049 (0)641-9946409
> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Thu Nov  1 18:41:41 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:41:41 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlonwindows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de>

Hi Jason

I tried this as well. This also gives the same error message.

-Rohit

Jason Stajich wrote:
> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>
>>
>>
>> Thanks for all the suggestions... but I unfortunately still cannot run
>> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
>> and the path is set correctly. I printed $ENV{PATH} and this contains
>> C:\EMBOSSwin which is the correct location.
>> I also tried setting the path directly but I'm not sure how to do this,
>> so I tried this...
>>
>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>
>> this also did not work.
>>
>> Also tried printing...
>> $fuzznuc->executable()
>>
>> gave the following error again
>> -------------------- WARNING ---------------------
>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>> ---------------------------------------------------
>>
>> Any more ideas ?
>>
>> thanks !
>> Rohit
>>
>>
>> here's the code...
>>
>> use strict;
>> use Bio::Factory::EMBOSS;
>> use Data::Dumper;
>>
>> #
>> # print "PATH=$ENV{PATH}\n";
>> # path contains C:\EMBOSSwin which is the correct location
>> # embossversion is 2.10.0-Win-0.8
>>
>>  my $f = Bio::Factory::EMBOSS->new();
>>  # get an EMBOSS application  object from the factory
>>  print Dumper ($f);
>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
>>  print Dumper ($fuzznuc);
>>
>>  #dump of fuzznuc
>>  #$VAR1 = bless( {
>>  #                '_programgroup' => {},
>>  #                '_programs' => {},
>>  #                '_groups' => {}
>>  #              }, 'Bio::Factory::EMBOSS' );
>>
>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>
>>  my $infile = "temp.fasta";
>>  my $motif  = "ATGTCGATC";
>>  my $outfile = "test.out";
>>
>>
>>  $fuzznuc->run(
>>                   { -sequence  => $infile,
>>                     -pattern   => $motif,
>>                     -outfile   => $outfile                      
>>               });
>>
>> Here's the error again....
>>
>> #-------------------- WARNING ---------------------
>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>> #---------------------------------------------------
>>
>>
>>
>>
>> Jason Stajich wrote:
>>> Presumably the PATH is not getting set properly - you should play 
>>> around printing the $ENV{PATH} variable in a perl script to see if 
>>> actually contains the directory where the emboss programs are 
>>> installed.  Bioperl can only guess so much as to where to find an 
>>> application.  It is also possible that we aren't creating the proper 
>>> path to the executable - you can print the executable path with 
>>> print $fuzznuc->executable 
>>> I believe unless it is throwing an error at the program() line.  
>>>
>>> It looks like the code in the Factory object is a little fragile 
>>> assuming that the programs HAVE to be in your $PATH.  I don't know if 
>>> windows+perl is special in any way that it run things so I can't 
>>> really tell if there is specific things you have to do here. You may 
>>> have to run this through cygwin in case PATH and such are just not 
>>> available properly to windowsPerl.
>>>
>>> -jason
>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>>> this from the dos command line and the path is present. However, 
>>>> when I 
>>>> try to call
>>>> an emboss application from bioperl I get a "Application not found 
>>>> error"
>>>>
>>>>
>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>   # get an EMBOSS application  object from the factory
>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>     $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                         -pattern   => $motif,
>>>>                        -outfile   => $outfile                       
>>>>               });
>>>>  gives the following error
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>> Can't call method "run" on an undefined value at searchPatterns.pl 
>>>> line 
>>>> 102.
>>>>
>>>> Can somebody help me fix this ?
>>>>
>>>> best regards
>>>> Rohit

-- 

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel  :	0049 (0)641-9946413
Fax  :	0049 (0)641-9946409
Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de


From MEC at stowers-institute.org  Thu Nov  1 18:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs
	usingbioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>


In the code at
http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6

there is a call to `wossname` (cf.
http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
).

Is wossname in your path?

Maybe it needs to be wossname.exe under Windows?
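
One way to check this from Perl itself is sketched below. It only uses
core modules and is not what Bio::Factory::EMBOSS does internally; the
helper name find_in_dirs and the list of extensions are assumptions made
for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first existing file called $name in the
# given directories, trying each extension in turn ('' = no extension).
sub find_in_dirs {
    my ($name, $dirs, $exts) = @_;
    for my $dir (@$dirs) {
        for my $ext (@$exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate;
        }
    }
    return;    # nothing found
}

# On Windows also try the usual executable extensions (assumed list).
my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');
my @dirs = File::Spec->path();    # directories listed in $ENV{PATH}

for my $prog (qw(wossname fuzznuc)) {
    my $hit = find_in_dirs($prog, \@dirs, \@exts);
    print defined $hit
        ? "$prog found at $hit\n"
        : "$prog NOT found in PATH\n";
}
```

If wossname shows up as "NOT found" here, the PATH seen by Perl is
probably not the one you expect.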


Malcolm Cook
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 1:42 PM
> To: Jason Stajich
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs 
> usingbioperlonwindows
> 
> Hi Jason
> 
> I tried this as well. This also gives the same error message.
> 
> -Rohit
> 
> Jason Stajich wrote:
> > You could try this - can't test it though so not sure.
> > my $fuzznuc = $f->program('fuzznuc');
> > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> >
> > -jason


From arareko at campus.iztacala.unam.mx  Thu Nov  1 19:51:41 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Nov 2007 13:51:41 -0600
Subject: [Bioperl-l] bioperl: cannot run emboss
	programs	usingbioperlonwindows
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<CED81D34E37D5043A1211565277A51E50A2ED425@exchkc02.stowers-institute.org>
Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx>

Don't the EMBOSS binaries live under 'bin'? Perhaps try setting
$ENV{PATH} to 'C:\EMBOSSwin\bin', or using this:

my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc');

Adding .exe might be worth trying as well.

Mauricio.

Cook, Malcolm wrote:
> in the code
> http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 
> 
> there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
> )
> 
> is wossname in your path?
> 
> Maybe it needs to be wossname.exe under windows?
> 
> 
> Malcolm Cook
>   
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu Nov  1 20:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get  
fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a  
hoop or two but I managed to get it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.   
You need to remove EMBOSSWin and install the one I linked to  
previously (this is an actual EMBOSS beta release).  It's possible  
older EMBOSSWin can be configured, but I don't plan on checking it  
out myself.

Next, you need to ensure the binaries are in your PATH env. variable  
(test by running 'wossname' on the command line), then set  
EMBOSS_DATA to point at the EMBOSS data directory using a UNIX-like  
path (i.e. 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me  
and WinXP recognizes the UNIX'y form as a valid path.  If you don't  
know how to set env. variables go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the  
latest (greatest?) EMBOSS.
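
If you would rather set EMBOSS_DATA from inside the script than in the
system settings, something like the following might work. It is only a
sketch: the helper name to_forward_slashes is made up, and
'C:\mEMBOSS\data' is just the example location from above, so adjust it
to your own install.

```perl
use strict;
use warnings;

# Turn a Win32-style path into the forward-slash form.
sub to_forward_slashes {
    my ($path) = @_;
    (my $unixy = $path) =~ tr{\\}{/};
    return $unixy;
}

# Example location only; use the data directory of your own install.
$ENV{EMBOSS_DATA} = to_forward_slashes('C:\mEMBOSS\data');
print "EMBOSS_DATA=$ENV{EMBOSS_DATA}\n";

# Programs started by this script from here on inherit the modified
# environment, so this has to happen before any EMBOSS program is run.
```

This prints EMBOSS_DATA=C:/mEMBOSS/data.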

chris

On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:

> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
>> You could try this - can't test it though so not sure.
>> my $fuzznuc = $f->program('fuzznuc');
>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>
>> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From neetisomaiya at gmail.com  Fri Nov  2 04:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a Perl question, not a BioPerl one.
Can anyone point me to a Perl program/code/function that calculates the
number of days between any two given dates?
Any help will be deeply appreciated.
Thanks.

-- 
-Neeti
Even my blood says, B positive


From whs at ebi.ac.uk  Fri Nov  2 05:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0711020459530.17670@parrot.ebi.ac.uk>

Hi Neeti,

A non-BioPerl answer to your Perl question: Date::Calc should do the trick.
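
If Date::Calc is not installed, the same calculation can be sketched
with the core module Time::Local. The function name days_between is
made up for this example, and dates are passed as (year, month, day).
With Date::Calc, Delta_Days($y1, $m1, $d1, $y2, $m2, $d2) should give
the same number.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Number of days from date 1 to date 2; each date is (year, month, day).
# Working in UTC keeps every day exactly 86400 seconds long, so
# daylight saving changes cannot skew the result.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);    # months are 0-based
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return ($t2 - $t1) / 86400;
}

print days_between(2007, 10, 31, 2007, 11, 2), "\n";    # prints 2
```

The result is negative when the second date is earlier than the first.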

Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.
>
>


From smarkel at accelrys.com  Sat Nov  3 06:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using
	bioperlon	windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <OFD3D05334.F9E235EF-ON88257388.00209BED-88257388.00211BD7@accelrys.com>

I set multiple environment variables in my code.

    $ENV{EMBOSS_ROOT}    = $embossPath;
    $ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd"); 
    $ENV{EMBOSS_DB_DIR}  = File::Spec->catdir($embossPath, "test");
    $ENV{EMBOSS_DATA}    = File::Spec->catdir($embossPath, "data"); 
    $ENV{PATH}           = $embossPath; 

I found it necessary to set both PATH and EMBOSS_ROOT.
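
If overwriting PATH outright is not wanted, a variant that prepends the EMBOSS directory may be enough. This is only a sketch: the helper name prepend_path is hypothetical, and 'C:/mEMBOSS' is an assumed install location.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: put $dir at the front of a PATH-style string,
# using the platform's separator (';' on Windows, ':' on Unix).
sub prepend_path {
    my ($dir, $path) = @_;
    return $dir unless defined $path && length $path;
    return join $Config{path_sep}, $dir, $path;
}

my $embossPath = 'C:/mEMBOSS';    # assumed install location, adjust as needed
$ENV{PATH} = prepend_path($embossPath, $ENV{PATH});
print "$ENV{PATH}\n";
```

This keeps the rest of PATH available to anything else the script launches.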

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com


bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24:

> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> 
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
> 
> >
> >
> > Thanks for all the suggestions... but I unfortunately still cannot run
> > emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
> > and the path is set correctly. I printed $ENV{PATH} and this contains
> > C:\EMBOSSwin which is the correct location.
> > I also tried setting the path directly but I'm not sure how to do this,
> > so I tried this...
> >
> > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
> >
> > this also did not work.
> >
> > Also tried printing...
> > $fuzznuc->executable()
> >
> > gave the following error again
> > -------------------- WARNING ---------------------
> > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> > ---------------------------------------------------
> >
> > Any more ideas ?
> >
> > thanks !
> > Rohit
> >
> >
> > here's the code...
> >
> > use strict;
> > use Bio::Factory::EMBOSS;
> > use Data::Dumper;
> >
> > #
> > # print "PATH=$ENV{PATH}\n";
> > # path contains C:\EMBOSSwin which is the correct location
> > # embossversion is 2.10.0-Win-0.8
> >
> >  my $f = Bio::Factory::EMBOSS->new();
> >  # get an EMBOSS application  object from the factory
> >  print Dumper ($f);
> >  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
> >  print Dumper ($fuzznuc);
> >
> >  #dump of fuzznuc
> >  #$VAR1 = bless( {
> >  #                '_programgroup' => {},
> >  #                '_programs' => {},
> >  #                '_groups' => {}
> >  #              }, 'Bio::Factory::EMBOSS' );
> >
> >  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
> >
> >  my $infile = "temp.fasta";
> >  my $motif  = "ATGTCGATC";
> >  my $outfile = "test.out";
> >
> >
> >  $fuzznuc->run(
> >                   { -sequence  => $infile,
> >                     -pattern   => $motif,
> >                     -outfile   => $outfile
> >               });
> >
> > Here's the error again....
> >
> > #-------------------- WARNING ---------------------
> > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> > #---------------------------------------------------
> >
> >
> >
> >
> > Jason Stajich wrote:
> >> Presumably the PATH is not getting set properly - you should play
> >> around printing the $ENV{PATH} variable in a perl script to see if
> >> actually contains the directory where the emboss programs are
> >> installed.  Bioperl can only guess so much as to where to find an
> >> application.  It is also possible that we aren't creating the proper
> >> path to the executable - you can print the executable path with
> >> print $fuzznuc->executable
> >> I believe unless it is throwing an error at the program() line.
> >>
> >> It looks like the code in the Factory object is a little fragile
> >> assuming that the programs HAVE to be in your $PATH.  I don't know if
> >> windows+perl is special in any way that it run things so I can't
> >> really tell if there is specific things you have to do here. You may
> >> have to run this through cygwin in case PATH and such are just not
> >> available properly to windowsPerl.
> >>
> >> -jason
> >> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
> >>
> >>> Dear all,
> >>>
> >>> I have emboss installed on a windows machine. (Embosswin). I can run
> >>> this from the dos command line and the path is present. However, 
> >>> when I
> >>> try to call
> >>> an emboss application from bioperl I get a "Application not found 
> >>> error"
> >>>
> >>>
> >>>   my $f = Bio::Factory::EMBOSS->new();
> >>>   # get an EMBOSS application  object from the factory
> >>>   my $fuzznuc = $f->program('fuzznuc');
> >>>     $fuzznuc->run(
> >>>                   { -sequence  => $infile,
> >>>                         -pattern   => $motif,
> >>>                        -outfile   => $outfile
> >>>               });
> >>>  gives the following error
> >>>
> >>> -------------------- WARNING ---------------------
> >>> MSG: Application [fuzznuc] is not available!
> >>> ---------------------------------------------------
> >>> Can't call method "run" on an undefined value at 
> >>> searchPatterns.pl line
> >>> 102.
> >>>
> >>> Can somebody help me fix this ?
> >>>
> >>> best regards
> >>> Rohit
> >>>
> >>> -- 
> >>>
> >>> Dr. Rohit Ghai
> >>> Institute of Medical Microbiology
> >>> Faculty of Medicine
> >>> Justus-Liebig University
> >>> Frankfurter Strasse 107
> >>> 35392 - Giessen
> >>> GERMANY
> >>>
> >>> Tel  : 0049 (0)641-9946413
> >>> Fax  : 0049 (0)641-9946409
> >>> Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >>> <mailto:Rohit.Ghai at mikrobio.med.uni-giessen.de>
> >>>
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org <mailto:jason at bioperl.org>
> >>
> >
> > -- 
> >
> > Dr. Rohit Ghai
> > Institute of Medical Microbiology
> > Faculty of Medicine
> > Justus-Liebig University
> > Frankfurter Strasse 107
> > 35392 - Giessen
> > GERMANY
> >
> > Tel  :   0049 (0)641-9946413
> > Fax  :   0049 (0)641-9946409
> > Email:  Rohit.Ghai at mikrobio.med.uni-giessen.de
> >
> 
> --
> Jason Stajich
> jason at bioperl.org
> 


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Sat Nov  3 14:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon
	windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
	<80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
	<472A15B8.7040502@mikrobio.med.uni-giessen.de>
	<6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
	<472A1DE5.30207@mikrobio.med.uni-giessen.de>
	<28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic. I was able
to run emboss applications on windows (vista), but only with the following
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I
also tried, but these were already available to the program, so this
was not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't
have any path issues. What is also curious is that $f->version returns the
correct version of emboss running (no path problems here), and it looks
like it runs the command "embossversion -auto" to get this information. If it
can get at this command, it's a bit peculiar that it cannot get the other
programs. Or am I missing something here?


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile = "test.fasta";
my $motif  = "AGGAGG";
my $outfile = "test.out";


    my $f = Bio::Factory::EMBOSS->new();
    # get an EMBOSS application object from the factory
    print Dumper $f;

    print "location=",$f->location,"\n";   # returns local
    print "version=", $f->version,"\n";    # returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is)
    print "info=", $f->program_info('fuzznuc'),"\n"; # returns nothing
    print "list=",$f->_program_list,"\n";  # returns nothing

    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work
    # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
    # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object.

    # however, creating an EMBOSSApplication object directly makes it possible to run the program
    #
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');   
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile                      
               });   
    print "Done\n";
   
    exit;


Chris Fields wrote:
> I did a little investigating using my old PC and was able to get 
> fuzznuc to run using BioPerl and EMBOSS v5.  I had to jump through a 
> hoop or two but I managed to get it working.
>
> First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.  
> You need to remove EMBOSSWin and install the one I linked to 
> previously (this is an actual EMBOSS beta release).  It's possible 
> older EMBOSSWin can be configured, but I don't plan on checking it out 
> myself.
>
> Next, you need to ensure the binaries are in your PATH env. variable 
> (test by running 'wossname' on the command line), then set EMBOSS_DATA 
> to point at the EMBOSS data directory using a UNIX-like path (i.e. 
> 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP 
> recognizes the UNIX'y form as a valid path.  If you don't know how to 
> set env. variables go here:
>
> http://vlaurie.com/computers2/Articles/environment.htm
>
> Once that is set up you should be able to run the script using the 
> latest (greatest?) EMBOSS.
>
> chris
>
> On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:
>
>> Hi Jason
>>
>> I tried this as well. This also gives the same error message.
>>
>> -Rohit
>>
>> Jason Stajich wrote:
>>> You could try this - can't test it though so not sure.
>>> my $fuzznuc = $f->program('fuzznuc');
>>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>>
>>> -jason
>>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>>>
>>>>
>>>>
>>>> Thanks for all the suggestions... but I unfortunately still cannot run
>>>> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
>>>> and the path is set correctly. I printed $ENV{PATH} and this contains
>>>> C:\EMBOSSwin which is the correct location.
>>>> I also tried setting the path directly but I'm not sure how to do this,
>>>> so I tried this...
>>>>
>>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>>>
>>>> this also did not work.
>>>>
>>>> Also tried printing...
>>>> $fuzznuc->executable()
>>>>
>>>> gave the following error again
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>>
>>>> Any more ideas ?
>>>>
>>>> thanks !
>>>> Rohit
>>>>
>>>>
>>>> here's the code...
>>>>
>>>> use strict;
>>>> use Bio::Factory::EMBOSS;
>>>> use Data::Dumper;
>>>>
>>>> #
>>>> # print "PATH=$ENV{PATH}\n";
>>>> # path contains C:\EMBOSSwin which is the correct location
>>>> # embossversion is 2.10.0-Win-0.8
>>>>
>>>>  my $f = Bio::Factory::EMBOSS->new();
>>>>  # get an EMBOSS application  object from the factory
>>>>  print Dumper ($f);
>>>>  my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
>>>>  print Dumper ($fuzznuc);
>>>>
>>>>  #dump of fuzznuc
>>>>  #$VAR1 = bless( {
>>>>  #                '_programgroup' => {},
>>>>  #                '_programs' => {},
>>>>  #                '_groups' => {}
>>>>  #              }, 'Bio::Factory::EMBOSS' );
>>>>
>>>>  #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>>>
>>>>  my $infile = "temp.fasta";
>>>>  my $motif  = "ATGTCGATC";
>>>>  my $outfile = "test.out";
>>>>
>>>>
>>>>  $fuzznuc->run(
>>>>                   { -sequence  => $infile,
>>>>                     -pattern   => $motif,
>>>>                     -outfile   => $outfile
>>>>               });
>>>>
>>>> Here's the error again....
>>>>
>>>> #-------------------- WARNING ---------------------
>>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>>>> #---------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Jason Stajich wrote:
>>>>> Presumably the PATH is not getting set properly - you should play
>>>>> around printing the $ENV{PATH} variable in a perl script to see if
>>>>> actually contains the directory where the emboss programs are
>>>>> installed.  Bioperl can only guess so much as to where to find an
>>>>> application.  It is also possible that we aren't creating the proper
>>>>> path to the executable - you can print the executable path with
>>>>> print $fuzznuc->executable
>>>>> I believe unless it is throwing an error at the program() line.
>>>>>
>>>>> It looks like the code in the Factory object is a little fragile
>>>>> assuming that the programs HAVE to be in your $PATH.  I don't know if
>>>>> windows+perl is special in any way that it run things so I can't
>>>>> really tell if there is specific things you have to do here. You may
>>>>> have to run this through cygwin in case PATH and such are just not
>>>>> available properly to windowsPerl.
>>>>>
>>>>> -jason
>>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>>>>> this from the dos command line and the path is present. However,
>>>>>> when I
>>>>>> try to call
>>>>>> an emboss application from bioperl I get a "Application not found
>>>>>> error"
>>>>>>
>>>>>>
>>>>>>   my $f = Bio::Factory::EMBOSS->new();
>>>>>>   # get an EMBOSS application  object from the factory
>>>>>>   my $fuzznuc = $f->program('fuzznuc');
>>>>>>     $fuzznuc->run(
>>>>>>                   { -sequence  => $infile,
>>>>>>                         -pattern   => $motif,
>>>>>>                        -outfile   => $outfile
>>>>>>               });
>>>>>>  gives the following error
>>>>>>
>>>>>> -------------------- WARNING ---------------------
>>>>>> MSG: Application [fuzznuc] is not available!
>>>>>> ---------------------------------------------------
>>>>>> Can't call method "run" on an undefined value at searchPatterns.pl
>>>>>> line
>>>>>> 102.
>>>>>>
>>>>>> Can somebody help me fix this ?
>>>>>>
>>>>>> best regards
>>>>>> Rohit
>>>>>>
>>>>>> -- 
>>>>>>
>
>


From hlapp at gmx.net  Sun Nov  4 17:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 4 Nov 2007 12:42:13 -0500
Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript
In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net>

Hi Stefanie,

sorry for taking so long to respond - your email got buried in a pile  
while I was away on travel. The Bio::SeqFeature::Gene::* modules were  
written mostly with the motivation to have a model that can represent  
the results of gene predictors.

GenBank AFAIK doesn't annotate introns explicitly, though they should  
be implicit from cDNA (or mRNA? or gene, as you say) features on  
genomic sequence. The Bioperl SeqIO parsers won't transform those  
into a Bio::SeqFeature::Gene-based model, but instead will yield just  
plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent  
processing to build these into more hierarchical models.

I'm not sure whether someone's done this already for GenBank-type  
feature tables. There is an Unflattener that at least attempts to  
build a feature hierarchy from the flat array that's compliant with  
the Sequence Ontology (or so I recall).

I'm copying the list in case others have additional suggestions.

	-hilmar

On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote:

>
>
> Hello Hilmar,
>
> I have a question about your bioperl module  
> Bio::SeqFeature::Gene::Transcript:
>
> I can't figure out how to generate the $gene object for use in this  
> line:
> @introns = $gene->introns();
>
> The data I'm working with is a local file in genbank format, and  
> I'm interested in extracting intron sequences (and maybe flanking  
> exons) for certain genes. I have been trying to get the introns via  
> the sequence features ('CDS' or 'gene'), but this has not been  
> working. Which approach will I have to take?
> I'd be very grateful if you could point me into the right direction!
>
> Hope things are going well in Durham! And thank you in advance!
>
> Stefanie
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From downloadondemand at gmail.com  Sun Nov  4 18:39:42 2007
From: downloadondemand at gmail.com (download on demand)
Date: Sun, 4 Nov 2007 20:39:42 +0200
Subject: [Bioperl-l] Help with Bio::SeqIO
Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>

Hi to all.

I have a problem with a very simple script:


         use strict;
         use warnings;
         use Bio::SeqIO;
         # get command-line arguments, or die with a usage statement
         my $usage = "x2y.pl infile infileformat outfileformat\n";
         my $infile = shift or die $usage;
         my $infileformat = shift or die $usage;
#         my $outfile = shift or die $usage;
         my $outfileformat = shift or die $usage;

         # create one SeqIO object to read in, and another to write out
         my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                      '-format' => $infileformat);
         my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                                       '-format' => $outfileformat);

         # for each entry in the input file, print its CDS features
         while (my $inseq = $seq_in->next_seq) {

#            $seq_out->write_seq($inseq); # Whole sequence not needed

             for my $feat_object ($inseq->get_SeqFeatures) {
                 if ($feat_object->primary_tag eq "CDS") {
                     # product name, start..end, then the spliced CDS sequence
                     print $feat_object->get_tag_values('product'),"\n";
                     print $feat_object->location->start,"..",$feat_object->location->end,"\n";
                     print $feat_object->spliced_seq->seq,"\n\n";
                 }
             }
         }


The result seems OK to me, but in the case of the first CDS of NC_005213.gbk
from here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/> the
output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated
sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated
sequence....................................TAA


This CDS has an unusual location string:
CDS             complement(join(490883..490885,1..879))
but shouldn't spliced_seq handle such cases?

Please help me!
Best regards, N.


From cjfields at uiuc.edu  Mon Nov  5 00:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for  
instance typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the  
split location is a 'join'?  In other words, should spliced_seq() be  
modified to take into account the split location type when returning  
sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly  
indicates the order of the sequences is important when joined  
together; the current behavior is more like that for 'order'.
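
To make the difference concrete, here is a toy illustration in plain Perl. It is not BioPerl's implementation, just a sketch of why sorting the sublocations of a 'join' that wraps around the origin puts the stop codon first; the 13-base genome and the helper names are invented for the example.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my $s = scalar reverse $_[0];
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical helper: concatenate 1-based [start, end] ranges of $genome,
# either in the order given or sorted by start coordinate.
sub join_ranges {
    my ($genome, $ranges, $sort) = @_;
    my @r = $sort ? (sort { $a->[0] <=> $b->[0] } @$ranges) : @$ranges;
    return join '', map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @r;
}

my $genome = 'GGGCATACGTTTA';       # toy 13-base circular "genome"
my @join   = ([11, 13], [1, 6]);    # complement(join(11..13,1..6))

print "join order kept: ", revcomp(join_ranges($genome, \@join, 0)), "\n";
print "sorted by start: ", revcomp(join_ranges($genome, \@join, 1)), "\n";
```

Keeping the join order gives a sequence that starts with ATG and ends with TAA; sorting by start moves the TAA to the front, which is the same pattern as in the reported output.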

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
>
> I have a problem with a simplest script:
>
>
>
>          use Bio::SeqIO;
>          # get command-line arguments, or die with a usage statement
>          my $usage = "x2y.pl infile infileformat outfile  
> outfileformat\n";
>          my $infile = shift or die $usage;
>          my $infileformat = shift or die $usage;
> #         my $outfile = shift or die $usage;
>          my $outfileformat = shift or die $usage;
>
>          # create one SeqIO object to read in,and another to write out
>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => $infileformat);
>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>                                        '-format' => $outfileformat);
>
>          # write each entry in the input file to the output file
>          while (my $inseq = $seq_in->next_seq) {
>
> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>
> for my $feat_object ($inseq->get_SeqFeatures)
>     {
>     if ($feat_object->primary_tag eq "CDS")
>         {
>         print $feat_object->get_tag_values('product'),"\n";
>         print
> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>         print $feat_object->spliced_seq->seq,"\n\n";
>         }
>     }
>
>
>
> The result seems OK to me, but in case of first CDS of  
> NC_005213.gbk from
> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/ 
> > the
> output is wrong:
>
> It is:
> hypothetical protein
> 1..490885
> TAAATGCGATTGCTATTAGAA..................................Truncated
> sequence...................................
>
> Should be:
> hypothetical protein
> 879..490883
> ATGCGATTGCTATTAGAA...................................Truncated
> sequence....................................TAA
>
>
>
> This CDS have an unnatural location string:
> CDS             complement(join(490883..490885,1..879)), but  
> spliced_seq
> should handle these things?
>
> Please help me!
> Best regards, N.
> _______________________________________________
>


From jean-luc.jany at univ-brest.fr  Mon Nov  5 08:26:52 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Mon, 05 Nov 2007 09:26:52 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to
	blastall
Message-ID: <472ED3CC.2050305@univ-brest.fr>

Dear Bioperl and Mac users,

I am a Mac user and would like to run a script I made using Bio::Tools::Run::StandAloneBlast. Unfortunately, I have not managed to tell Bioperl the path to blastall and the other executables.

I carefully read the following link http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the path to Blast, but I guess the way to proceed is slightly different on a Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?)

Actually, my blast directory is in the myname directory and comprises a /bin and a /data directory. My blastall and other executables are in myname/blast/bin (e.g. myname/blast/bin/blastall).

Thank you in anticipation for your help.

Jean-Luc


From Rohit.Ghai at mikrobio.med.uni-giessen.de  Mon Nov  5 11:36:16 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Mon, 05 Nov 2007 12:36:16 +0100
Subject: [Bioperl-l] bioperl and emboss on windows
Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de>

Dear all, thanks for all the different inputs on this topic. I was able
to run emboss applications on windows (vista), but only with the following
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I
also tried, but these were already available to the program, so this
was not the problem either. The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't
have any path issues. What is also curious is that $f->version returns the
correct version of emboss running (no path problems here), and it looks
like it runs the command "embossversion -auto" to get this information. If it
can get at this command, it's a bit peculiar that it cannot get the other
programs. Or am I missing something here?


Please take a look at the code, I have commented within this...


-Rohit


use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;


my $infile = "test.fasta";
my $motif  = "AGGAGG";
my $outfile = "test.out";


     my $f = Bio::Factory::EMBOSS->new();
     # get an EMBOSS application  object from the factory
    print Dumper $f;  
   
    print "location=",$f->location,"\n";   #returns local
    print "version=", $f->version,"\n";    #  this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is)
    print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing
    print "list=",$f->_program_list,"\n";  #returns nothing
   
    #
    # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work
    # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work
    # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object.
    #
    #
    #
    # however, creating a EMBOSSApplication object directly makes it possible to run the program
    #
    
    my $application = Bio::Tools::Run::EMBOSSApplication->new();
    $application->name('fuzznuc');   
    print Dumper $application;
    $application->run(
                   { -sequence  => $infile,
                     -pattern   => $motif,
                     -outfile   => $outfile                      
               });   
    print "Done\n";
   
    exit;


From neetisomaiya at gmail.com  Mon Nov  5 12:20:04 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 5 Nov 2007 17:50:04 +0530
Subject: [Bioperl-l] perl question
Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>

Again a perl question, and maybe a very trivial one.
How do I truncate a number like 3.1232010098 to only 3 decimal places in
perl?

-- 
-Neeti
Even my blood says, B positive


From biology0046 at hotmail.com  Mon Nov  5 12:16:13 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID: <BLU108-F34DC66B7BB1B9063DA2BC8B4880@phx.gbl>

Dear all:

I have a poplar genome GFF file like this:
LG_I	src	exon	2598	3280	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	2598	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I	src	start_codon	3278	3280	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	stop_codon	2598	2600	.	-	0	name "fgenesh1_pg.C_LG_I000001"
LG_I	src	exon	3544	3918	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	3544	3918	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I	src	exon	4258	4740	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	4258	4740	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I	src	exon	5344	6388	.	-	.	name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I	src	CDS	5344	6388	.	-	2	name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I	src	exon	8259	8528	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8259	8528	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I	src	stop_codon	8259	8261	.	-	0	name "fgenesh1_pg.C_LG_I000002"
LG_I	src	exon	8897	8987	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	8897	8987	.	-	0	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I	src	exon	9831	9892	.	-	.	name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I	src	CDS	9831	9892	.	-	1	name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I	src	start_codon	9890	9892	.	-	0	name "fgenesh1_pg.C_LG_I000002"

I tried to use Bio::DB::GFF, but this module only works with the feature
types (methods) that are actually given in the GFF file.
What I want to get is "intron, 5utr, 3utr", but this information is not
contained in this GFF file.

How can I get this information through bioperl? If I consider the gaps
between exons as introns, and the non-CDS parts of the first and last exon
as UTRs, how can I extract them from this GFF file?
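
To show what I mean by "gaps between exons", here is a rough sketch in plain Perl (no bioperl). It assumes tab-separated lines like the ones above with a transcriptId attribute on the exon lines; the helper name introns_from_gff is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: given GFF lines, return a hash of
# transcriptId => list of [start, end] intron intervals, where an
# intron is taken to be the gap between two consecutive exons.
sub introns_from_gff {
    my @lines = @_;
    my %exons;
    for my $line (@lines) {
        my @f = split /\t/, $line;
        next unless @f >= 9 && $f[2] eq 'exon';
        my ($id) = $f[8] =~ /transcriptId\s+(\d+)/ or next;
        push @{ $exons{$id} }, [ $f[3], $f[4] ];
    }
    my %introns;
    for my $id (keys %exons) {
        # order exons by start coordinate, then take the gaps between neighbours
        my @sorted = sort { $a->[0] <=> $b->[0] } @{ $exons{$id} };
        for my $i (1 .. $#sorted) {
            my ($start, $end) = ($sorted[$i - 1][1] + 1, $sorted[$i][0] - 1);
            push @{ $introns{$id} }, [ $start, $end ] if $start <= $end;
        }
    }
    return %introns;
}

# The four exons of transcript 62649 from the file above.
my @gff = map { join "\t", 'LG_I', 'src', 'exon', @$_, '.', '-', '.',
                'name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649' }
          ([2598, 3280], [3544, 3918], [4258, 4740], [5344, 6388]);

my %introns = introns_from_gff(@gff);
for my $id (sort keys %introns) {
    print "$id intron $_->[0]..$_->[1]\n" for @{ $introns{$id} };
}
```

The UTRs could presumably be found in a similar way, by comparing the exon ranges with the CDS ranges of the same gene.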

Thanks~~

Wenkai



From spiros at lokku.com  Mon Nov  5 12:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <bba689ec0711050436r6016ae57le78db531f9eab55b@mail.gmail.com>

Hey,

Use the `sprintf` function. More information can be found at
http://perldoc.perl.org/functions/sprintf.html.

For more control over rounding, you could use the Math::Round module,
http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya <neetisomaiya at gmail.com> wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive
>


From ak at ebi.ac.uk  Mon Nov  5 12:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

  printf( "The number is %.3f\n", $number );

When making a string:

  my $string = sprintf( "%.3f", $number );


BTW, this rounds to three decimals rather than just cutting off the extra digits.


Cheers,
Andreas


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------


From t.nugent at cs.ucl.ac.uk  Mon Nov  5 12:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>   

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From bix at sendu.me.uk  Mon Nov  5 12:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:

http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/


$rounded = sprintf("%.3f", $number);


From Marc.Logghe at DEVGEN.com  Mon Nov  5 12:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,
Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:
my $f = 3.1232010098;
printf "%0.3f", $f;


HTH,
Marc
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
> 
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 
> decimal places in perl?
> 
> --
> -Neeti
> Even my blood says, B positive


From bix at sendu.me.uk  Mon Nov  5 13:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> 
> When displaying:
> 
>   printf( "The number is %.3f\n", $number );
> 
> When making a string:
> 
>   my $string = sprintf( "%.3f", $number );
> 
> 
> BTW, this is cutting, not rounding.

(s)printf rounds (ie. doesn't simply truncate), though for critical 
applications you should use your own rounding algorithm.
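
To make the difference between rounding and cutting concrete, here is a
minimal sketch. The helper names round_places and truncate_places are
made up for this illustration, and truncate_places is only one simple way
to cut digits; it is itself subject to floating-point representation
error, so treat it as a sketch rather than a robust routine.

```perl
use strict;
use warnings;

# Round $number to $places decimal places; sprintf rounds, it does not cut.
sub round_places {
    my ($number, $places) = @_;
    return sprintf("%.${places}f", $number);
}

# Cut (truncate) $number after $places decimal places, without rounding.
# Hypothetical helper for illustration only.
sub truncate_places {
    my ($number, $places) = @_;
    my $factor = 10 ** $places;
    return int($number * $factor) / $factor;
}

print round_places(3.1232010098, 3), "\n";   # 3.123
print round_places(3.1238, 3), "\n";         # 3.124 (rounded up)
print truncate_places(3.1238, 3), "\n";      # 3.123 (digits cut off)
```

For the number in the original question both approaches give 3.123; they
only differ when the fourth decimal is 5 or higher.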


From ak at ebi.ac.uk  Mon Nov  5 13:56:24 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 13:56:24 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <472F1989.90105@sendu.me.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
	<20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk>
Message-ID: <20071105135624.GD4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote:
> Andreas Kahari wrote:
> > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> >> Again a perl question, and maybe a very trivial one.
> >> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> >> perl?
> > 
> > When displaying:
> > 
> >   printf( "The number is %.3f\n", $number );
> > 
> > When making a string:
> > 
> >   my $string = sprintf( "%.3f", $number );
> > 
> > 
> > BTW, this is cutting, not rounding.
> 
> (s)printf rounds (ie. doesn't simply truncate), though for critical 
> applications you should use your own rounding algorithm.

They do indeed.  Mea culpa.


Andreas

-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------


From jay at jays.net  Mon Nov  5 15:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value to the sorting algorithm. All tests invoke
-nosort => 1 (except a phase test where nosort doesn't matter anyway).
In my limited experience the sorting only serves to break real-world
splicing.
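
A toy illustration of how sorting can break a join that wraps around the
origin of a circular genome. This is plain Perl, not BioPerl's actual
code: splice_toy, revcom, the 24 bp genome and the coordinates are all
invented for the example, and the sort here is a simple ascending sort by
start coordinate.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Join sublocations ([start, end], 1-based, inclusive) of $genome.
# Unless $nosort is true, the pieces are first sorted by start coordinate.
# A negative $strand reverse-complements the joined sequence.
sub splice_toy {
    my ($genome, $sublocs, $strand, $nosort) = @_;
    my @locs = $nosort ? @$sublocs
                       : sort { $a->[0] <=> $b->[0] } @$sublocs;
    my $seq = join '',
        map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @locs;
    return $strand < 0 ? revcom($seq) : $seq;
}

my $genome = 'CAATCGCAT' . ('G' x 12) . 'TTA';   # 24 bp toy genome
my @join   = ([22, 24], [1, 9]);   # like complement(join(22..24,1..9))

print splice_toy($genome, \@join, -1, 1), "\n";  # ATGCGATTGTAA (order kept)
print splice_toy($genome, \@join, -1, 0), "\n";  # TAAATGCGATTG (sorted)
```

With the join order kept, the result starts with ATG and ends with the
TAA stop codon; with sorting, the stop codon ends up in front, which is
the same pattern as in the reported output.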

If there is no valid use then we can remove ~20 lines from  
SeqFeatureI.pm circa line 505. If there is a valid use and someone  
would be so kind as to educate me I'd be happy to add tests which  
demonstrate them.  :)

P.S.  CSHL is neato. I plan on understanding some of this stuff some  
day.  :)

j
http://www.bioperl.org/wiki/User:Jhannah


From hlapp at duke.edu  Mon Nov  5 16:03:16 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 11:03:16 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>

I agree that there should be a meaningful default that results in
"doing the right thing" in most cases if the user doesn't intervene.
I'm not sure I understand all the details, but it sounds like sorting or
not sorting should depend on the split location type unless the user
overrides it by argument. That's what you're suggesting, right?

	-hilmar

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:

> Pass in (-nosort => 1) to spliced_seq:
>
> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>
> This ensures no sorting of sublocations occurs, if you want for  
> instance typical GenBank/EMBL 'join' behavior.
>
> To the other devs: shouldn't -nosort be the default behavior when
> the split location is a 'join'?  In other words, should spliced_seq()
> be modified to take into account the split location type when
> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
> explicitly indicates the order of the sequences is important when
> joined together; the current behavior is more like that for 'order'.
>
> chris
>
> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>
>> Hi to all.
>>
>> I have a problem with a simplest script:
>>
>>
>>
>>          use strict;
>>          use warnings;
>>          use Bio::SeqIO;
>>
>>          # get command-line arguments, or die with a usage statement
>>          my $usage = "x2y.pl infile infileformat outfileformat\n";
>>          my $infile = shift or die $usage;
>>          my $infileformat = shift or die $usage;
>> #         my $outfile = shift or die $usage;  # unused, output goes to STDOUT
>>          my $outfileformat = shift or die $usage;
>>
>>          # create one SeqIO object to read in, and another to write out
>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>                                       '-format' => $infileformat);
>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>                                        '-format' => $outfileformat);
>>
>>          # print product, coordinates and spliced sequence of each CDS
>>          while (my $inseq = $seq_in->next_seq) {
>>
>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>
>>              for my $feat_object ($inseq->get_SeqFeatures) {
>>                  if ($feat_object->primary_tag eq "CDS") {
>>                      print $feat_object->get_tag_values('product'), "\n";
>>                      print $feat_object->location->start, "..",
>>                            $feat_object->location->end, "\n";
>>                      print $feat_object->spliced_seq->seq, "\n\n";
>>                  }
>>              }
>>          }
>>
>>
>>
>> The result seems OK to me, but in the case of the first CDS of
>> NC_005213.gbk from here
>> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/>
>> the output is wrong:
>>
>> It is:
>> hypothetical protein
>> 1..490885
>> TAAATGCGATTGCTATTAGAA..................................Truncated
>> sequence...................................
>>
>> Should be:
>> hypothetical protein
>> 879..490883
>> ATGCGATTGCTATTAGAA...................................Truncated
>> sequence....................................TAA
>>
>>
>>
>> This CDS has an unnatural location string:
>> CDS             complement(join(490883..490885,1..879)), but
>> spliced_seq should handle these things?
>>
>> Please help me!
>> Best regards, N.

From bernd.web at gmail.com  Mon Nov  5 16:53:01 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Mon, 5 Nov 2007 17:53:01 +0100
Subject: [Bioperl-l] PSI-BLAST
Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com>

Hi,

Is it possible with SearchIO to select a specific iteration (Results
from round i) of the PSI-BLAST report when parsing it with
SearchIO::blast?
It seems the parser parses the complete report. If this is not
implemented, I could of course extract the specific part of the
PSI-BLAST report and then give it to SearchIO (e.g. with IO::String),
but maybe I am missing a built-in option?
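
The manual workaround I have in mind could look roughly like the sketch
below. It is plain Perl and only cuts the text; extract_round is a
made-up helper and the report string is a tiny stand-in, not real
PSI-BLAST output. I am assuming each round starts with a line beginning
"Results from round N", and whether SearchIO also needs the report header
in front of the extracted part is something I have not checked.

```perl
use strict;
use warnings;

# Return the text of round $round from a PSI-BLAST report string, from its
# "Results from round N" line up to the next such line (or end of report).
# Returns nothing if the round is not present.
sub extract_round {
    my ($report, $round) = @_;
    # Split before every line that starts a new round, keeping the marker.
    my @parts = split /^(?=Results from round \d+)/m, $report;
    for my $part (@parts) {
        return $part
            if $part =~ /\AResults from round (\d+)/ && $1 == $round;
    }
    return;
}

# Tiny stand-in report for demonstration purposes.
my $report = join '',
    "BLASTP header\n",
    "Results from round 1\n", "hit A\n",
    "Results from round 2\n", "hit B\n";

print extract_round($report, 2);
```

The returned string could then be wrapped in a filehandle (e.g. with
IO::String, as mentioned above) before handing it to the parser.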


Regards,
Bernd


From jay at jays.net  Mon Nov  5 16:54:13 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 11:54:13 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>

On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?

If someone knows why spliced_seq() should ever sort then I'm  
suggesting we add a test demonstrating a useful example of that.

If no one has a useful example of when you would want spliced_seq()  
to sort then I'm suggesting we remove the sorting altogether and  
nosort goes away.

I can provide/add many examples where sorting is bad. I do not know  
of a case where sorting is good.

j
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Mon Nov  5 17:07:10 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Nov 2007 12:07:10 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>


At one point the location order was not respected/saved I believe. I  
guess we will just assume the user will build up a SplitLocation in  
order (i.e. add_SubLocation).  I'll try and remember if there were  
any other particular reasons.


-jason
On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar

--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Mon Nov  5 17:16:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:16:10 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>

Yes, we would sort based on the splittype() and default to a
particular behavior ('join') if one isn't designated, maybe with a
warning indicating the splittype() isn't defined.  Using 'order'
or other defined types could also determine a default sort/nosort
behavior (probably the former, as it would replicate prior behavior).
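
Roughly, the decision could be sketched like this. This is not existing
BioPerl code: should_sort is a hypothetical helper, and the mapping
('join' means no sorting, anything else sorts, an undefined type falls
back to 'join' with a warning) is just my reading of the proposal above.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether sublocations should be sorted.
# $splittype is the split location type ('join', 'order', ...);
# $nosort is the user's explicit -nosort argument (undef if not given).
sub should_sort {
    my ($splittype, $nosort) = @_;

    # An explicit user choice always overrides the default.
    return $nosort ? 0 : 1 if defined $nosort;

    # No split type designated: warn and fall back to 'join'.
    if (!defined $splittype || $splittype eq '') {
        warn "split type not defined, assuming 'join'\n";
        $splittype = 'join';
    }

    # 'join' keeps the given order; other types replicate the old sorting.
    return lc($splittype) eq 'join' ? 0 : 1;
}

for my $type ('join', 'order') {
    printf "%-5s => %s\n", $type, should_sort($type) ? 'sort' : 'no sort';
}
```

The explicit argument is checked first so that anyone already passing
-nosort sees no change in behavior.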

chris

On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar


From cjfields at uiuc.edu  Mon Nov  5 17:20:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:20:35 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EE77891F-D6B5-47C8-8EC6-FA671D9C4868@jays.net>
Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu>


On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote:

> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>
> If someone knows why spliced_seq() should ever sort then I'm
> suggesting we add a test demonstrating a useful example of that.
>
> If no one has a useful example of when you would want spliced_seq()
> to sort then I'm suggesting we remove the sorting altogether and
> nosort goes away.
>
> I can provide/add many examples where sorting is bad. I do not know
> of a case where sorting is good.
>
> j
> http://www.bioperl.org/wiki/User:Jhannah

The behavior would be based on the current use of 'join', 'order',  
and 'bond' (the latter in GenPept records).  I documented some cases  
here a while back:

http://www.bioperl.org/wiki/BioPerl_Locations#Split

chris


From hlapp at duke.edu  Mon Nov  5 17:32:24 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 12:32:24 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu>
Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu>

Sounds good to me. -hilmar

On Nov 5, 2007, at 12:16 PM, Chris Fields wrote:

> Yes, we would sort based on the splittype() and default to a  
> particular behavior ('join') if one isn't designated, maybe with a  
> warning indicating the splittype() isn't defined.  Using an 'order'  
> or other defined types could also delineate a default sort/nosort  
> behavior (probably the previous as it would replicate prior behavior).
>
> chris
>
> On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 17:41:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 11:41:27 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
Message-ID: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>

It may have something to do with remote locations or setting strand()  
in sublocations.  This may have popped up in relation to a LocationI  
code audit I proposed a while back on the list which I never got  
around to.  Oh well...

I at least managed to get a wiki page started in case we decide to
make changes, with the intention of making it a HOWTO at some point:

http://www.bioperl.org/wiki/BioPerl_Locations

If we go through with the changes to spliced_seq(), should it be  
implemented for inclusion in v1.6 or wait until v1.7?

chris

On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote:

>
> At one point the location order was not respected/saved I believe.  
> I guess we will just assume the user will build up a SplitLocation  
> in order (i.e. add_SubLocation).  I'll try and remember if there  
> were any other particular reasons.
>
>
> -jason
> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Mon Nov  5 16:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID: <C354B795.10231%bosborne11@verizon.net>

Jean-luc,

From what you've written it sounds like you're using bash and not some other
shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
in your home directory, as well as a .ncbirc file. This should work.

I'm no Unix expert but I've always configured tcsh on the Mac in the same
ways I'd configure it on Linux machines. Similarly, if you're using bash
then it will read its .bashrc file, regardless of what flavor of Unix you
use (and the same thing holds true for zsh or csh or ...).

Brian O.


On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:

> Dear Bioperl and Mac users,
> 
> I am a Mac user and would like to run a script I made using
> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
> to Bioperl the pathway to Blastall and other executables.
> 
> I read carefully the following link
> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
> path to Blast, but I guess the way to proceed is slightly different in Mac and
> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
> .profile file instead of .bashrc?)
> 
> Actually, my blast file is in myname directory and comprises a /bin and  a
> /data file. I have got my blastall and other executables in
> myname/blast/bin/blastall.
> 
> Thank you in anticipation for your help.
> 
> Jean-Luc


From arareko at campus.iztacala.unam.mx  Mon Nov  5 18:35:56 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 05 Nov 2007 12:35:56 -0600
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
In-Reply-To: <C354B795.10231%bosborne11@verizon.net>
References: <C354B795.10231%bosborne11@verizon.net>
Message-ID: <472F628C.2000506@campus.iztacala.unam.mx>

If the ~/.bashrc file doesn't work for you, try renaming it to 
~/.bash_profile and re-login, that might work best.

~/.bashrc works as an individual per-interactive-shell startup file, 
whereas ~/.bash_profile is a personal initialization file, executed for 
login shells.
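
To check whether the environment your Perl script actually sees contains
blastall, something like the following sketch may help. It is plain Perl
and does not use BioPerl; find_in_path is a made-up helper name, and it
only looks at the directories on PATH.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file named $program found
# in the given directories, or nothing if there is none.
sub find_in_path {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in the PATH variable.
my $found = find_in_path('blastall', File::Spec->path);
print defined $found
    ? "blastall found at $found\n"
    : "blastall is not on PATH: " . ($ENV{PATH} // '') . "\n";
```

If this reports that blastall is not on PATH when run from a new terminal
window, the startup file you edited is probably not the one being read.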

Hope this helps.

Regards,
Mauricio.


Brian Osborne wrote:
> Jean-luc,
> 
> From what you've written it sounds like you're using bash and not some other
> shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
> in your home directory, as well as a .ncbirc file. This should work.
> 
> I'm no Unix expert but I've always configured tcsh on the Mac in the same
> ways I'd configure it on Linux machines. Similarly, if you're using bash
> then it will read its .bashrc file, regardless of what flavor of Unix you
> use (and the same thing holds true for zsh or csh or ...).
> 
> Brian O.
> 
> 
> On 11/5/07 4:26 AM, "Jean-luc Jany" <jean-luc.jany at univ-brest.fr> wrote:
> 
>> Dear Bioperl and Mac users,
>>
>> I am a Mac user and would like to run a script I made using
>> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate
>> to Bioperl the pathway to Blastall and other executables.
>>
>> I read carefully the following link
>> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the
>> path to Blast, but I guess the way to proceed is slightly different in Mac and
>> that I should not create .ncbirc and .bashrc files (e.g. should I modify the
>> .profile file instead of .bashrc?)
>>
>> Actually, my blast file is in myname directory and comprises a /bin and  a
>> /data file. I have got my blastall and other executables in
>> myname/blast/bin/blastall.
>>
>> Thank you in anticipation for your help.
>>
>> Jean-Luc

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Gen?tica
Unidad de Morfofisiolog?a y Funci?n
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at duke.edu  Mon Nov  5 21:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
Message-ID: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>


On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be  
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should
not change behavior for those for whom the current default behavior
was already broken (and who therefore pass in -nosort), and 2) fix
the behavior for those who erroneously assumed that the code was
going to do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at uiuc.edu  Mon Nov  5 22:12:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Nov 2007 16:12:23 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
	<8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
	<F0791856-B4F4-4EFF-8AFD-63636FF0F40A@duke.edu>
	<EB291133-556C-49E3-A4BA-9F86208B3247@bioperl.org>
	<D2491290-6B84-4CA4-96FD-FF464DC8D665@uiuc.edu>
	<EBDD3F1A-4F30-47DD-9693-8A50EBBD1D8D@duke.edu>
Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu>


On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote:

>
> On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:
>
>> If we go through with the changes to spliced_seq(), should it be  
>> implemented for inclusion in v1.6 or wait until v1.7?
>
> I would say they should be implemented ASAP because they 1) should  
> not change behavior for those for which the current default  
> behavior was already broken (and who therefore pass in --no_sort),  
> and 2) fix the behavior for those who erroneously assumed that the  
> code was going to do the right thing by default.
>
> I.e., it sounds mostly like a bugfix to me. Am I overlooking  
> something?
>
> 	-hilmar
> -- 

Okay; I'll try to get this in soon.

chris


From jean-luc.jany at univ-brest.fr  Tue Nov  6 09:00:07 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Tue, 06 Nov 2007 10:00:07 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path
 to blastall
Message-ID: <47302D17.2030500@univ-brest.fr>

Thanks Brian. Yes, I use bash. I am going to follow your advice as soon 
as possible (for some reason I am unable to run bioperl) and will come back 
to you to tell you whether it runs.
Jean-Luc


From jason at bioperl.org  Tue Nov  6 21:18:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 16:18:35 -0500
Subject: [Bioperl-l] lightweight sequence features
Message-ID: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>

I started a branch for implementing and playing with a lightweight
feature object. The branch is called 'lightweight_feature_branch'.

Right now it is about 70% faster just in object creation, based on
parsing features using Bio::Tools::GFF and swapping the types of
features that are created.  It uses arrays instead of hashes under
the hood.

The objects also don't have location objects under the hood.  My hope
is that, if this works okay, we could use it for creating objects where
we KNOW the underlying features have simple locations, such as when
parsing GFF data.
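
To illustrate "arrays instead of hashes under the hood", here is a
minimal sketch in plain Perl. The package names My::HashFeature and
My::ArrayFeature and the slot constants F_START and F_END are made up
for this example; they are not the classes on the branch. The sketch
only shows the storage difference, with start and end kept as plain
numbers rather than in a separate location object.

```perl
use strict;
use warnings;

# Hypothetical hash-based feature: fields are looked up by key.
package My::HashFeature;
sub new {
    my ($class, %args) = @_;
    return bless { start => $args{start}, end => $args{end} }, $class;
}
sub start { return $_[0]->{start} }
sub end   { return $_[0]->{end} }

# Hypothetical array-based feature: fields live in fixed array slots.
package My::ArrayFeature;
use constant { F_START => 0, F_END => 1 };
sub new {
    my ($class, %args) = @_;
    my $self = [];
    $self->[F_START] = $args{start};
    $self->[F_END]   = $args{end};
    return bless $self, $class;
}
sub start { return $_[0]->[F_START] }
sub end   { return $_[0]->[F_END] }

package main;

# Both classes expose the same accessors, so callers need not care
# which storage is used.
my $hf = My::HashFeature->new(start => 10, end => 50);
my $af = My::ArrayFeature->new(start => 10, end => 50);
printf "%s: %d..%d\n", ref $_, $_->start, $_->end for $hf, $af;
```

Because the accessors are identical, swapping one class for the other
in a parser should not require changes in calling code, which is what
makes a speed comparison between them straightforward.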

-jason
--
Jason Stajich
jason at bioperl.org


From cjfields at uiuc.edu  Tue Nov  6 21:57:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Nov 2007 15:57:17 -0600
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>

Bravo!  I once benchmarked Location instance creation and found
it contributed quite a bit of overhead, so the speedup from removing that
and using arrays makes quite a bit of sense to me.

You mention only simple locations; I'm guessing this doesn't handle  
'fuzzy' ends?  If it did I could see layering the feature data from  
the get-go, so it could be used just about anywhere in the place of  
SF::Generic.  Maybe something to test out in 1.7?

chris

On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:

> I started a branch for implementing and playing with lightweight
> feature object. The branch is called 'lightweight_feature_branch'.
>
> Right now it is about 70% faster just in object creation based on
> parsing features using Bio::Tools::GFF and swapping the types of
> features that are created.  It uses arrays instead of hashes under
> the hood.
>
> So the objects don't have locations under the hood.  My hope is if
> this works okay we could use it for creating objects where we KNOW
> the underlying features have simple locations so such as parsing in
> GFF data.
>
> -jason
> --
> Jason Stajich
> jason at bioperl.org


From jason at bioperl.org  Wed Nov  7 04:14:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Nov 2007 23:14:55 -0500
Subject: [Bioperl-l] lightweight sequence features
In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
References: <A4AC459E-A55E-4E36-A83D-F8A31C3A79B0@bioperl.org>
	<5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu>
Message-ID: <A021EE94-8DF8-467E-8303-E80127E3AEE2@bioperl.org>

Right - only for simple locations.  I've got a bunch more tests and  
fixes to put in.

I am hoping this can be a fast replacement in the case where we're
dealing with this "unflattened" data (i.e. GFF in FeatureIO &
Gbrowse).  This is sort of a playground until I can really get it
tested a bit more.  I'll give an all clear when the
dust settles in terms of the design if anyone wants to play/help.

-jason
On Nov 6, 2007, at 4:57 PM, Chris Fields wrote:

> Bravo!  I once benchmarked Location instance creation once and  
> found it contributed quite a bit of overhead so the speedup with  
> that and the use of arrays makes quite a bit of sense to me.
>
> You mention only simple locations; I'm guessing this doesn't handle  
> 'fuzzy' ends?  If it did I could see layering the feature data from  
> the get-go, so it could be used just about anywhere in the place of  
> SF::Generic.  Maybe something to test out in 1.7?
>
> chris
>
> On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote:
>
>> I started a branch for implementing and playing with lightweight
>> feature object. The branch is called 'lightweight_feature_branch'.
>>
>> Right now it is about 70% faster just in object creation based on
>> parsing features using Bio::Tools::GFF and swapping the types of
>> features that are created.  It uses arrays instead of hashes under
>> the hood.
>>
>> So the objects don't have locations under the hood.  My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org


From heikki at sanbi.ac.za  Wed Nov  7 10:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems 
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise
* Sequences are modified in place. That is really bad, because that 
  means that the user has to know to create a copy before 
  running Mdust on it.
* The docs say that you have to set MDUSTDIR envvar to tell the program 
  where to find the binary. That is actually optional if the 
  binary is on your path.
* The tests do not cover any of the options to the program


As a quick fix, I suggest that we:

* Leave the current way of working for Bio::SeqI objects:
  sequence string is not masked but seqfeatures to that effect are added.
* Modify run() to return the new masked sequence object when 
  the target is a Bio::PrimarySeqI.
* Fix the documentation.


After that it will be possible to simply write:

use Bio::Tools::Run::Mdust;
my $mdust = Bio::Tools::Run::Mdust->new();
# $seq is a Bio::PrimarySeqI; run() would return a new, masked copy
my $seq_dusted = $mdust->run($seq);


Are you happy for me to do this or do you want to do it yourself?


Yours,
	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    
    _/_/_/_/_/  heikki at_sanbi _ac _za    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Kevin.M.Brown at asu.edu  Wed Nov  7 18:04:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 7 Nov 2007 11:04:50 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>

I installed bioperl-ext from CVS, but can't figure out what else is
missing to utilize Bio::Tools::pSW.  The error I get from the example
script in the wiki is:

The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align)
has not been installed.
 Please read the install the bioperl-ext package

BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
Compilation failed in require at ./align_test.pl line 3.
BEGIN failed--compilation aborted at ./align_test.pl line 3.

In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
Align, but no Align.pm file.

I followed the directions in the wiki to install 1.5.2_102 (think I had
_100 installed previously).  Any thoughts on what I'm missing?


From jason at bioperl.org  Wed Nov  7 19:52:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 14:52:16 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>

The array-based Bio::SeqFeature::Slim is only about 7% faster than  
Bio::Graphics::Feature so I suspect most of the speedup comes from  
removing location objects.

Generic     6.75        --      -37%      -41%
GraphicsF   4.26       58%        --       -7%
Slim        3.98       70%        7%        --
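
The numbers in the table are consistent with the first column being
seconds per run (lower is better) and each percentage being the row's
speed relative to the column's. That reading is an assumption, since
the header line is not shown. Under that assumption, a small sketch in
plain Perl reproduces the percentages from the three timings:

```perl
use strict;
use warnings;

# Timings copied from the table; assumed to be seconds per run.
my %secs = (Generic => 6.75, GraphicsF => 4.26, Slim => 3.98);
my @names = qw(Generic GraphicsF Slim);

# Percentage by which $row's rate (1/time) differs from $col's rate.
sub pct_faster {
    my ($row, $col) = @_;
    return sprintf '%.0f', 100 * ($secs{$col} / $secs{$row} - 1);
}

# Print one line per implementation, "--" on the diagonal.
for my $row (@names) {
    my @cells = map { $_ eq $row ? '--' : pct_faster($row, $_) . '%' } @names;
    printf "%-10s %5.2f %8s %8s %8s\n", $row, $secs{$row}, @cells;
}
```

For example, Slim against Generic is 6.75 / 3.98 - 1, which rounds to
70%, while Slim against GraphicsF is 4.26 / 3.98 - 1, which rounds to 7%.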

this is using code on the lightweight_feature_branch so
cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason


From lstein at cshl.edu  Wed Nov  7 20:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally,
then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic     6.75        --      -37%      -41%
> GraphicsF   4.26       58%        --       -7%
> Slim        3.98       70%        7%        --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason at bioperl.org  Wed Nov  7 20:09:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 15:09:35 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>

It uses hashes there so technically it is not entirely array based.

-jason
On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:

> I wonder if it is worth moving to the array-based version more  
> generally,
> then.
>
> How does the array based feature object deal with tags?
>
> Lincoln
>
> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>
>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>> Bio::Graphics::Feature so I suspect most of the speedup comes from  
>> removing
>> location objects.
>>
>> Generic     6.75        --      -37%      -41%
>> GraphicsF   4.26       58%        --       -7%
>> Slim        3.98       70%        7%        --
>>
>> this is using code on the lightweight_feature_branch so
>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>> lightweight_feature_branch -d core_lwf bioperl-live
>>
>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>> and the GFF3 file I used to parse
>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>
>> -jason
>>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Nov  7 21:12:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 15:12:35 -0600
Subject: [Bioperl-l] (no subject)
In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>

I can see preferring a lightweight simple SF over SF::Generic in the  
next BioPerl dev cycle.  I guess we would just layer split locations  
as simple sub-features/segments, typing when necessary?  That  
shouldn't be much more overhead than creating a layered Location::Split.

chris

On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:

> It uses hashes there so technically it is not entirely array based.
>
> -jason
> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>
>> I wonder if it is worth moving to the array-based version more
>> generally,
>> then.
>>
>> How does the array based feature object deal with tags?
>>
>> Lincoln
>>
>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>
>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>> removing
>>> location objects.
>>>
>>> Generic     6.75        --      -37%      -41%
>>> GraphicsF   4.26       58%        --       -7%
>>> Slim        3.98       70%        7%        --
>>>
>>> this is using code on the lightweight_feature_branch so
>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>
>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>> and the GFF3 file I used to parse
>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>
>>> -jason
>>>
>>
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Wed Nov  7 23:19:15 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Nov 2007 18:19:15 -0500
Subject: [Bioperl-l] lightweight features
In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
Message-ID: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>

It seems to me that there are applications where you're dealing with  
a huge number of features (such as GFF) and where therefore a  
lightweight object makes tremendous sense. But when you parse a  
genbank file, I'm not sure that's the bottleneck, unless maybe it's a  
large contig with lots of feature annotations.

I guess we'll ultimately want a way to control the type of feature
being instantiated by a parser, e.g. using a factory.
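
As a rough sketch of the factory idea in plain Perl: the parser asks a
factory object for each feature and never names a feature class
itself, so the caller decides whether heavyweight or lightweight
objects are built. All names here (My::Feature::Full,
My::Feature::Slim, My::FeatureFactory, parse_lines) are hypothetical
and chosen only for illustration; this is not an existing BioPerl API.

```perl
use strict;
use warnings;

# Stand-in "heavy" feature that keeps a nested location structure.
package My::Feature::Full;
sub new {
    my ($class, %a) = @_;
    return bless { location => { start => $a{start}, end => $a{end} } }, $class;
}
sub start { return $_[0]{location}{start} }

# Stand-in "light" feature that keeps plain numbers in an array.
package My::Feature::Slim;
sub new {
    my ($class, %a) = @_;
    return bless [ $a{start}, $a{end} ], $class;
}
sub start { return $_[0][0] }

# Hypothetical factory: remembers which class to instantiate.
package My::FeatureFactory;
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type} || 'My::Feature::Full' }, $class;
}
sub create {
    my ($self, %args) = @_;
    return $self->{type}->new(%args);
}

package main;

# Toy parser: reads "start<TAB>end" lines and delegates object creation.
sub parse_lines {
    my ($factory, @lines) = @_;
    my @features;
    for my $line (@lines) {
        my ($start, $end) = split /\t/, $line;
        push @features, $factory->create(start => $start, end => $end);
    }
    return @features;
}

my @slim = parse_lines(My::FeatureFactory->new(type => 'My::Feature::Slim'),
                       "10\t50", "60\t90");
my @full = parse_lines(My::FeatureFactory->new(), "10\t50");
print ref($slim[0]), " and ", ref($full[0]), " both start at ",
      $slim[0]->start, "\n";
```

With this shape a GFF parser could default to the lightweight class
while a GenBank parser keeps the full one, without either parser
changing.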

	-hilmar

On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:

> I can see preferring a lightweight simple SF over SF::Generic in the
> next BioPerl dev cycle.  I guess we would just layer split locations
> as simple sub-features/segments, typing when necessary?  That
> shouldn't be much more overhead than creating a layered  
> Location::Split.
>
> chris
>
> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:
>
>> It uses hashes there so technically it is not entirely array based.
>>
>> -jason
>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>>
>>> I wonder if it is worth moving to the array-based version more
>>> generally,
>>> then.
>>>
>>> How does the array based feature object deal with tags?
>>>
>>> Lincoln
>>>
>>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>
>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>>> removing
>>>> location objects.
>>>>
>>>> Generic     6.75        --      -37%      -41%
>>>> GraphicsF   4.26       58%        --       -7%
>>>> Slim        3.98       70%        7%        --
>>>>
>>>> this is using code on the lightweight_feature_branch so
>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>>
>>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>>> and the GFF3 file I used to parse
>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>>
>>>> -jason
>>>>
>>>
>>>
>>>
>>> -- 
>>> Lincoln D. Stein
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724
>>> (516) 367-8380 (voice)
>>> (516) 367-8389 (fax)
>>> FOR URGENT MESSAGES & SCHEDULING,
>>> PLEASE CONTACT MY ASSISTANT,
>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Nov  8 01:04:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 19:04:05 -0600
Subject: [Bioperl-l] lightweight features
In-Reply-To: <D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
References: <F5D7912C-97E3-465A-99FA-FAAF984C8748@bioperl.org>
	<6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
	<494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>
	<219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu>
	<D6E7B859-928F-4527-9E87-14002EAA0A76@gmx.net>
Message-ID: <E541C60D-6741-4923-A71D-E14CE6FE176D@uiuc.edu>

I'm also thinking a factory is a good possibility; maybe something to  
take the place of FTHelper.

chris

On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote:

> It seems to me that there are applications where you're dealing with
> a huge number of features (such as GFF) and where therefore a
> lightweight object makes tremendous sense. But when you parse a
> genbank file, I'm not sure that's the bottleneck, unless maybe it's a
> large contig with lots of feature annotations.
>
> I guess we'll ultimately want a way to control the type of feature
> being instantiated by a parser, e..g using a factory.
>
> 	-hilmar
>
> On Nov 7, 2007, at 4:12 PM, Chris Fields wrote:
>
>> I can see preferring a lightweight simple SF over SF::Generic in the
>> next BioPerl dev cycle.  I guess we would just layer split locations
>> as simple sub-features/segments, typing when necessary?  That
>> shouldn't be much more overhead than creating a layered
>> Location::Split.
>>
>> chris
>>
>> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote:
>>
>>> It uses hashes there so technically it is not entirely array based.
>>>
>>> -jason
>>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:
>>>
>>>> I wonder if it is worth moving to the array-based version more
>>>> generally,
>>>> then.
>>>>
>>>> How does the array based feature object deal with tags?
>>>>
>>>> Lincoln
>>>>
>>>> On Nov 7, 2007 2:52 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>>
>>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>>>>> removing
>>>>> location objects.
>>>>>
>>>>> Generic     6.75        --      -37%      -41%
>>>>> GraphicsF   4.26       58%        --       -7%
>>>>> Slim        3.98       70%        7%        --
>>>>>
>>>>> this is using code on the lightweight_feature_branch so
>>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl  
>>>>> co -r
>>>>> lightweight_feature_branch -d core_lwf bioperl-live
>>>>>
>>>>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>>>>> and the GFF3 file I used to parse
>>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>>>>
>>>>> -jason
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Lincoln D. Stein
>>>> Cold Spring Harbor Laboratory
>>>> 1 Bungtown Road
>>>> Cold Spring Harbor, NY 11724
>>>> (516) 367-8380 (voice)
>>>> (516) 367-8389 (fax)
>>>> FOR URGENT MESSAGES & SCHEDULING,
>>>> PLEASE CONTACT MY ASSISTANT,
>>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Nov  8 04:45:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Nov 2007 22:45:26 -0600
Subject: [Bioperl-l] test please ignore
Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>


From cjfields at uiuc.edu  Thu Nov  8 15:50:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Nov 2007 09:50:02 -0600
Subject: [Bioperl-l] test please ignore
In-Reply-To: <47332534.5090205@bms.com>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
	<47332534.5090205@bms.com>
Message-ID: <D0ADF51D-92BE-4645-BB1C-564536732368@uiuc.edu>

And respond back!  Just checking the mail list; the open-bio wiki  
pages were down last night.

chris

On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote:

> Chris Fields wrote:
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> This is the best way to make everyone open this e-mail ;-)
> Stefan


From stefan.kirov at bms.com  Thu Nov  8 15:03:16 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 08 Nov 2007 10:03:16 -0500
Subject: [Bioperl-l] test please ignore
In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu>
Message-ID: <47332534.5090205@bms.com>

Chris Fields wrote:
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
This is the best way to make everyone open this e-mail ;-)
Stefan


From Kevin.M.Brown at asu.edu  Thu Nov  8 22:30:24 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Nov 2007 15:30:24 -0700
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org>
References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu>
	<20071108003638.GA5892@eniac.jgi-psf.org>
Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu>

OK, found the issue.  For whatever reason the Align.pm file is inside
the Align folder, so the package name and path don't match up once it
is installed.  This would cause it to have a name of
"Bio::Ext::Align::Align" instead of "Bio::Ext::Align".  Not sure why
this wasn't caught when I did "perl Makefile.PL && make && make test &&
make install".
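
For reference, Perl's require turns a package name into a relative
path by replacing "::" with "/" and appending ".pm", then looks for
that path under each directory in @INC. The sketch below (plain Perl;
the helper names module_to_path and find_module are made up for this
example) shows why a file at Bio/Ext/Align/Align.pm is not found under
the name Bio::Ext::Align:

```perl
use strict;
use warnings;

# Convert a package name to the relative path require() looks for,
# e.g. Foo::Bar -> Foo/Bar.pm.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the full path of the module file in the first @INC directory
# that contains it, or undef if no directory does.
sub find_module {
    my ($module) = @_;
    my $rel = module_to_path($module);
    for my $dir (grep { !ref $_ } @INC) {
        return "$dir/$rel" if -f "$dir/$rel";
    }
    return undef;
}

for my $module ('Bio::Ext::Align', 'Bio::Ext::Align::Align') {
    my $found = find_module($module);
    printf "%-24s -> %-26s %s\n", $module, module_to_path($module),
        defined $found ? "found at $found" : 'not found in @INC';
}
```

Running this on the affected machine should show which of the two
names actually resolves to an installed file.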

> -----Original Message-----
> From: Joel Martin [mailto:j_martin at lbl.gov] 
> Sent: Wednesday, November 07, 2007 5:37 PM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Bio::Ext::Align?
> 
> Hello,
>     Might be a side effect of fixing the other bioperl-ext package, 
> what steps exactly did this entail:
> 
> > I installed bioperl-ext from CVS, 
> 
> ?
> 
> you can probably bypass it at the moment by doing this after 
> unpacking the
> bioperl-ext package 
> 
> cd Bio/Ext/Align
> perl Makefile.PL
> make
> make test
> make install
> 
> and
> 
> cd Bio/Ext/HMM
> perl Makefile.PL
> make 
> make test
> make install
> 
> Joel
> 
> but can't figure out what else is
> > missing to utilize Bio::Tools::pSW.  The error I get from 
> the example
> > script in the wiki is:
> > 
> > The C-compiled engine for Smith Waterman alignments 
> (Bio::Ext::Align)
> > has not been installed.
> >  Please read the install the bioperl-ext package
> > 
> > BEGIN failed--compilation aborted at
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128.
> > Compilation failed in require at ./align_test.pl line 3.
> > BEGIN failed--compilation aborted at ./align_test.pl line 3.
> > 
> > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called
> > Align, but no Align.pm file.
> > 
> > I followed the directions in the wiki to install 1.5.2_102 
> (think I had
> > _100 installed previously).  Any thoughts on what I'm missing?


From akarger at CGR.Harvard.edu  Fri Nov  9 14:53:02 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 9 Nov 2007 09:53:02 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
Message-ID: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>

When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:


 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885


 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575


As you can see from Sbjct lines, these two hits are basically
contiguous.
I was surprised to see that the bit scores and identities and alignment
lengths here are totally different but the expectation values are
identical. 

After a bit of grepping in the BLAST source, I found references to "sum
segments" and "a collection [of] multiple distinct alignments with
asymmetric gaps between the alignments" and decided it was time to cry
for help. When does BLAST decide that two or more alignments belong
"together", and how does that affect the evalue? Is the evalue really
showing how good those two alignments combined are, despite the frame
shift? (It so happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
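
As a stopgap that does not depend on the parser, the count in
parentheses can be pulled straight from the "Score" lines of the text
report. This is only a sketch in plain Perl, assuming lines shaped
like the two shown above; parse_score_line is a made-up helper, not a
Bioperl method, and it treats a plain "Expect = ..." as a count of 1.

```perl
use strict;
use warnings;

# Parse a BLAST "Score = ..." line and return (bits, n, evalue),
# or an empty list if the line does not look like a Score line.
sub parse_score_line {
    my ($line) = @_;
    my ($bits, $n, $evalue) =
        $line =~ /Score\s*=\s*([\d.]+)\s+bits.*?Expect(?:\((\d+)\))?\s*=\s*([^\s,]+)/
        or return;
    return ($bits, defined $n ? $n : 1, $evalue);
}

my @lines = (
    ' Score =  192 bits (487), Expect(2) = 5e-64',
    ' Score = 75.1 bits (183), Expect(2) = 5e-64',
);

for my $line (@lines) {
    my ($bits, $n, $evalue) = parse_score_line($line);
    print "bits=$bits n=$n evalue=$evalue\n";
}
```

HSPs of the same hit that report the same n greater than 1 and the
same evalue are candidates for having been scored together.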

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University


From cjfields at uiuc.edu  Fri Nov  9 17:58:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Nov 2007 11:58:16 -0600
Subject: [Bioperl-l] GFF3loader and indexing
Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu>

Quick question: shouldn't the new Index attribute be passed on to  
seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping  
purposes (for instance, properly reloading dumped gff3 data)?  I'm  
testing out a feature editor using volvox.gff3 data in GBrowse and  
the mRNA features appear to drop this attribute once loaded:

Original data:

ctgA	example	gene	1050	9000	.	+	.	ID=EDEN;Name=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=EDEN.1

partial gff3_string(1) output:

ctgA	example	gene	1050	9000	.	+	.	Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=51;ID=52
...
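
A quick way to see which attributes go missing is to compare the
ninth column before and after. The sketch below is plain Perl with a
made-up helper, parse_attributes, applied to the two mRNA lines above;
it does not use the loader itself and ignores GFF3 escaping and
multi-valued attributes.

```perl
use strict;
use warnings;

# Split the ninth GFF3 column into a hash of tag => value.
sub parse_attributes {
    my ($gff_line) = @_;
    my $col9 = (split /\t/, $gff_line)[8];
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        $attr{$tag} = $value;
    }
    return %attr;
}

my $before = "ctgA\texample\tmRNA\t1050\t9000\t.\t+\t.\t"
           . "ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1";
my $after  = "ctgA\texample\tmRNA\t1050\t9000\t.\t+\t.\t"
           . "Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1";

my %in  = parse_attributes($before);
my %out = parse_attributes($after);

# Tags present on one side only.
my @dropped = grep { !exists $out{$_} } sort keys %in;
my @added   = grep { !exists $in{$_} }  sort keys %out;

print "dropped: @dropped\n";
print "added:   @added\n";
```

For these two lines the only tag on the input side that has no
counterpart in the output is Index, and the only new tag is Alias.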

chris


From David.Messina at sbc.su.se  Sat Nov 10 11:04:25 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 10 Nov 2007 12:04:25 +0100
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>

Hi Amir,

I don't have my BLAST book handy, and my memory is a little fuzzy, but I
think the Expect(2) you're seeing is the E-value based on both HSPs
combined. And I think this is why you see the same Expect value for both --
because it is shared between them (which sounds like what you wanted).

Again, this is just from memory, but I think this is an option that has to
be turned on rather than something which Blast decides to do on its own.


I don't know whether BioPerl reports this or not. Would you mind e-mailing
me an entire BLAST report as a sample? When I have some time I'd like to play
around with this a bit.

Thanks,
Dave


From sac at bioperl.org  Sat Nov 10 22:59:28 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Nov 2007 14:59:28 -0800
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>

The Bioperl blast parser should extract that value and you can obtain
it from an HSP object, via the HSPI::n() method, documented here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

Dave's basically correct in his explanation. It's a result of the
application of sum statistics by the blast algorithm. You can read all
about it in Korf et al's BLAST book. Here's the relevant section:

http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1

Steve

On Nov 10, 2007 3:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Amir,
>
> I don't have my BLAST book handy, and my memory is a little fuzzy, but I
> think the Expect(2) you're seeing is the E-value based on both HSPs
> combined. And I think this is why you see the same Expect value for both --
> because it is shared between them (which sounds like what you wanted).
>
> Again, this is just from memory, but I think this is an option that has to
> be turned on rather than something which Blast decides to do on its own.
>
>
> I don't know whether BioPerl reports this or not. Would you mind e-mailing
> me a entire BLAST report as a sample? When I have some time I'd like to play
> around with this a bit.
>
> Thanks,
> Dave


From bernd.web at gmail.com  Tue Nov 13 11:57:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 13 Nov 2007 12:57:04 +0100
Subject: [Bioperl-l] Panel link
Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com>

Hi,

Is it possible with Panel to provide javascript event handlers?
With -link we can provide hrefs as:
  -link => 'http://www.google.com/search?q=$description'
or use a coderef that returns a href.

However, I'd like to set-up links as:
<area .... href="#id" onmouseover="function()" onmouseout="function()">

Is this possible by default with Panel?
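
To make the goal concrete, here is a rough sketch of the markup I am
after, built by hand from a list of boxes. It assumes each box is given as
[id, x1, y1, x2, y2]; boxes_to_map, showTip and hideTip are hypothetical
names, and nothing here is taken from the Panel API itself. IDs are not
HTML-escaped.

```perl
use strict;
use warnings;

# Build an HTML image map from a list of boxes, each [id, x1, y1, x2, y2].
# showTip()/hideTip() stand in for whatever JavaScript handlers the page defines.
sub boxes_to_map {
    my ($map_name, @boxes) = @_;
    my $html = qq(<map name="$map_name">\n);
    for my $box (@boxes) {
        my ($id, $x1, $y1, $x2, $y2) = @$box;
        $html .= qq(<area shape="rect" coords="$x1,$y1,$x2,$y2" href="#$id")
               . qq( onmouseover="showTip('$id')" onmouseout="hideTip()">\n);
    }
    return $html . "</map>\n";
}

print boxes_to_map('panel_map',
    ['EDEN',   10, 20, 110, 30],
    ['EDEN.1', 10, 40, 110, 50],
);
```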

Regards,
Bernd


From akarger at CGR.Harvard.edu  Tue Nov 13 17:12:32 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:12:32 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
References: <Acgi4DovogbHeT/cS8WDzWOvfKrlzQ==>
	<B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A0165@huls5.nucleus.harvard.edu>

Thanks for the reply. I'm curious as to how BLAST decides to do this,
but not curious enough to buy the BLAST book.

If you want to see this, you could just tblastn the ENSP00000349467
sequence
 
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG
NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE
EVDEMIREADIDGDGQVNYEEFVQMMTAK
against the human genome at NCBI or locally.
 
I've attached the tblastn report for that protein, which includes the
results I quoted. (It was done as part of a blast of 150 proteins vs.
the genome.)
 
-Amir


________________________________

	From: dave at davemessina.com [mailto:dave at davemessina.com] On Behalf Of Dave Messina
	Sent: Saturday, November 10, 2007 6:04 AM
	To: Amir Karger
	Cc: bioperl-l
	Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result?

	Hi Amir,

	I don't have my BLAST book handy, and my memory is a little fuzzy, but I
	think the Expect(2) you're seeing is the E-value based on both HSPs
	combined. And I think this is why you see the same Expect value for both --
	because it is shared between them (which sounds like what you wanted).

	Again, this is just from memory, but I think this is an option that has to
	be turned on rather than something which Blast decides to do on its own.

	I don't know whether BioPerl reports this or not. Would you mind e-mailing
	me an entire BLAST report as a sample? When I have some time I'd like to
	play around with this a bit.

	Thanks,
	Dave


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ENSP00000349467_tblastn.txt.gz
Type: application/x-gzip
Size: 9755 bytes
Desc: ENSP00000349467_tblastn.txt.gz
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071113/f8853e76/attachment-0004.gz>

From akarger at CGR.Harvard.edu  Tue Nov 13 17:30:52 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:30:52 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>

> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf 
> Of Steve Chervitz
> 
> The Bioperl blast parser should extract that value and you can obtain
> it from an HSP object, via the HSPI::n() method, documented here:
> 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

As I mentioned in my email:

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

my $blast_out = $ARGV[0];
my $in = new Bio::SearchIO(-format => 'blast',
                            -file   => $blast_out,
                            -report_type => 'tblastn');

print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
Evalue)), "\n";
while(my $query = $in->next_result) {
    while(my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,
                $hsp->evalue,
            ),"\n";
        }
    }
}
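
As a stopgap while n() comes back empty, the count can be pulled straight
from the raw report text. This is only a sketch: it assumes the HSP score
lines look like "Score = ..., Expect(2) = ..." as in NCBI text output, the
sample lines below are made up, and expect_n is a hypothetical helper, not
part of Bioperl.

```perl
use strict;
use warnings;

# Return the N from "Expect(N) = ..." on a BLAST score line, or 1 when
# the line has a plain "Expect = ..." (no sum statistics applied).
# Returns undef for lines that carry no Expect value at all.
sub expect_n {
    my ($line) = @_;
    return $1 if $line =~ /Expect\((\d+)\)\s*=/;
    return 1  if $line =~ /Expect\s*=/;
    return;
}

# Hypothetical sample lines, modeled on NCBI BLAST text output
my @lines = (
    ' Score = 55.1 bits (131), Expect(2) = 2e-16',
    ' Score = 80.5 bits (197), Expect = 1e-14',
    ' Identities = 25/30 (83%), Positives = 28/30 (93%)',
);
for my $line (@lines) {
    my $n = expect_n($line);
    print defined $n ? "n=$n" : "n/a", "\t$line\n";
}
```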

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir


From cjfields at uiuc.edu  Tue Nov 13 17:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0719FE13@huls5.nucleus.harvard.edu>
	<628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
	<8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
	<B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug?  Dave mentioned he would look into it but  
I think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report.   
BTW, I wonder how this appears in XML output?

chris

On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:

>> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
>> Of Steve Chervitz
>>
>> The Bioperl blast parser should extract that value and you can obtain
>> it from an HSP object, via the HSPI::n() method, documented here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
>
> As I mentioned in my email:
>
> And does anyone know off-hand if Bioperl will tell me when situations
> like this happen? I thought the Bio::Search::HSP::BlastHSP::n  
> subroutine
> would help, but I just get a bunch of empty strings for that,  
> whether or
> not there's a (2) in the Expect string. (hsp->n is empty, hsp-> 
> {"_n"} is
> undef.)
>
> And the docs for n() actually say, "This value is not defined with  
> NCBI
> Blast2 with gapping" although they don't say why. Which may explain  
> why,
> when I ran the following code on the blast result I included in my  
> last
> email, I got empty values for all of the n's. (Why is n() undefined  
> for
> gapped blast if I'm getting n's in my results from that blast?)
>
> use warnings;
> use strict;
> use Bio::SearchIO;
>
> my $blast_out = $ARGV[0];
> my $in = new Bio::SearchIO(-format => 'blast',
>                             -file   => $blast_out,
>                             -report_type => 'tblastn');
>
> print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> Evalue)), "\n";
> while(my $query = $in->next_result) {
>     while(my $subject = $query->next_hit) {
>         while (my $hsp = $subject->next_hsp) {
>             print join("\t",
>                 $query->query_name,
>                 $hsp->start("query"),
>                 $hsp->end("query"),
>                 $hsp->strand("hit"),
>                 $subject->name,
>                 $hsp->start("hit"),
>                 $hsp->end("hit"),
>                 $subject->frame,
>                 $hsp->n,
>                 $hsp->evalue,
>             ),"\n";
>         }
>     }
> }
>
>> Dave's basically correct in his explanation. It's a result of the
>> application of sum statistics by the blast algorithm. You can read  
>> all
>> about it in Korf et al's BLAST book. Here's the relevant section:
>
> [snip]
>
> Thanks,
>
> -Amir

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lskatz at gatech.edu  Wed Nov 14 01:27:45 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Tue, 13 Nov 2007 20:27:45 -0500
Subject: [Bioperl-l] chromatogram
Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>

Hi,
I would like to know how to draw a chromatogram file.  Does anyone
have any sample code where you read in an scf file and create a jpeg
or other image file?
For that matter, I want to be able to customize these images with base
calls if possible.  I really appreciate the help, so thanks!

-- 
Lee Katz


From mvrmakam at yahoo.com  Wed Nov 14 09:52:13 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on Windows XP
Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com>

Hi,

I am encountering a problem while installing Bioperl on Windows XP.  I have installed ActivePerl version 5.8.8.822.  I am using Perl Package Manager GUI.  Also, I am following the instructions outlined for installing Bioperl on Windows.  I am getting an error.  The error is as follows:

 Downloading ActiveState Package Repository packlist ... failed 500 Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com')

I do not know how to overcome this problem.  The other issue is that when I type bioperl in the search box, I do not see any bioperl packages.  I do not know what the problem is.  If any of you could guide me through the installation process, I would appreciate it.

Thanks,

Roshan




From cjfields at uiuc.edu  Wed Nov 14 14:02:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Nov 2007 08:02:05 -0600
Subject: [Bioperl-l] Installing Bioperl on Windows XP
In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com>
References: <235423.72586.qm@web33703.mail.mud.yahoo.com>
Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu>

The instructions are pretty specific:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Note the section on adding new repositories.  As for the PPM  
connection error, it's more than likely an error with the default  
address but it isn't bioperl-related; maybe answers lie here:

http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl-faq2.html#ppm_repositories

chris

On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote:

> Hi,
>
> I am encountering a problem while installing Bioperl on Windows  
> XP.  I have installed ActivePerl version 5.8.8.822.  I am using  
> Perl Package Manager GUI.  Also, I am following the instructions  
> outlined for installing Bioperl on Windows.  I am getting an  
> error.  The error is as follows:
>
>  Downloading ActiveState Package Repository packlist ... failed 500  
> Can't connect to ppm4.activestate.com:80 (Bad hostname  
> 'ppm4.activestate.com')
>
> I do not know how to overcome this problem.  The other issue is  
> when I type bioperl in the search box I do not see any packages of  
> bioperl.  I do not know what the problem is.  If anyone of you  
> could guide me through the installation process I would appreciate it.
>
> Thanks,
>
> Roshan


From reshetovdenis at gmail.com  Wed Nov 14 17:28:40 2007
From: reshetovdenis at gmail.com (Denis Reshetov)
Date: Wed, 14 Nov 2007 20:28:40 +0300
Subject: [Bioperl-l] how to load all genomes
Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>

Dear BioPerl-db Creators,

I'm trying to load all genomes from the NCBI FTP site
into my BioSQL database using the standard script load_seqdatabase.pl.

But it seems very slow. Is there a better way to do it?

Thank you very much,

Denis.


From barry.moore at genetics.utah.edu  Wed Nov 14 19:18:29 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 14 Nov 2007 12:18:29 -0700
Subject: [Bioperl-l] how to load all genomes
In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu>

Denis,

You might be interested in this thread from a couple years ago.  I  
was having a similar problem, that I eventually resolved.   
Unfortunately the reason for the problem and the solution weren't  
entirely clear, but you may be able to glean some ideas from it.   
Also, you may have already done this, but I suggest searching the  
archives from this list because it seems like this comes up every now  
and then, so there may be other postings similar to the one I'm  
sending you that could help you.

http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html

Finally, if you are still having problems, you'll want to include a
few more details about your situation.  What DB are you using?  Have
you preloaded taxonomy data?  How fast/slow are your sequences
loading?

Barry

On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote:

> Dear BioPerl-db Creators,
>
> I`m trying to load all genomes from NCBI ftp site
> to my BioSql database using common script load_seqdatabase.pl
>
> But it seems very slow. Let me know what is the better way to do it?
>
> Thank you very much,
>
> Denis.


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 19:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development  :-)


Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz


------------------------------------------------------------------------
------------------

#!perl -w
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;


use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s' => \$FILE,
            'o|out=s' => \$OUTFILE,
            'l|left=s' => \$LEFT_SEQ,
            'r|right=s' => \$RIGHT_SEQ,
            's|size=i' => \$SIZE,
            ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

Options:
--height <pixels> Set height of image (${\HEIGHT} pixels default)
--file <trace file-name> Filename for the ABI trace file
--out <output file-name> Filename for the generated .png image
--left <left end sequence>
--right <right end sequence>
--size <size of clipped fasta sequence>

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
    or
    http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height = $HEIGHT || HEIGHT;
my $file = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);

my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence =$abi->get_sequence();
@bp = split(//, $sequence);


# iterate over array
$size = $abi->get_trace_length();
for ($i=0,$count = 0; $i<$size; $i++) {
     if(grep(/\b$i\b/, @base_calls)){
       $bases[$i] = $bp[$count];
       $count++;
     }else{
       $bases[$i] = ' ';
     }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
   $graph->set(
   title => $abi->get_sample_name(),
#	y_max_value => $abi->get_max_trace() + 50,
	x_max_value => $abi->get_trace_length(),
	t_margin => 5,
    b_margin => 5,
    l_margin => 5,
    r_margin => 5,
    x_ticks => 0,
    text_space => 0,
	line_width 	=> 1,
	transparent	=> 0,
	b_margin => 30,
	t_margin => 35,
	x_plot_values => 0,
	interlaced => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black = $gd->colorAllocate(0,0,0);       # G
$blue = $gd->colorAllocate(0,0,255);      # C
$red = $gd->colorAllocate(255,0,0);       # T
$green = $gd->colorAllocate(0,255,0);     # A
$magenta =$gd->colorAllocate(255,0,255);  # N
$white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
$gray = $gd->colorAllocate(210,210,210);
# base-to-color mapping, same order as the dclrs trace colors above
%colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N", $magenta, " ",$white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base = find_match($sequence,$LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
$end_base = find_match($sequence,$RIGHT_SEQ, 1);
if($end_base > 0){   # find_match returns -1 (which is true) when nothing matched
 $end_base += length($RIGHT_SEQ);
}


# get the coords of the features on the image
@coords = $graph->get_hotspot(1);
$size = @coords;
$printed_num = 1;
$basecount = 0;
$numstoprint = $basecount - $start_base;

# draw the colored bases and scale at top and bottom of image
for ($i=0,$count = 0; $i<$size; $i++) {
  $c = $coords[$i];
  (undef, $xs, undef, undef, undef, undef) = @$c;
  $base = $bases[$i];
  if($base =~ /[ACGTN]/){
   if($start_base - 1 == $basecount){$start_base_coord = $xs;}
   if($end_base - 1 == $basecount){$end_base_coord = $xs;}
   if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;}
   $basecount++;
   $numstoprint++;
   $printed_num = 0;
  }
  # print the bases top and bottom
  $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
  $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});

  # print scale
  if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
    # both branches of the old if($LEFT_SEQ)/else drew the same scale labels
    $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
    $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
    $printed_num = 1;
  }
  $top_right_corner = $xs;
}


# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
# draw the clipped regions as gray
  #if LEFT_SEQ supplied and a match found
  if($LEFT_SEQ && $start_base > 0){
     $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
     $clipped = 1;
  }
 #if RIGHT_SEQ supplied and a match found
 if($RIGHT_SEQ && $end_base > 0){
   print join("\t", ($end_base)),"\n";
   $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
   $clipped = 1;
 }
 #if no RIGHT_SEQ supplied or no match found, use left match + seq length
 if(!$RIGHT_SEQ || $end_base < 0){
   $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
   $clipped = 1;
 }
 

# set height based on max trace within clipped region
   $graph->set(	y_max_value => 3000);#$abi->get_max_trace() + 50);

  # need to re-plot the data over the grayed out area
  $graph->plot(\@data) if $clipped;
  $gd->filledRectangle(0,0,$top_right_corner,33,$white);

#}

#print the graph
open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
binmode OUT;
print OUT $gd->png;
close OUT;


sub find_match{
  my ($sequence,$query,$last) = @_;
  return -1 if length($query) < 6;
  my($odds, $evens, $ones, $twos, $threes, $match_pos);
    # try exact match
    $match_pos = do_regex($query, $sequence,$last); return $match_pos if $match_pos > 0;

    # try matching every second base starting from the second base e.g. it will be .C.T.C.G.etc
    map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
    $match_pos = do_regex($odds, $sequence,$last);   return $match_pos if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence,$last);  return $match_pos if $match_pos > 0;

    # try matching every third base starting from the first base e.g. it will be C..T..G..T etc
    map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
    $match_pos = do_regex($ones, $sequence,$last);   return $match_pos if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence,$last);   return $match_pos if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence,$last); return $match_pos if $match_pos > 0;

     # not found
     return -1;
}

sub do_regex {
    my ($query,$sequence,$last)= @_;
    #print "trying $query \n";
    my $result = -1;
    if($last){
      # greedy .* backtracks from the end, so this finds the last occurrence
      $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*($query)/ig);
    }else{
      # non-greedy .*? stops at the first occurrence
      $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
    }
    return $result;
}
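
The fallback patterns that find_match builds (every second or every third
base kept, the rest replaced by '.') can also be generated for any step
size. The following is a sketch of that idea only, not a drop-in
replacement; masked_patterns is a hypothetical name.

```perl
use strict;
use warnings;

# Build regex patterns that keep every $step-th base of $query and
# replace the rest with '.', one pattern per starting offset.
sub masked_patterns {
    my ($query, $step) = @_;
    my @bases = split //, $query;
    my @patterns;
    for my $offset (0 .. $step - 1) {
        push @patterns, join '', map {
            ($_ % $step == $offset) ? $bases[$_] : '.'
        } 0 .. $#bases;
    }
    return @patterns;
}

# Step 2 gives the "odds" and "evens" patterns: A.G.A. and .C.T.C
print join("\n", masked_patterns('ACGTAC', 2)), "\n";
```

Each pattern can then be tried in turn against the sequence, stopping at
the first one that matches.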

------------------------------------------------------------------------
------------------
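
One note on the base-placement loop in the script: it greps through
@base_calls once for every trace point, which gets slow on long traces. A
sketch of the same mapping with a hash lookup follows; place_bases is a
hypothetical helper and the sample data is made up, but the arguments
correspond to get_trace_length(), get_base_calls() and get_sequence() as
used in the script.

```perl
use strict;
use warnings;

# Place each called base at its trace index; every other slot gets ' '.
sub place_bases {
    my ($trace_length, $base_calls, $sequence) = @_;
    my @bp = split //, $sequence;
    my %base_at;
    @base_at{@$base_calls} = @bp[0 .. $#$base_calls];   # trace index => base
    return map { defined $base_at{$_} ? $base_at{$_} : ' ' } 0 .. $trace_length - 1;
}

# Made-up example: 10 trace points, bases called at indices 2, 5 and 8
my @bases = place_bases(10, [2, 5, 8], 'ACG');
print join('', @bases), "|\n";   # prints "  A  C  G |"
```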

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz


From mbasu at mail.nih.gov  Wed Nov 14 20:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need a chromatogram from SCF; I can't help with that. ABI.pm is
not in the Bioperl distribution. But to set the record straight, you can
draw a chromatogram in SVG from an ABI file in one step using my BioSVG
module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay


Smithies, Russell wrote:
> Here's my trace viewer.
> Please excuse my dodgy Perl and debugging code as it's still under
> development  :-)
> 
> 
> Russell Smithies
> 
> Bioinformatics Software Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
> 
> Invermay  Research Centre
> Puddle Alley, 
> Mosgiel, 
> New Zealand
> T  +64 3 489 3809   
> F  +64 3 489 9174  
> www.agresearch.co.nz
> 
> 
> ------------------------------------------------------------------------
> ------------------
> 
> #!perl -w
> use ABI;
> 
> use GD::Graph::lines;
> use GD::Graph::colour;
> use GD::Graph::Data;
> 
> use Data::Dumper;
> 
> 
> use Getopt::Long;
> 
> use constant HEIGHT => 300;
> 
> GetOptions ('h|height=i' => \$HEIGHT,
>             'f|file=s' => \$FILE,
>             'o|out=s' => \$OUTFILE,
>             'l|left=s' => \$LEFT_SEQ,
>             'r|right=s' => \$RIGHT_SEQ,
>             's|size=i' => \$SIZE,
>             ) || die <<USAGE;
> Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o
> test2.png -l actacgtacgta -r atgatcgtacgtac
> or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1
> --out test2.png --left actacgtacgta --right atgatcgtacgtac
> 
> Options:
> --height <pixels> Set height of image (${\HEIGHT} pixels default)
> --file <trace file-name> Filename for the ABI trace file
> --out <output file-name> Filename for the generated .png image
> --left <left end sequence>
> --right <right end sequence>
> --size <size of clipped fasta sequence>
> 
> Parse an ABI trace file and render a PNG image.
> See http://search.cpan.org/dist/ABI/ABI.pm
>     or
>     http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
> USAGE
> 
> my $height = $HEIGHT || HEIGHT;
> my $file = $FILE;
> my $outfile = $OUTFILE;
> 
> my $abi = ABI->new(-file=> $file);
> 
> my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
> my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
> my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
> my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
> 
> my @base_calls = $abi->get_base_calls(); # Get the base calls
> my $sequence =$abi->get_sequence();
> @bp = split(//, $sequence);
> 
> 
> 
> # iterate over array
> $size = $abi->get_trace_length();
> for ($i=0,$count = 0; $i<$size; $i++) {
>      if(grep(/\b$i\b/, @base_calls)){
>        $bases[$i] = $bp[$count];
>        $count++;
>      }else{
>        $bases[$i] = ' ';
>      }
> }
> 
> # create the data. see GD::Graph::Data for details of the format
> my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );
> 
> my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
>    $graph->set(
>    title => $abi->get_sample_name(),
> #	y_max_value => $abi->get_max_trace() + 50,
> 	x_max_value => $abi->get_trace_length(),
> 	t_margin => 5,
>     b_margin => 5,
>     l_margin => 5,
>     r_margin => 5,
>     x_ticks => 0,
>     text_space => 0,
> 	line_width 	=> 1,
> 	transparent	=> 0,
> 	b_margin => 30,
> 	t_margin => 35,
> 	x_plot_values => 0,
> 	interlaced => 1,
> );
> 
> # allocate some colors for drawing the bases
> #use colors same as Chromas
> $graph->set( dclrs => [ qw( green blue black red pink) ] );
> 
> #plot the data
> my $gd = $graph->plot(\@data);
> 
> $black = $gd->colorAllocate(0,0,0);       # A
> $blue = $gd->colorAllocate(0,0,255);      # C
> $red = $gd->colorAllocate(255,0,0);       # G
> $green = $gd->colorAllocate(0,255,0);     # T
> $magenta =$gd->colorAllocate(255,0,255);  # N
> $white = $gd->colorAllocate(255,255,255);  # undefined aren't drawn
> $gray = $gd->colorAllocate(210,210,210);
> %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N",
> $magenta, " ",$white);
> 
> #$start_base = index(lc($sequence),lc($LEFT_SEQ));
> $start_base = find_match($sequence,$LEFT_SEQ);
> 
> #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
> $end_base = find_match($sequence,$RIGHT_SEQ, 1);
> if($end_base){
>  $end_base += length($RIGHT_SEQ);
> }
> 
> 
> # get the coords of the features on the image
> @coords = $graph->get_hotspot(1);
> $size = @coords;
> $printed_num = 1;
> $basecount = 0;
> $numstoprint = $basecount - $start_base;
> 
> # draw the colored bases and scale at top and bottom of image
> for ($i=0,$count = 0; $i<$size; $i++) {
>   $c = $coords[$i];
>   (undef, $xs, undef, undef, undef, undef) = @$c;
>   $base = $bases[$i];
>   if($base =~ /[ACGTN]/){
>    if($start_base - 1 == $basecount){$start_base_coord = $xs;}
>    if($end_base - 1 == $basecount){$end_base_coord = $xs;}
>    if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;}
>    $basecount++;
>    $numstoprint++;
>    $printed_num = 0;
>   }
>   # print the bases top and bottom
>   $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
>   $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});
> 
>   # print scale (the same whether or not LEFT_SEQ was supplied)
>   if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
>     $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
>     $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
>     $printed_num = 1;
>   }
>   $top_right_corner = $xs;
> }
> 
> 
> 
> # only draw the clipped region if the calculated size is + or - 6bp
> #if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
> # draw the clipped regions as gray
>   #if LEFT_SEQ supplied and a match found
>   if($LEFT_SEQ && $start_base > 0){
>      $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
>      $clipped = 1;
>   }
>  #if RIGHT_SEQ supplied and a match found
>  if($RIGHT_SEQ && $end_base > 0){
>    print join("\t", ($end_base)),"\n";
>    $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
>    $clipped = 1;
>  }
>  #if no RIGHT_SEQ supplied or no match found, use left match + seq length
>  if(!$RIGHT_SEQ || $end_base < 0){
>    $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
>    $clipped = 1;
>  }
>  
> 
> 
> # set height based on max trace within clipped region
>    $graph->set(	y_max_value => 3000);#$abi->get_max_trace() + 50);
> 
>   # need to re-plot the data over the grayed out area
>   $graph->plot(\@data) if $clipped;
>   $gd->filledRectangle(0,0,$top_right_corner,33,$white);
> 
> #}
> 
> #print the graph
> open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
> binmode OUT;
> print OUT $gd->png;
> close OUT;
> 
> 
> sub find_match{
>   my ($sequence,$query,$last) = @_;
>   return -1 if length($query) < 6;
>   my($odds, $evens, $ones, $twos, $threes, $match_pos);
>     # try exact match
>     $match_pos = do_regex($query, $sequence,$last);
>     return $match_pos if $match_pos > 0;
> 
>     # try matching every second base: first the odd positions
>     # (C.T.C.G. etc), then the even positions (.C.T.C.G etc)
>     map {m/(\w)(\w)/g;  $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
>     $match_pos = do_regex($odds, $sequence,$last);
>     return $match_pos if $match_pos > 0;
>     $match_pos = do_regex($evens, $sequence,$last);
>     return $match_pos if $match_pos > 0;
> 
>     # try matching every third base, starting from the first, second
>     # and third base in turn (C..T..G..T etc)
>     map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
>     $match_pos = do_regex($ones, $sequence,$last);
>     return $match_pos if $match_pos > 0;
>     $match_pos = do_regex($twos, $sequence,$last);
>     return $match_pos if $match_pos > 0;
>     $match_pos = do_regex($threes, $sequence,$last);
>     return $match_pos if $match_pos > 0;
> 
>      # not found
>      return -1;
> }
> 
> # no "()" prototype here: the sub takes three arguments
> sub do_regex{
> 	my ($query,$sequence,$last)= @_;
>     #print "trying $query \n";
>     my $result = -1;
>     if($last){
>       # greedy .* backtracks from the end, so this finds the last occurrence
>       $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*($query)/ig);
>     }else{
>       # non-greedy .*? finds the first occurrence
>       $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
>     }
>     return $result;
> }
> 
> ------------------------------------------------------------------------------------------
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
>> Sent: Wednesday, 14 November 2007 2:28 p.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] chromatogram
>>
>> Hi,
>> I would like to know how to draw a chromatogram file.  Does anyone
>> have any sample code where you read in an scf file and create a jpeg
>> or other image file?
>> For that matter, I want to be able to customize these images with base
>> calls if possible.  I really appreciate the help, so thanks!
>>
>> --
>> Lee Katz
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
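
The fallback used by find_match in the quoted script (exact match first,
then patterns that keep only every second or every third base) can be
sketched on its own. This is a minimal sketch, not Russell's code: the
helper names masked_patterns, match_start and find_fuzzy are hypothetical,
and it assumes the same conventions as the script (1-based start position,
-1 for no match, queries shorter than 6 bases rejected).

```perl
use strict;
use warnings;

# Build one pattern per phase that keeps every $step-th base of $query
# and replaces the others with '.' (hypothetical helper).
sub masked_patterns {
    my ($query, $step) = @_;
    my @bases = split //, $query;
    my @patterns;
    for my $phase (0 .. $step - 1) {
        my $pattern = '';
        for my $i (0 .. $#bases) {
            $pattern .= ($i % $step == $phase) ? quotemeta($bases[$i]) : '.';
        }
        push @patterns, $pattern;
    }
    return @patterns;
}

# 1-based start of the first match of $pattern in $sequence, or of the
# last match when $want_last is true; -1 if there is no match.
sub match_start {
    my ($sequence, $pattern, $want_last) = @_;
    my $start = -1;
    # the zero-width lookahead lets //g visit every start position
    while ($sequence =~ /(?=$pattern)/ig) {
        $start = pos($sequence) + 1;
        last unless $want_last;
    }
    return $start;
}

# Try an exact match (step 1), then every second base, then every third.
sub find_fuzzy {
    my ($sequence, $query, $want_last) = @_;
    return -1 if !defined $query || length($query) < 6;
    for my $step (1, 2, 3) {
        for my $pattern (masked_patterns($query, $step)) {
            my $start = match_start($sequence, $pattern, $want_last);
            return $start if $start > 0;
        }
    }
    return -1;
}

# One mismatch in the target: the exact match fails, ".C.T.C" succeeds.
print find_fuzzy("TTACGTTCGTAA", "ACGTAC"), "\n";   # prints 3
```

Building the masked patterns by position rather than by regex capture
keeps any trailing base of an odd-length query in the pattern, which the
pairwise version in the script would drop.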


-- 
Malay K Basu
www.malaybasu.net


From Russell.Smithies at agresearch.co.nz  Wed Nov 14 20:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>


We try and avoid SVG at all costs as installing plugins and viewers in a
locked down corporate environment can be more trouble than it's worth
whereas generating .png images works for any browser with no extras
required.
We actually call this trace drawing code from Python which then
generates webpages with the embedded image. 
It also means we don't need to licence, install and maintain a trace
viewer like Chromas.
:-)

Russell
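
For anyone wanting to put the generated PNG straight into a page, here is
one possible way to do it. This is only a sketch: how our Python side
embeds the image is not shown here, so the data: URI approach and the
helper name png_img_tag are assumptions made for illustration.
MIME::Base64 is a core Perl module.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Wrap raw PNG bytes in an <img> tag using a base64 data: URI
# (hypothetical helper; $alt is escaped for use in an HTML attribute).
sub png_img_tag {
    my ($png_bytes, $alt) = @_;
    my $b64 = encode_base64($png_bytes, '');   # '' means no line breaks
    for ($alt) { s/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g; s/"/&quot;/g; }
    return qq{<img alt="$alt" src="data:image/png;base64,$b64">};
}
```

In the trace script the bytes would come from $gd->png rather than being
written to $outfile.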

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
> 
> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
> not in Bioperl distribution. But to make the record straight, you can
> use one step chromatogram drawing in SVG from ABI file using my BioSVG
> module, available at:
> 
> http://www.bioinformatics.org/~malay/biosvg/
> 
> Malay
> 
> 
> --
> Malay K Basu
> www.malaybasu.net



From mbasu at mail.nih.gov  Wed Nov 14 21:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need any plugin. Firefox can natively display most SVG files.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image. 
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
> 
> Russell
> 
>> -----Original Message-----
>> From: Malay [mailto:mbasu at mail.nih.gov]
>> Sent: Thursday, 15 November 2007 9:47 a.m.
>> To: Smithies, Russell
>> Cc: Lee Katz; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] chromatogram
>>
>> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
>> not in Bioperl distribution. But to make the record straight, you can
>> use one step chromatogram drawing in SVG from ABI file using my BioSVG
>> module, available at:
>>
>> http://www.bioinformatics.org/~malay/biosvg/
>>
>> Malay
>>
>> --
>> Malay K Basu
>> www.malaybasu.net


-- 
Malay K Basu
www.malaybasu.net


From tomboy at cs.huji.ac.il  Thu Nov 15 02:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>

Hi,
When I try to install BioPerl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command, and that
does not seem to help either.

many thanks
--Tomer
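
It may be worth first confirming which Module::Build (if any) this Perl
actually picks up. A small sketch using only core Perl; module_status is
a hypothetical helper name, and this only reports what can be loaded, it
does not explain the Build.PL error itself.

```perl
use strict;
use warnings;

# Report whether a module can be loaded and, if so, its version and path.
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Module::Build -> Module/Build.pm
    if (eval { require $file; 1 }) {
        my $version = $module->VERSION;
        return sprintf "%s %s loaded from %s", $module,
            defined $version ? $version : '(no version)', $INC{$file};
    }
    return "$module could not be loaded";
}

print module_status('Module::Build'), "\n";
```

If the path reported is not the one under site_perl shown in the error,
more than one copy of the module may be installed.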

-- 
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA

Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313               Fax: (425) 936-7329
--------------------------------------------------------------------------------


From lskatz at gatech.edu  Thu Nov 15 13:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image
though?  I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent.  This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works?  Thanks.

-- 
Lee Katz
http://www.lskatz.com


From cain.cshl at gmail.com  Thu Nov 15 14:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel.  Bio::SCF is
not part of bioperl, so you have to get it from CPAN; it depends on
the Staden io-lib package, so you'll need that too.  You can get GBrowse
from http://www.gmod.org/gbrowse , and you can look at the tutorial for
more information on configuring the trace glyph.

Scott


On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure in that there is no way to go from an scf to an image
> though?  I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent.  This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
> 
> If this is not possible, do you know if drawing an scf is in the works?  Thanks.
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From bosborne11 at verizon.net  Thu Nov 15 14:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To: <a87cf5d80711141843u3ba8a67dv7ff1b4838cdd9971@mail.gmail.com>
Message-ID: <C361BF4D.103D8%bosborne11@verizon.net>

Tomer,

Interesting. When I used Cygwin I always worked entirely within the C:
drive, but it looks like you're executing the script from the E: drive. Is
Cygwin installed in C:/cygwin? You can see what I'm getting at: it's
possible that you need to set $PERL5LIB to something like
/cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say?
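
A quick way to check this from Perl itself is to print the search path and
look for Module/Build.pm in it. The following is only a minimal sketch using
core modules; find_in_inc() is a hypothetical helper name, not part of any
distribution.

```perl
use strict;
use warnings;
use File::Spec;

# Return every @INC directory that contains the given relative path.
# Non-directory entries (e.g. code refs pushed onto @INC) are skipped.
sub find_in_inc {
    my ($relpath) = @_;
    return grep { -e File::Spec->catfile($_, $relpath) } grep { !ref } @INC;
}

print 'PERL5LIB = ',
    (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "\@INC:\n";
print "  $_\n" for @INC;

my @hits = find_in_inc('Module/Build.pm');
print(@hits
    ? "Module::Build found under: @hits\n"
    : "Module::Build not found in \@INC\n");
```

If Module::Build is not found in any listed directory, adjusting $PERL5LIB
as suggested above would be the next thing to try.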

Brian O.


On 11/14/07 9:43 PM, "Tomer Hertz" <tomboy at cs.huji.ac.il> wrote:

> hi
> when I try to install bioperl I get the following error message:
> 
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at
> /usr/lib/perl5/site_
> perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
> 
> many thanks
> --Tomer


From bernd.web at gmail.com  Thu Nov 15 15:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has anyone been able to access '$description' for the production of
imagemaps with Graphics::Panel?
The map below does not print the "title" tag at all; '$description'
does not seem to be available, although for the tracks ($panel->add_track)
it is available.
$map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the titletag does work, if
I use the code below:
my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };


I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }


Regards,
Bernd


From luciap at sas.upenn.edu  Thu Nov 15 15:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,
I was asked this question recently,
and it occurred to me I must be doing things inefficiently.
To produce a GFF file I was using SeqIO to parse the required fields, then,
according to the conventions, just printing out whatever was required, tab
delimited, which is easy.
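
As a rough illustration of the hand-rolled approach described above: a GFF3
line is nine tab-separated columns (seqid, source, type, start, end, score,
strand, phase, attributes), with '.' for empty fields. The sketch below is
plain Perl with no BioPerl dependency; gff3_line() and the sample values are
hypothetical, and attribute values are assumed to need no escaping.

```perl
use strict;
use warnings;

# Format one GFF3 line from named fields; missing fields become '.'.
# Attributes are passed as a hash ref and joined as key=value pairs.
sub gff3_line {
    my (%f) = @_;
    my $attrs = join ';',
        map { "$_=$f{attributes}{$_}" } sort keys %{ $f{attributes} || {} };
    return join("\t",
        map { (defined $_ && length $_) ? $_ : '.' }
            @f{qw(seqid source type start end score strand phase)}, $attrs
    ) . "\n";
}

print gff3_line(
    seqid  => 'chr1', source => 'GenBank', type => 'gene',
    start  => 100,    end    => 900,       strand => '+',
    attributes => { ID => 'gene0001', Name => 'example' },
);
```

In a real conversion the field values would come from the features returned
by SeqIO rather than from literals.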

But if I wanted to generate a GenBank file, extracting features from a GFF file
and a plain FASTA file, it was more complicated.
Is there support for GFF in bioperl now?
Can anyone suggest a smart way to go from/to GFF, GenBank and EMBL?

thanks very much

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From lstein at cshl.edu  Thu Nov 15 17:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".
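
One way to cover both cases is a title coderef that tries "description"
first and falls back to "note". This is only a sketch: it assumes the
feature object responds to has_tag() and each_tag_value(), and MockFeature
is a hypothetical stand-in so the example runs without BioPerl installed.

```perl
use strict;
use warnings;

# Title rule: return the first non-empty value of 'description' or 'note'.
my $titlerule = sub {
    my ($feature) = @_;
    for my $tag (qw(description note)) {
        next unless $feature->has_tag($tag);
        my ($value) = $feature->each_tag_value($tag);
        return $value if defined $value && length $value;
    }
    return '';
};

# Hypothetical stand-in for a feature object (not a BioPerl class).
package MockFeature;
sub new            { my ($class, %tags) = @_; return bless { %tags }, $class }
sub has_tag        { my ($self, $tag) = @_; return exists $self->{$tag} }
sub each_tag_value { my ($self, $tag) = @_; return @{ $self->{$tag} || [] } }

package main;
my $feature = MockFeature->new(note => ['putative kinase']);
print $titlerule->($feature), "\n";
```

The same coderef could then be passed as the third argument to
create_web_map() in place of '$description'.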

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web <bernd.web at gmail.com> wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bernd.web at gmail.com  Thu Nov 15 18:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
	<6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein <lstein at cshl.edu> wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
>
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> >
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Thu Nov 15 18:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from
	genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are  
consistent (particularly re: GFF3).  We are aiming for more  
consistent, compliant GFF/GTF output in the next developer series  
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the  
scripts directory); these are probably the most common way when  
working directly from a seq record.  Bio::Tools::GFF is the most  
commonly used class, though I'm unsure of its status for GFF3  
output.  From within a Bio::SeqI you can call write_gff() (currently  
not very flexible) or from the SeqFeature itself gff_string().   
Bio::Graphics::Feature has the additional method gff3_string().   
Bio::FeatureIO is also an option, though I would consider it very  
experimental (it will likely undergo significant revision in the next  
bioperl dev series).

Any others anyone can think of, maybe non-BioPerl related as well?

chris

On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:

> Hi
> I was asked this question recently
> and it occurred to me I must be doing things inefficiently
> To produce gff file I was using SeqIO to parse the required fields,  
> then
> according to the conventions just printing out whatever was  
> required tab
> delimited, which is easy
>
> but if I wanted to generate a genbank file, extracting features  
> from a gff file
> and a plain fasta file it was more complicated
> is there support for gff in bioperl now?
> anyone can contribute with  smart way to go from/to gff, genebank  
> and embl?
>
> thanks very much
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bosborne11 at verizon.net  Thu Nov 15 19:19:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 14:19:41 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>
Message-ID: <C36205FD.103EA%bosborne11@verizon.net>

Chris,

There's also a genbank2gff3.PLS script in the GMOD package (
http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
However, it has not been modified for a couple of years, so it may not be
the "preferred" script.

See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
on using Bioperl's bp_genbank2gff3 script.

Brian O.


On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> There are currently many ways to get what you want, but not all are
> consistent (particularly re: GFF3).  We are aiming for more
> consistent, compliant GFF/GTF output in the next developer series
> (1.7) of Bioperl.
> 
> You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> scripts directory); these are probably the most common way when
> working directly from a seq record.  Bio::Tools::GFF is the most
> commonly used class though I'm unsure of it's status for GFF3
> output.  From within a Bio::SeqI you can call write_gff() (currently
> not very flexible) or from the SeqFeature itself gff_string().
> Bio::Graphics::Feature has the additional method gff3_string().
> Bio::FeatureIO is also an option, though I would consider it very
> experimental (it will likely undergo significant revision in the next
> bioperl dev series).
> 
> Any others anyone can think of, maybe non-BioPerl related as well?
> 
> chris
> 
> On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> 
>> Hi
>> I was asked this question recently
>> and it occurred to me I must be doing things inefficiently
>> To produce gff file I was using SeqIO to parse the required fields,
>> then
>> according to the conventions just printing out whatever was
>> required tab
>> delimited, which is easy
>> 
>> but if I wanted to generate a genbank file, extracting features
>> from a gff file
>> and a plain fasta file it was more complicated
>> is there support for gff in bioperl now?
>> anyone can contribute with  smart way to go from/to gff, genebank
>> and embl?
>> 
>> thanks very much
>> 
>> Lucia Peixoto
>> Department of Biology,SAS
>> University of Pennsylvania
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Russell.Smithies at agresearch.co.nz  Thu Nov 15 22:31:28 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 16 Nov 2007 11:31:28 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>

Just to add to this, does anyone have any code for reading .sff 'traces'
from 454 sequences?

Thanx,

Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
> 
> Hi,
> I would like to know how to draw a chromatogram file.  Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible.  I really appreciate the help, so thanks!
> 
> --
> Lee Katz
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From torsten.seemann at infotech.monash.edu.au  Fri Nov 16 01:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139BAD@imail.agresearch.co.nz>
Message-ID: <a79f6a4b0711151713g26905bc6g5b19202b992f4e08@mail.gmail.com>

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools which 454
distribute with their result data. eg. "sffinfo 454AllContigs.sff"
will list all the reads with the original flowgram values etc.
However, the SFF tools are i386.Linux binaries, so not really a
portable solution.

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From mvrmakam at yahoo.com  Fri Nov 16 03:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager ver 5.8.8.822 on Windows XP.  I have included all the repositories outlined in Installing BioPerl for Windows and have selected All Packages in the View.  However, I am not able to see any packages in the view box.  Can anyone help me with this?

Roshan




From David.Messina at sbc.su.se  Fri Nov 16 08:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06139837@imail.agresearch.co.nz>
	<473B5ED8.1090201@mail.nih.gov>
	<D5DBA313349A4B458528BE63B387F36C06139886@imail.agresearch.co.nz>
	<473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works?  Thanks.
>


One non-BioPerl solution is 4peaks:
http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers journal
article PDF library program.


Dave


From neetisomaiya at gmail.com  Mon Nov 19 06:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

foreach my $key ( $ac->get_all_annotation_keys() )
{
    if ($key eq "dblink")
    {
        my %values = $ac->get_Annotations($key);
        foreach my $value ( keys(%values) )
        {
            print "\n*****VALUE $value*****\n";
        }
    }
}

Here not all dblinks present in the actual file get parsed. For example, in the
data below,
ENTRY       116064            CDS       H.sapiens
NAME        LRRC58
DEFINITION  leucine rich repeat containing 58
POSITION    3q13.33
MOTIF       Pfam: SdiA-regulated LRR_1
            PROSITE: LEU_RICH
DBLINKS     NCBI-GI: 153792305
            NCBI-GeneID: 116064
            HGNC: 26968
            Ensembl: ENSG00000163428
            UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE,
but doesn't give me HGNC and UniProt. For other entries it gives me other
combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
-- 
-Neeti
Even my blood says, B positive


From johnston at biochem.ucl.ac.uk  Mon Nov 19 11:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>

Hello,

Is there a list of the possible database names for -data =>
$dbname in RemoteBlast somewhere?

Cheers,
Cass


From cjfields at uiuc.edu  Mon Nov 19 13:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
Message-ID: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somwhere?
>
> Cheers,
> Cass
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Nov 19 14:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID: <F81EBCF4-20AD-486C-A9EC-301FE9475504@uiuc.edu>

It makes sense in the light that you're (erroneously) using a hash:

    my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink; you don't see an  
error b/c the number of links happens to be even (I get 8) but you  
would if the number of links returned is odd (missing value for key  
error or something along those lines).  So when you call:

    foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.  You should use an array:

    my @values = $ac->get_Annotations($key);
    foreach my $value (@values) {
       print $value->as_text,"\n";
    }

Note the loop change; Bio::Annotation objects are no longer
operator-overloaded, so your print statement wouldn't work in a bioperl
1.6 world.
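
The halving effect can be reproduced in plain Perl, independent of BioPerl.
In this sketch the strings are hypothetical stand-ins for the annotation
objects that get_Annotations() returns.

```perl
use strict;
use warnings;

# Stand-ins for the objects returned by get_Annotations('dblink').
my @links = ('NCBI-GI', 'NCBI-GeneID', 'HGNC', 'Ensembl', 'UniProt', 'Pfam');

# List assigned to a hash: elements are consumed pairwise as key => value,
# so only every other element ends up as a key.
my %as_hash      = @links;
my @seen_as_keys = sort keys %as_hash;

print scalar(@links), " items returned\n";
print scalar(@seen_as_keys), " visible via keys(): @seen_as_keys\n";

# List assigned to an array: every element is kept, in order.
my @as_array = @links;
print scalar(@as_array), " visible via the array\n";
```

With an odd number of elements and warnings enabled, Perl reports "Odd
number of elements in hash assignment" instead of failing silently.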

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
> foreach my $key ( $ac->get_all_annotation_keys() )
>                                 {
>                                         if($key eq "dblink")
>                                         {
>                                                 my %values =
> $ac->get_Annotations($key);
>                                                 foreach my $value (
> keys(%values ))
>                                                 {
>                                                         print  
> "\n*****VALUE
> $value*****\n";
>                                                 }
>                                         }
>                                  }
>
> Here not all dblinks present in the actual file get parsed. For eg,  
> in the
> data below,
> ENTRY       116064            CDS       H.sapiens
> NAME        LRRC58
> DEFINITION  leucine rich repeat containing 58
> POSITION    3q13.33
> MOTIF       Pfam: SdiA-regulated LRR_1
>             PROSITE: LEU_RICH
> DBLINKS     NCBI-GI: 153792305
>             NCBI-GeneID: 116064
>             HGNC: 26968
>             Ensembl: ENSG00000163428
>             UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and  
> PROSITE,
> but doesnt give me HGNC and UniProt. For other entries it gives me  
> other
> combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> -- 
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From akarger at CGR.Harvard.edu  Mon Nov 19 15:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
References: <B9182BFF5B004245BABC12956EA6322E071A017D@huls5.nucleus.harvard.edu>
	<3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E0747C64A@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> Sent: Tuesday, November 13, 2007 12:42 PM
> To: Amir Karger
> Cc: Steve Chervitz; Dave Messina; bioperl-l
> Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result?
> 
> Amir,
> 
> Can you file this as a bug?  

Done.

http://bugzilla.open-bio.org/show_bug.cgi?id=2399

> Dave mentioned he would look into it but
> I think it warrants tracking to make sure it gets fixed:
> 
> http://www.bioperl.org/wiki/Bugs
> 
> Attach the example BLAST report from your last post to the report.   
> BTW, I wonder how this appears in XML output?
> 
> chris
> 
> On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:
> 
> >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
> >> Of Steve Chervitz
> >>
> >> The Bioperl blast parser should extract that value and you can obtain
> >> it from an HSP object, via the HSPI::n() method, documented here:
> >>
> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
> >
> > As I mentioned in my email:
> >
> > And does anyone know off-hand if Bioperl will tell me when situations
> > like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
> > would help, but I just get a bunch of empty strings for that, whether or
> > not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
> > undef.)
> >
> > And the docs for n() actually say, "This value is not defined with NCBI
> > Blast2 with gapping" although they don't say why. Which may explain why,
> > when I ran the following code on the blast result I included in my last
> > email, I got empty values for all of the n's. (Why is n() undefined for
> > gapped blast if I'm getting n's in my results from that blast?)
> >
> > use warnings;
> > use strict;
> > use Bio::SearchIO;
> >
> > my $blast_out = $ARGV[0];
> > my $in = new Bio::SearchIO(-format => 'blast',
> >                             -file   => $blast_out,
> >                             -report_type => 'tblastn');
> >
> > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> > Evalue)), "\n";
> > while(my $query = $in->next_result) {
> >     while(my $subject = $query->next_hit) {
> >         while (my $hsp = $subject->next_hsp) {
> >             print join("\t",
> >                 $query->query_name,
> >                 $hsp->start("query"),
> >                 $hsp->end("query"),
> >                 $hsp->strand("hit"),
> >                 $subject->name,
> >                 $hsp->start("hit"),
> >                 $hsp->end("hit"),
> >                 $subject->frame,
> >                 $hsp->n,
> >                 $hsp->evalue,
> >             ),"\n";
> >         }
> >     }
> > }
> >
> >> Dave's basically correct in his explanation. It's a result of the
> >> application of sum statistics by the blast algorithm. You can read all
> >> about it in Korf et al's BLAST book. Here's the relevant section:
> >
> > [snip]
> >
> > Thanks,
> >
> > -Amir
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 


From aaron.j.mackey at gsk.com  Mon Nov 19 16:50:53 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 19 Nov 2007 11:50:53 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from
 genebank/embl formats?
In-Reply-To: <C36205FD.103EA%bosborne11@verizon.net>
Message-ID: <OF0C0B3E21.611ACEBE-ON85257398.005C01A8-85257398.005C8D95@gsk.com>

While Lucia's subject line asked for genbank2gff, her message actually 
asked the reverse (gff + fasta -> genbank).

e.g. pretend you had to prepare a genome annotation for submission to 
GenBank ...

and no, I don't know of any generalized gff2genbank script out there ...

Lucia, the SeqIO::genbank module will write GenBank format, but you have 
to get all the bits and bobs together in the right way, i.e. construct the 
various AnnotationCollections and SeqFeatures (with SplitLocations for 
exons, CDS, etc.) that a GenBank record expects.  One way to do this is to 
start with a template GenBank file that you'd like to mimic, strip it down 
to only two gene models, use SeqIO::genbank to read it into memory, and 
then step through the object with the Perl debugger to see how it is 
composed.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM:

> Chris,
> 
> There's also a genbank2gff3.PLS script in the GMOD package (
> http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
> However, it has not been modified for a couple of
> years, it may not be the "preferred" script.
> 
> See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
> http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
> on using Bioperl's bp_genbank2gff3 script.
> 
> Brian O.
> 
> 
> On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > There are currently many ways to get what you want, but not all are
> > consistent (particularly re: GFF3).  We are aiming for more
> > consistent, compliant GFF/GTF output in the next developer series
> > (1.7) of Bioperl.
> > 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> > scripts directory); these are probably the most common way when
> > working directly from a seq record.  Bio::Tools::GFF is the most
> > commonly used class though I'm unsure of it's status for GFF3
> > output.  From within a Bio::SeqI you can call write_gff() (currently
> > not very flexible) or from the SeqFeature itself gff_string().
> > Bio::Graphics::Feature has the additional method gff3_string().
> > Bio::FeatureIO is also an option, though I would consider it very
> > experimental (it will likely undergo significant revision in the next
> > bioperl dev series).
> > 
> > Any others anyone can think of, maybe non-BioPerl related as well?
> > 
> > chris
> > 
> > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> > 
> >> Hi
> >> I was asked this question recently
> >> and it occurred to me I must be doing things inefficiently
> >> To produce gff file I was using SeqIO to parse the required fields,
> >> then
> >> according to the conventions just printing out whatever was
> >> required tab
> >> delimited, which is easy
> >> 
> >> but if I wanted to generate a genbank file, extracting features
> >> from a gff file
> >> and a plain fasta file it was more complicated
> >> is there support for gff in bioperl now?
> >> can anyone suggest a smart way to go from/to gff, genbank
> >> and embl?
> >> 
> >> thanks very much
> >> 
> >> Lucia Peixoto
> >> Department of Biology,SAS
> >> University of Pennsylvania
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From johnston at biochem.ucl.ac.uk  Mon Nov 19 14:46:03 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT)
Subject: [Bioperl-l] blast database names
In-Reply-To: <B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
References: <Pine.LNX.4.58.0711191140410.3141@localhost.localdomain>
	<B565B00E-7D09-4486-824D-0ED685E99FD7@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0711191441010.3141@localhost.localdomain>

On Mon, 19 Nov 2007, Chris Fields wrote:

> Here's a recent list (don't know if it's up-to-date):
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

Thanks. Perhaps I missed something in the docs, but I don't think I've
quite understood how this is supposed to work. I'm trying to blast primer
sequences against the ref genome sequence. Should I be using ref_contig?
How can I limit the blast to a single species?

cheers,
Cass.


From Kevin.M.Brown at asu.edu  Mon Nov 19 18:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed, just had to copy the
Align.pm file up one directory from where it was being put by the
installer.  Now I have a technician trying to use pSW (Bio::Tools::pSW)
and it appears to have been last updated back in '99 and seems to lack
certain methods to get things out of the alignment like the score.  The
test.pl script that Bio::Ext comes with actually uses
Bio::Tools::dpAlign.  Is dpAlign the replacement for pSW?


From bernd.web at gmail.com  Wed Nov 21 16:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de>
	<470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my
system, but indeed I also have these colouring problems.
I noticed that the score in the bgcolor callback has a different value!
Printing the score during hit parsing ($hit->raw_score) gives the same
score as the -description callback (my $score = $feature->score;).
However, printing the score in the bgcolor sub gives 2573!
The scores in the bgcolor routine are all different from, and higher
than, the real scores. Were you able to solve this colouring issue?
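
One way to narrow this down is to pull the threshold logic out of the
callback so it can be exercised on its own, away from Bio::Graphics. The
sketch below does that in plain Perl; color_for_score() is a hypothetical
helper name, and the thresholds are the ones from the quoted -bgcolor code.

```perl
use strict;
use warnings;

# Thresholds copied from the -bgcolor callback quoted below.
# color_for_score() is a hypothetical helper, not part of Bio::Graphics.
sub color_for_score {
    my ($score) = @_;
    return 'black' unless defined $score;
    return 'red'     if $score >= 200;
    return 'fuchsia' if $score >= 80;
    return 'lime'    if $score >= 50;
    return 'blue'    if $score >= 40;
    return 'black';
}

# Print what the helper returns for a few example scores.
for my $score (35, 45, 60, 150, 2573) {
    printf "%6s => %s\n", $score, color_for_score($score);
}
```

In the track definition this could then be used as
-bgcolor => sub { color_for_score($_[0]->score) }, which makes it easier to
see whether the colours are wrong because of the thresholds or because of the
score value the callback receives.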

Regards,
Bernd

> Hi all,
> I'm using a modified version of Lincoln's tutorial
> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> to give a similar image to that from NCBI but for some reason, my
> colours are coming out wrong (see attached example)
> They seem to be off by one but I can't see why.
>
> Any ideas?
>
> I can't be certain but I think it's only started doing this since our
> BLAST upgrade to 2.2.17 a few weeks ago.
>
> Here's the colouring code:
> ------------------------------------------------------------------------
> -------
> my $track = $panel->add_track(
>                               -glyph       => 'segments',
>                               -label       => 1,
>                               -connector   => 'dashed',
>                               -bgcolor     => sub {
>                                 my $feature = shift;
>                                 my $score = $feature->score;
>                         return 'red'       if $score >= 200;
>                                     return 'fuchsia' if $score >= 80;
>                                     return 'lime'      if $score >= 50;
>                         return 'blue'      if $score >= 40;
>                                     return 'black';
>                                },
>                               -font2color  => 'gray',
>                               -sort_order  => 'high_score',
>                               -description => sub {
>                                 my $feature = shift;
>                                 return unless
> $feature->has_tag('description');
>                                 my ($description) =
> $feature->each_tag_value('description');
>                                 my $score = $feature->score;
>                                 "$description, score=$score";
>                                },
>                              );
> ------------------------------------------------------------------------
> ---------
>
>
> Thanx,
>
> Russell Smithies
>
>
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From bernd.web at gmail.com  Wed Nov 21 17:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk>
	<47022278.7010700@web.de> <47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I have now found that bgcolor is using a $feature->score that comes
directly from the blast report; it is not the bit score.
     -bgcolor     => sub { my $feature = shift;
                           my $score   = $feature->score;
                           print "$score\n"; }
always prints a score, even if the score is not set in the
Bio::SeqFeature::Generic object.

-description callbacks are somehow using the score from the SeqFeature object.

Does anyone have an idea why?

Further, is it possible to get the raw_score of a hit? $hit->raw_score
actually gets the bit score (without the decimal point).

Bernd

On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> Hi Russell,
>
> I came across your question. At first I thought all was well on my
> system, but indeed I also have these colouring problems.
> I noted that scrore in the bgcolor callback gets a different value!
> Printing score during hit parsing($hit->raw_score) gives the same
> score as -description
> my $score = $feature->score; However, printing score in the bgcolor
> sub gives 2573!
> All scores in the bgcolor routine all different and higher than the
> real scores. Were you able to solve this colouring issue?
>
> Regards,
> Bernd
>
>
> > Hi all,
> > I'm using a modified version of Lincoln's tutorial
> > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> > and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> > to give a similar image to that from NCBI but for some reason, my
> > colours are coming out wrong (see attached example)
> > They seem to be off by one but I can't see why.
> >
> > Any ideas?
> >
> > I can't be certain but I think it's only started doing this since our
> > BLAST upgrade to 2.2.17 a few weeks ago.
> >
> > Here's the colouring code:
> > ------------------------------------------------------------------------
> > -------
> > my $track = $panel->add_track(
> >                               -glyph       => 'segments',
> >                               -label       => 1,
> >                               -connector   => 'dashed',
> >                               -bgcolor     => sub {
> >                                 my $feature = shift;
> >                                 my $score = $feature->score;
> >                         return 'red'       if $score >= 200;
> >                                     return 'fuchsia' if $score >= 80;
> >                                     return 'lime'      if $score >= 50;
> >                         return 'blue'      if $score >= 40;
> >                                     return 'black';
> >                                },
> >                               -font2color  => 'gray',
> >                               -sort_order  => 'high_score',
> >                               -description => sub {
> >                                 my $feature = shift;
> >                                 return unless
> > $feature->has_tag('description');
> >                                 my ($description) =
> > $feature->each_tag_value('description');
> >                                 my $score = $feature->score;
> >                                 "$description, score=$score";
> >                                },
> >                              );
> > ------------------------------------------------------------------------
> > ---------
> >
> >
> > Thanx,
> >
> > Russell Smithies
> >
>


From sac at bioperl.org  Wed Nov 21 18:43:54 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 21 Nov 2007 10:43:54 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>

On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> [snip]
>
> Further is is possible to get the raw_score of a hit. $hit->raw_score
> actually gets the bitscore (w/o decimal point).

Hmmm. raw_score should not be the same as bit score. So given an
example blast hit line such as:

       Score = 60.0 bits (30), Expect = 1e-06

$hit->raw_score() should return 30, not 60, as you seem to be getting.
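
To make the distinction concrete, here is a minimal plain-Perl sketch that
pulls both numbers out of that example line. The regular expression and the
helper name parse_score_line() are illustrative assumptions; this is not the
pattern the BioPerl BLAST parser itself uses.

```perl
use strict;
use warnings;

# Return (bit score, raw score) from a BLAST "Score = ..." line, or an
# empty list if the line does not match. Hypothetical helper for illustration.
sub parse_score_line {
    my ($line) = @_;
    # The number before "bits" is the bit score; the number in
    # parentheses is the raw score.
    if ( $line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\)/ ) {
        return ( $1, $2 );
    }
    return;
}

# Example hit line quoted from the message above.
my ( $bits, $raw ) = parse_score_line('Score = 60.0 bits (30), Expect = 1e-06');
print "bit score = $bits, raw score = $raw\n";
```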

Could you submit a bug report for this?  http://www.bioperl.org/wiki/Bugs

Thanks,
Steve

>
> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> > Hi Russell,
> >
> > I came across your question. At first I thought all was well on my
> > system, but indeed I also have these colouring problems.
> > I noted that scrore in the bgcolor callback gets a different value!
> > Printing score during hit parsing($hit->raw_score) gives the same
> > score as -description
> > my $score = $feature->score; However, printing score in the bgcolor
> > sub gives 2573!
> > All scores in the bgcolor routine all different and higher than the
> > real scores. Were you able to solve this colouring issue?
> >
> > Regards,
> > Bernd
> >
> >
> > > Hi all,
> > > I'm using a modified version of Lincoln's tutorial
> > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> > > to give a similar image to that from NCBI but for some reason, my
> > > colours are coming out wrong (see attached example)
> > > They seem to be off by one but I can't see why.
> > >
> > > Any ideas?
> > >
> > > I can't be certain but I think it's only started doing this since our
> > > BLAST upgrade to 2.2.17 a few weeks ago.
> > >
> > > Here's the colouring code:
> > > ------------------------------------------------------------------------
> > > -------
> > > my $track = $panel->add_track(
> > >                               -glyph       => 'segments',
> > >                               -label       => 1,
> > >                               -connector   => 'dashed',
> > >                               -bgcolor     => sub {
> > >                                 my $feature = shift;
> > >                                 my $score = $feature->score;
> > >                         return 'red'       if $score >= 200;
> > >                                     return 'fuchsia' if $score >= 80;
> > >                                     return 'lime'      if $score >= 50;
> > >                         return 'blue'      if $score >= 40;
> > >                                     return 'black';
> > >                                },
> > >                               -font2color  => 'gray',
> > >                               -sort_order  => 'high_score',
> > >                               -description => sub {
> > >                                 my $feature = shift;
> > >                                 return unless
> > > $feature->has_tag('description');
> > >                                 my ($description) =
> > > $feature->each_tag_value('description');
> > >                                 my $score = $feature->score;
> > >                                 "$description, score=$score";
> > >                                },
> > >                              );
> > > ------------------------------------------------------------------------
> > > ---------
> > >
> > >
> > > Thanx,
> > >
> > > Russell Smithies
> > >
>


From binkley at genome.stanford.edu  Thu Nov 22 00:35:02 2007
From: binkley at genome.stanford.edu (Jonathan Binkley)
Date: Wed, 21 Nov 2007 16:35:02 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>

Hi,

I installed bioperl on a Mac (OS 10.4, Intel) via fink,
which put it here:

/sw/lib/perl5/5.8.6/Bio/

It seems to work fine, but I need bioperl-ext for
Smith-Waterman alignments.

So, into which directory should I download bioperl-ext and
run the Makefile?

Thanks.


From dcj at sanger.ac.uk  Thu Nov 22 14:47:09 2007
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Thu, 22 Nov 2007 14:47:09 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
Message-ID: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>

Hi all,

Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to  
be a little 'broken', at least in my hands.
First, $bml->set_parameter('runmode', 0); does not work (it sets
runmode to -2). Setting runmode to 1 is OK.
Also, $bml->no_param_checks(1); doesn't seem to work.

The result is that the baseml.ctl file created under /tmp is not  
runnable by baseml with runmode 0. The phylip file created is run OK  
by baseml (with another .ctl file). My script and baseml.ctl are below.

Hope it can be fixed,

cheers,

Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

# Read the alignment that baseml will analyse.
my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'test.phy');
my $aln     = $alignio->next_aln;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);    # keep the generated baseml.ctl for inspection
my $tempdir = $bml->tempdir();

# Set the runmode to zero.
$bml->set_parameter('runmode', 0);

my ($rc, $parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus     = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;


The baseml.ctl file produced:
seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0
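
A quick way to confirm the mismatch without reading the file by eye is to
parse the control file into a hash and compare the value against what was
requested. The following is only a sketch in plain Perl: parse_ctl() is a
hypothetical helper, not part of bioperl-run, and the sample text is a few
lines copied from the listing above.

```perl
use strict;
use warnings;

# Parse "key = value" lines from a PAML control file into a hash reference.
# parse_ctl() is a hypothetical helper, not part of bioperl-run.
sub parse_ctl {
    my ($text) = @_;
    my %param;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /^\s*(\w+)\s*=\s*(\S+)/;
        $param{$1} = $2;
    }
    return \%param;
}

# A few lines copied from the baseml.ctl listing above.
my $ctl = <<'END';
outfile = mlb
model = 0
runmode = -2
clock = 0
END

my $param    = parse_ctl($ctl);
my $expected = 0;    # the value passed to set_parameter('runmode', ...)
if ( $param->{runmode} != $expected ) {
    print "runmode mismatch: wanted $expected, file has $param->{runmode}\n";
}
```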


Regards,

Daniel Jeffares

______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Thu Nov 22 16:06:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 22 Nov 2007 17:06:16 +0100
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>

Daniel,

I don't have bioperl-run or PAML installed on my system to test it myself,
but have you tried the latest version of bioperl-run from CVS? It looks like
that code has been worked on since 1.5.2 was released.


If that still doesn't work, could you file this as a bug to make sure it
gets followed up?


Dave


You can grab the tarball here:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl


and if necessary file the bug here:
BioPerl Bugzilla tracking system <http://bugzilla.open-bio.org/>


From arareko at campus.iztacala.unam.mx  Thu Nov 22 16:37:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 22 Nov 2007 10:37:24 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
Message-ID: <4745B044.5090102@campus.iztacala.unam.mx>

Hi Peter,

In BioPerl, there's no such mapping for db_xref's that I'm aware of. 
Each parser handles db_xref records on its own. Take a look at the 
Bio::SeqIO::genbank code, inside the next_seq() method for example:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup

Regards,
Mauricio.

Peter wrote:
> Dear all,
> 
> I'm one of the Biopython developers.  I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface.  I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
> 
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
> 
>                      /db_xref="ASAP:1309"
>                      /db_xref="GI:16128366"
>                      /db_xref="ECOCYC:EG10213"
>                      /db_xref="GeneID:945313"
> 
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables.  For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
> 
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
> 
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
> 
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
> 
> Thank you,
> 
> Peter
> 
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From avilella at gmail.com  Thu Nov 22 21:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only
used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
$seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

    Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
15.8402290344238 secs...

avilella at magneto:~$ perl
/home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl
-dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta]
0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta]
0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta]
1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta]
3.86473417282104 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta]
6.99602198600769 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta]
7.26704716682434 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta]
8.44332504272461 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta]
12.103296995163 secs...


From cjfields at uiuc.edu  Fri Nov 23 00:30:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:30:51 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu>

How are tests affected?  It might be worth going through the revision  
history to see if there was a specific reason this was implemented,  
but if it passes tests I don't see why we need it.
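
If it turns out that something does rely on the symbol set, one alternative to
dropping the line outright would be to build the set only when it is asked
for. The sketch below shows that idea with a toy class in plain Perl; ToyAlign
and its methods are hypothetical stand-ins and do not reflect how
Bio::SimpleAlign is actually organised.

```perl
use strict;
use warnings;

# ToyAlign is a hypothetical stand-in for an alignment class; it is not
# Bio::SimpleAlign and only illustrates deferring the symbol bookkeeping.
package ToyAlign;

sub new { return bless { seqs => [], _symbols => undef }, shift }

# Adding a sequence stores the string and invalidates any cached symbol set.
sub add_seq {
    my ( $self, $seq ) = @_;
    push @{ $self->{seqs} }, $seq;
    $self->{_symbols} = undef;
    return;
}

# Build the symbol set only when a caller actually asks for it.
sub symbols {
    my ($self) = @_;
    unless ( $self->{_symbols} ) {
        my %seen;
        $seen{$_} = 1 for map { split //, $_ } @{ $self->{seqs} };
        $self->{_symbols} = \%seen;
    }
    return sort keys %{ $self->{_symbols} };
}

package main;

my $aln = ToyAlign->new;
$aln->add_seq('ACGT-A');
$aln->add_seq('AC-TTA');
print join( ',', $aln->symbols ), "\n";
```

Loading then costs nothing per character, and the per-character split is paid
once, at the first call that needs the symbols.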

chris

On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:

> Hi,
>
> Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> used if one calls the symbol_chars method?
>
> When I comment out this line:
>
> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> $seq->seq; # line 257
>
> I get a nice speed boost on loading alignments.
>
> Can I comment this line out in the CVS HEAD?
>
> Cheers,
>
>     Albert.
>
> [init] 5.96046447753906e-06 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.0022270679473877 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 2.14348912239075 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 6.91910791397095 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 15.8402290344238 secs...
>
> avilella at magneto:~$ perl
> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ 
> ancestral_alleles.pl
> -dir /home/avilella/ensembl/exoseq/test -verbose
> [init] 1.21593475341797e-05 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.00294303894042969 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 0.510555982589722 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 1.6192569732666 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 3.86473417282104 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000203717.chr1.fasta]
> 6.99602198600769 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000196188.chr1.fasta]
> 7.26704716682434 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000025800.chr1.fasta]
> 8.44332504272461 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000117475.chr1.fasta]
> 12.103296995163 secs...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Nov 23 00:42:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:42:12 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref
	table
In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
	<4745B044.5090102@campus.iztacala.unam.mx>
Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu>

I think SeqIO checks the name for parsing reasons only, in cases  
where the format changes based on the source (such as GenPept  
DBSOURCE data).  I don't think we go beyond that in Bioperl, probably  
b/c modifying or expanding names for data persistence would lead to  
volatile coding issues (i.e. consistency between parsers, constant  
updating to cover new crossrefs, etc).

I would definitely suggest retaining the original DB as it appears in  
the dbxref for consistency/sanity; if needed return expanded names  
using a different method if they are designated.
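
As a rough illustration of what I mean (plain Perl, not existing BioPerl
code): store the database name exactly as it appears in the record, and keep
any expanded name behind a separate lookup. The helper names parse_db_xref()
and expanded_name() are hypothetical, and the table holds only a few entries
taken from the Biopython mapping quoted below.

```perl
use strict;
use warnings;

# A few entries taken from the Biopython mapping quoted below; the table is
# illustrative only and deliberately incomplete.
my %EXPANDED_NAME = (
    GI     => 'GeneIndex',
    GeneID => 'Entrez',
    ECOCYC => 'EcoCyc',
);

# Split a db_xref qualifier value such as "GI:16128366" into database and
# accession, keeping the database name exactly as it appears in the record.
sub parse_db_xref {
    my ($value) = @_;
    my ( $db, $accession ) = split /:/, $value, 2;
    return ( $db, $accession );
}

# Separate lookup for a display name; falls back to the original name.
sub expanded_name {
    my ($db) = @_;
    return exists $EXPANDED_NAME{$db} ? $EXPANDED_NAME{$db} : $db;
}

for my $xref ( 'ASAP:1309', 'GI:16128366', 'ECOCYC:EG10213', 'GeneID:945313' ) {
    my ( $db, $acc ) = parse_db_xref($xref);
    printf "%-8s %-10s (%s)\n", $db, $acc, expanded_name($db);
}
```

Unknown abbreviations such as "LocusID" or "dbSNP" then pass through
unchanged instead of needing a table update before they can be stored.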

chris

On Nov 22, 2007, at 10:37 AM, Mauricio Herrera Cuadra wrote:

> Hi Peter,
>
> In BioPerl, there's no such mapping for db_xref's that I'm aware of.
> Each parser handles db_xref records on its own. Take a look at the
> Bio::SeqIO::genbank code, inside the next_seq() method for example:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ 
> Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup
>
> Regards,
> Mauricio.
>
> Peter wrote:
>> Dear all,
>>
>> I'm one of the Biopython developers.  I've recently got going with
>> BioSQL and have been getting to grips with the Biopython BioSQL
>> interface.  I'm aware that we need to try and be consistent with
>> BioPerl and BioJava, so I'd like to pose my first question related to
>> that.
>>
>> When loading GenBank records, many features have db_xref qualifiers,
>> e.g. from a random CDS feature in E. coli K12:
>>
>>                      /db_xref="ASAP:1309"
>>                      /db_xref="GI:16128366"
>>                      /db_xref="ECOCYC:EG10213"
>>                      /db_xref="GeneID:945313"
>>
>> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
>> "GeneID" before recording these entries in the seqfeature_dbxref
>> and dbxref tables.  For example, "GI" becomes "GeneIndex".
>> Biopython's current mapping is as follows:
>>
>> # Dictionary of database types, keyed by GenBank db_xref abbreviation
>> db_dict = {'GeneID': 'Entrez',
>>            'GI': 'GeneIndex',
>>            'COG': 'COG',
>>            'CDD': 'CDD',
>>            'DDBJ': 'DNA Databank of Japan',
>>            'Entrez': 'Entrez',
>>            'GeneIndex': 'GeneIndex',
>>            'PUBMED': 'PubMed',
>>            'taxon': 'Taxon',
>>            'ATCC': 'ATCC',
>>            'ISFinder': 'ISFinder',
>>            'GOA': 'Gene Ontology Annotation',
>>            'ASAP': 'ASAP',
>>            'PSEUDO': 'PSEUDO',
>>            'InterPro': 'InterPro',
>>            'GEO': 'Gene Expression Omnibus',
>>            'EMBL': 'EMBL',
>>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>>            'ECOCYC': 'EcoCyc',
>>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>>            }
>>
>> In my testing, I've found several GenBank db_xref abbreviations for
>> which we don't have a mapping defined, such as "LocusID", "dbSNP",
>> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
>>
>> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
>> similar mapping in their BioSQL code (or GenBank parser), so that
>> Biopython can follow your example.
>>
>> Thank you,
>>
>> Peter
>>
>> P.S. See also Biopython bug 2405
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Nov 23 00:49:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:49:15 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>

Albert,

Found it:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ 
SimpleAlign.pm.diff?r1=1.36&r2=1.37

If it slows performance that dramatically, maybe we can move this to  
a separate AlignUtils method instead.  Maybe something to ask Jason  
about?
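
One way to read the suggestion above is to stop doing per-character work
at load time and build the symbol set only when it is asked for. A
minimal sketch in plain Perl, assuming a hypothetical LazySymbols class
(this is not Bio::SimpleAlign or any existing AlignUtils code):

```perl
use strict;
use warnings;

package LazySymbols;    # hypothetical stand-in, not Bio::SimpleAlign

sub new { return bless { seqs => [] }, shift }

# Adding a sequence only stores it; no per-character work happens here.
sub add_seq {
    my ( $self, $str ) = @_;
    push @{ $self->{seqs} }, $str;
    delete $self->{_symbols};    # invalidate any cached symbol set
    return;
}

# Build the symbol set on first request and cache it.
sub symbol_chars {
    my ($self) = @_;
    unless ( $self->{_symbols} ) {
        my %seen;
        for my $str ( @{ $self->{seqs} } ) {
            $seen{$_} = 1 for split //, $str;
        }
        $self->{_symbols} = \%seen;
    }
    my @chars = sort keys %{ $self->{_symbols} };
    return @chars;
}

package main;

my $aln = LazySymbols->new;
$aln->add_seq('AGUC-GA');
$aln->add_seq('AGUCGGA');
print join( '', $aln->symbol_chars ), "\n";    # prints -ACGU
```

Loading then costs one push per sequence, and the split over every
character is paid only by callers that actually want the symbols.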

chris

On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:

> Hi,
>
> Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> used if one calls the symbol_chars method?
>
> When I comment out this line:
>
> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> $seq->seq; # line 257
>
> I get a nice speed boost on loading alignments.
>
> Can I comment this line out in the CVS HEAD?
>
> Cheers,
>
>     Albert.
>
> [init] 5.96046447753906e-06 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.0022270679473877 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 2.14348912239075 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 6.91910791397095 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 15.8402290344238 secs...
>
> avilella at magneto:~$ perl
> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ 
> ancestral_alleles.pl
> -dir /home/avilella/ensembl/exoseq/test -verbose
> [init] 1.21593475341797e-05 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162399.chr1.fasta]
> 0.00294303894042969 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000158022.chr1.fasta]
> 0.510555982589722 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000162585.chr1.fasta]
> 1.6192569732666 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000121957.chr1.fasta]
> 3.86473417282104 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000203717.chr1.fasta]
> 6.99602198600769 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000196188.chr1.fasta]
> 7.26704716682434 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000025800.chr1.fasta]
> 8.44332504272461 secs...
> [loading aln /home/avilella/ensembl/exoseq/test/ 
> ENSG00000117475.chr1.fasta]
> 12.103296995163 secs...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 23 12:29:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Nov 2007 12:29:37 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
Message-ID: <4746C7B1.1010002@sendu.me.uk>

Dave Messina wrote:
> Daniel,
> 
> I don't have bioperl-run or PAML installed on my system to test it myself,
> but have you tried the latest version of bioperl-run from CVS? It looks like
> that code has been worked on since 1.5.2 was released.

Yes, I fixed it in CVS so it should at least /run/. I don't know about 
the parsing side of things, though that may also have been fixed 
recently by someone else.


From avilella at gmail.com  Fri Nov 23 13:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a
"baseml" binary. AFAIK, Jason fixed codeml to make it work both for
paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;
my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file =>
'/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
                                -file =>
'/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();


#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

Alignment (test.phy):

4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

Tree (test.tree):

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala <bix at sendu.me.uk> wrote:
> Dave Messina wrote:
> > Daniel,
> >
> > I don't have bioperl-run or PAML installed on my system to test it myself,
> > but have you tried the latest version of bioperl-run from CVS? It looks like
> > that code has been worked on since 1.5.2 was released.
>
> Yes, I fixed it in CVS so it should at least /run/. I don't know about
> the parsing side of things, though that may also have been fixed
> recently by someone else.
>
>
>


From cjfields at uiuc.edu  Fri Nov 23 16:24:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 10:24:59 -0600
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
References: <C290408F-48DC-47A7-9983-40E3732DB0D2@sanger.ac.uk>
	<628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
	<358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu>

I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just  
'basemlg'), so it would need to work with both.

Do we want to put a PAML parser/wrapper overhaul on the TODO list for  
1.6?

chris

On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote:

> Just to mention that the new paml4 has a "basemlg" instead of a
> "baseml" binary. AFAIK, Jason fixed codeml to make it work both for
> paml3.xx and paml4, but I am not sure about baseml.
...


From arvindvanam at gmail.com  Fri Nov 23 21:26:06 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST)
Subject: [Bioperl-l]  run RNAfold in perl
Message-ID: <13918981.post@talk.nabble.com>


How do I run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job=$rnafold->run(-rnafold =>
'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an object
for the program, but it gives the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12


How can I make the RNAfold program run from Perl?
Is there any need to specify where my RNAfold program is?

Please reply soon.
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Fri Nov 23 22:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see  
Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
mfold wrappers but haven't done so yet.  The Vienna tools do have a  
Perl-based (non-BioPerl-based) module included which uses libRNA, and  
is well worth a look.  Try 'perldoc RNA' if you have installed the  
tools locally, or look here for other Perl-based tools:

http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
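
For the local route via 'perldoc RNA', a minimal sketch might look like
the following. It assumes the Perl interface that ships with the Vienna
package is installed and exposes RNA::fold, which returns a structure
string and a minimum free energy; the fold_locally helper is made up for
this example.

```perl
use strict;
use warnings;

# Fold one sequence with the Vienna RNA Perl bindings, if installed.
# Returns (structure, energy), or an empty list when the RNA module
# cannot be loaded.
sub fold_locally {
    my ($seq) = @_;
    return unless eval { require RNA; 1 };
    my ( $structure, $mfe ) = RNA::fold($seq);
    return ( $structure, $mfe );
}

my $seq = 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC';
if ( my ( $structure, $mfe ) = fold_locally($seq) ) {
    printf "%s\n%s (%.2f kcal/mol)\n", $seq, $structure, $mfe;
}
else {
    print "RNA module not found; is the Vienna Perl interface installed?\n";
}
```

This calls libRNA in-process, so no system() call and no path to the
RNAfold binary are involved.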

chris

On Nov 23, 2007, at 3:26 PM, vanam wrote:

>
> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>
> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
> my $rnafold = $factory->program('rnafold');
> my $job=$rnafold->run(-rnafold =>
> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>
> I installed Vienna package and then i tried using Pise to create an  
> object
> for the program but its giving the following error
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::PiseJob::terminated
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
> STACK: Bio::Tools::Run::PiseApplication::submit
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
> STACK: Bio::Tools::Run::PiseApplication::run
> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
> STACK: evaluate.pl:12
>
>
> how to make the program RNAfold run in perl...
> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>
> plz reply soon
> -- 
> View this message in context: http://www.nabble.com/run-RNAfold-in- 
> perl-tf4863835.html#a13918981
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Sat Nov 24 07:29:11 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
Message-ID: <13922740.post@talk.nabble.com>


I have seen the documentation for Bio::Tools::Run::AnalysisFactory::Pise and
I used it exactly as described there.

Instead of running the Perl script "RNAfold.pl", I just want to use
the functions associated with RNAfold from within a Perl program, without
having to call the program using the system() command.

If you can just tell me how to use these wrapper modules it would be of great
help. For example, when using clustalw or clustalx we define an environment
variable for it. Do we have to do the same for RNAfold or Mfold?


Chris Fields wrote:
> 
> The Pise wrappers run the programs remotely; see  
> Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a  
> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ 
> mfold wrappers but haven't done so yet.  The Vienna tools do have a  
> Perl-based (non-BioPerl-based) module included which uses libRNA, and  
> is well worth a look.  Try 'perldoc RNA' if you have installed the  
> tools locally, or look here for other Perl-based tools:
> 
> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
> 
> chris
> 
> On Nov 23, 2007, at 3:26 PM, vanam wrote:
> 
>>
>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>
>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
>> my $rnafold = $factory->program('rnafold');
>> my $job=$rnafold->run(-rnafold =>
>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>>
>> I installed Vienna package and then i tried using Pise to create an  
>> object
>> for the program but its giving the following error
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::PiseJob::terminated
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
>> STACK: Bio::Tools::Run::PiseApplication::submit
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
>> STACK: Bio::Tools::Run::PiseApplication::run
>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
>> STACK: evaluate.pl:12
>>
>>
>> how to make the program RNAfold run in perl...
>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>>
>> plz reply soon
>> -- 
>> View this message in context: http://www.nabble.com/run-RNAfold-in- 
>> perl-tf4863835.html#a13918981
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From avilella at gmail.com  Sun Nov 25 11:50:42 2007
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 25 Nov 2007 11:50:42 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>

Committed to CVS now. It is calculated anyway when calling symbol_chars, so...

On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> Found it:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>
> If it slows performance that dramatically, maybe we can move this to
> a separate AlignUtils method instead.  Maybe something to ask Jason
> about?
>
> chris
>
> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>
>
> > Hi,
> >
> > Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> > used if one calls the symbol_chars method?
> >
> > When I comment out this line:
> >
> > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> > $seq->seq; # line 257
> >
> > I get a nice speed boost on loading alignments.
> >
> > Can I comment this line out in the CVS HEAD?
> >
> > Cheers,
> >
> >     Albert.
> >
> > [init] 5.96046447753906e-06 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.0022270679473877 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 2.14348912239075 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 6.91910791397095 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 15.8402290344238 secs...
> >
> > avilella at magneto:~$ perl
> > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
> > ancestral_alleles.pl
> > -dir /home/avilella/ensembl/exoseq/test -verbose
> > [init] 1.21593475341797e-05 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.00294303894042969 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 0.510555982589722 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 1.6192569732666 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 3.86473417282104 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000203717.chr1.fasta]
> > 6.99602198600769 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000196188.chr1.fasta]
> > 7.26704716682434 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000025800.chr1.fasta]
> > 8.44332504272461 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000117475.chr1.fasta]
> > 12.103296995163 secs...
>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Sun Nov 25 15:05:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:05:27 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13922740.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>

Again, these wrappers are for submitting data to a Pise server for  
the corresponding programs (run on a remote server).  There are no  
wrappers for running RNAfold on your computer (i.e. locally), with or  
w/o a set env. variable.  You can try installing Pise locally and  
setting the location() as shown in POD to localhost, however I don't  
know how stable these modules are with newer versions of Pise.  These  
haven't been updated in a few years, apart from getting tests to work.

Another option is installing EMBOSS along with the EMBASSY version of  
RNAFold; this could conceivably be run through Bio::Factory::EMBOSS.

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for  
> Bio::Tools::Run::AnalysisFactory::Pise and
> i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I  
> can use
> the functions associated with RNAfold with a perl program without  
> having to
> call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b  
> of gr8
> help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>
>
>
>
> Chris Fields wrote:
>>
>> The Pise wrappers run the programs remotely; see
>> Bio::Tools::Run::AnalysisFactory::Pise on how to run it.  As for a
>> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/
>> mfold wrappers but haven't done so yet.  The Vienna tools do have a
>> Perl-based (non-BioPerl-based) module included which uses libRNA, and
>> is well worth a look.  Try 'perldoc RNA' if you have installed the
>> tools locally, or look here for other Perl-based tools:
>>
>> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
>>
>> chris
>>
>> On Nov 23, 2007, at 3:26 PM, vanam wrote:
>>
>>>
>>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>>
>>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
>>> my $rnafold = $factory->program('rnafold');
>>> my $job=$rnafold->run(-rnafold =>
>>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');
>>>
>>> I installed Vienna package and then i tried using Pise to create an
>>> object
>>> for the program but its giving the following error
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>> STACK: Bio::Tools::Run::PiseJob::terminated
>>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
>>> STACK: Bio::Tools::Run::PiseApplication::submit
>>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
>>> STACK: Bio::Tools::Run::PiseApplication::run
>>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
>>> STACK: evaluate.pl:12
>>>
>>>
>>> how to make the program RNAfold run in perl...
>>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is???
>>>
>>> plz reply soon
>>> -- 
>>> View this message in context: http://www.nabble.com/run-RNAfold-in-
>>> perl-tf4863835.html#a13918981
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/run-RNAfold-in- 
> perl-tf4863835.html#a13922740
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 15:38:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 09:38:40 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
Message-ID: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>

Albert,

I was getting a single AlignIO.t fail which appeared to be related to  
this:

...
ok 122 - The object isa Bio::Align::AlignI
ok 123 - consensus_string on metafasta

not ok 124 - symbol_chars() using metafasta
#   Failed test 'symbol_chars() using metafasta'
#   in t/AlignIO.t at line 346.
#          got: '0'
#     expected: '23'

It was b/c the symbol hash was initialized in the constructor (so it  
was present, just empty).  I have changed that in CVS; all tests pass  
now.
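
The failure mode can be sketched like this, assuming (hypothetically)
that the on-demand fill is guarded by a check on whether the slot is set
at all; the actual guard in SimpleAlign may differ, and count_symbols is
a made-up helper:

```perl
use strict;
use warnings;

# Fill the symbol hash on demand, but only if the slot is not set yet.
sub count_symbols {
    my ( $obj, @seqs ) = @_;
    unless ( defined $obj->{_symbols} ) {
        $obj->{_symbols} = {};
        for my $seq (@seqs) {
            $obj->{_symbols}{$_} = 1 for split //, $seq;
        }
    }
    return scalar keys %{ $obj->{_symbols} };
}

my $preinitialized = { _symbols => {} };    # slot present but empty
my $uninitialized  = {};                    # slot left unset

print count_symbols( $preinitialized, 'ACGU-' ), "\n";    # prints 0
print count_symbols( $uninitialized,  'ACGU-' ), "\n";    # prints 5
```

A reference to an empty hash is both defined and true in Perl, so either
kind of guard skips the fill when the constructor has already created
the slot.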

chris

On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:

> Committed to CVS now. It is calculated anyway when calling symbol_chars,
> so...
>
> On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>> Albert,
>>
>> Found it:
>>
>> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ 
>> Bio/
>> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>>
>> If it slows performance that dramatically, maybe we can move this to
>> a separate AlignUtils method instead.  Maybe something to ask Jason
>> about?
>>
>> chris
>>
>> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>>
>>
>>> Hi,
>>>
>>> Am I right in thinking that the '_symbols' hash in SimpleAlign is  
>>> only
>>> used if one calls the symbol_chars method?
>>>
>>> When I comment out this line:
>>>
>>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
>>> $seq->seq; # line 257
>>>
>>> I get a nice speed boost on loading alignments.
>>>
>>> Can I comment this line out in the CVS HEAD?
>>>
>>> Cheers,
>>>
>>>     Albert.
>>>
>>> [init] 5.96046447753906e-06 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162399.chr1.fasta]
>>> 0.0022270679473877 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000158022.chr1.fasta]
>>> 2.14348912239075 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162585.chr1.fasta]
>>> 6.91910791397095 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000121957.chr1.fasta]
>>> 15.8402290344238 secs...
>>>
>>> avilella at magneto:~$ perl
>>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
>>> ancestral_alleles.pl
>>> -dir /home/avilella/ensembl/exoseq/test -verbose
>>> [init] 1.21593475341797e-05 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162399.chr1.fasta]
>>> 0.00294303894042969 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000158022.chr1.fasta]
>>> 0.510555982589722 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000162585.chr1.fasta]
>>> 1.6192569732666 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000121957.chr1.fasta]
>>> 3.86473417282104 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000203717.chr1.fasta]
>>> 6.99602198600769 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000196188.chr1.fasta]
>>> 7.26704716682434 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000025800.chr1.fasta]
>>> 8.44332504272461 secs...
>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>> ENSG00000117475.chr1.fasta]
>>> 12.103296995163 secs...
>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bernd.web at gmail.com  Sun Nov 25 16:13:44 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Sun, 25 Nov 2007 17:13:44 +0100
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>

Hi,

I am not sure if this is related, but I remember SimpleAlign was
adapted to cope with more gap symbols that can occur in
alignments/FASTA sequences, such as: . _ - =
Previous versions would throw an error on 'illegal' gap characters.
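
As a small illustration of treating all four symbols as gaps, here is a
sketch in plain Perl. The character class is built from the list above;
whether SimpleAlign uses exactly this set is an assumption, and the
helper names are made up.

```perl
use strict;
use warnings;

# Gap symbols taken from the list above (. _ - =); an assumption, not
# copied from Bio::SimpleAlign.
my $GAP_CLASS = qr/[._\-=]/;

# Count gap characters in an aligned sequence string.
sub count_gaps {
    my ($seq) = @_;
    my $n = () = $seq =~ /$GAP_CLASS/g;
    return $n;
}

# Return the sequence with all gap characters removed.
sub ungapped {
    my ($seq) = @_;
    ( my $copy = $seq ) =~ s/$GAP_CLASS//g;
    return $copy;
}

print count_gaps('AGUC..GA--C='), "\n";    # prints 5
print ungapped('AGUC..GA--C='),   "\n";    # prints AGUCGAC
```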

Regards,
Bernd

On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
> Albert,
>
> I was getting a single AlignIO.t fail which appeared to be related to
> this:
>
> ...
> ok 122 - The object isa Bio::Align::AlignI
> ok 123 - consensus_string on metafasta
>
> not ok 124 - symbol_chars() using metafasta
> #   Failed test 'symbol_chars() using metafasta'
> #   in t/AlignIO.t at line 346.
> #          got: '0'
> #     expected: '23'
>
> It was b/c the symbol hash was initialized in the constructor (so it
> was present, just empty).  I have changed that in CVS; all tests pass
> now.
>
> chris
>
>
> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:
>
> > Committed to CVS now. It is calculated anyway when calling symbol_chars,
> > so...
> >
> > On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> >> Albert,
> >>
> >> Found it:
> >>
> >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> >> Bio/
> >> SimpleAlign.pm.diff?r1=1.36&r2=1.37
> >>
> >> If it slows performance that dramatically, maybe we can move this to
> >> a separate AlignUtils method instead.  Maybe something to ask Jason
> >> about?
> >>
> >> chris
> >>
> >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
> >>
> >>
> >>> Hi,
> >>>
> >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is
> >>> only
> >>> used if one calls the symbol_chars method?
> >>>
> >>> When I comment out this line:
> >>>
> >>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> >>> $seq->seq; # line 257
> >>>
> >>> I get a nice speed boost on loading alignments.
> >>>
> >>> Can I comment this line out in the CVS HEAD?
> >>>
> >>> Cheers,
> >>>
> >>>     Albert.
> >>>
> >>> [init] 5.96046447753906e-06 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162399.chr1.fasta]
> >>> 0.0022270679473877 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000158022.chr1.fasta]
> >>> 2.14348912239075 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162585.chr1.fasta]
> >>> 6.91910791397095 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000121957.chr1.fasta]
> >>> 15.8402290344238 secs...
> >>>
> >>> avilella at magneto:~$ perl
> >>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
> >>> ancestral_alleles.pl
> >>> -dir /home/avilella/ensembl/exoseq/test -verbose
> >>> [init] 1.21593475341797e-05 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162399.chr1.fasta]
> >>> 0.00294303894042969 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000158022.chr1.fasta]
> >>> 0.510555982589722 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000162585.chr1.fasta]
> >>> 1.6192569732666 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000121957.chr1.fasta]
> >>> 3.86473417282104 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000203717.chr1.fasta]
> >>> 6.99602198600769 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000196188.chr1.fasta]
> >>> 7.26704716682434 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000025800.chr1.fasta]
> >>> 8.44332504272461 secs...
> >>> [loading aln /home/avilella/ensembl/exoseq/test/
> >>> ENSG00000117475.chr1.fasta]
> >>> 12.103296995163 secs...
> >>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Sun Nov 25 16:39:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 10:39:01 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
	<6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
	<358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>
	<F9248F84-A3AB-49F0-8419-443DA8BC4FDF@uiuc.edu>
	<716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com>
Message-ID: <B849A608-7C12-4C87-BB93-D846959F0523@uiuc.edu>

Bernd,

That would be when generating Bio::LocatableSeq instances for  
building a Bio::SimpleAlign object.  Judging by test suite results  
that doesn't appear to be affected.
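
For anyone skimming the thread, the change under discussion (building the symbol set only when symbol_chars() is called, rather than on every sequence added) could be sketched roughly as below. This is a toy class, not the Bio::SimpleAlign code: ToyAlign, its add_seq, and the way the _symbols cache is invalidated are all assumptions made for illustration.

```perl
use strict;
use warnings;

package ToyAlign;

sub new {
    my ($class) = @_;
    # Deliberately no '_symbols' key here: creating it empty in the
    # constructor would make it look "already computed" to the
    # exists() check in symbol_chars.
    return bless { seqs => [] }, $class;
}

sub add_seq {
    my ($self, $str) = @_;
    push @{ $self->{seqs} }, $str;
    delete $self->{_symbols};    # cheap: just drop any cached symbol set
    return;
}

sub symbol_chars {
    my ($self) = @_;
    # Pay the per-character cost only when somebody asks for the symbols
    unless ( exists $self->{_symbols} ) {
        my %symbols;
        for my $str ( @{ $self->{seqs} } ) {
            $symbols{$_} = 1 for split //, $str;
        }
        $self->{_symbols} = \%symbols;
    }
    my @chars = sort keys %{ $self->{_symbols} };
    return @chars;    # list of symbols, or their count in scalar context
}

package main;

my $aln = ToyAlign->new;
$aln->add_seq('AC-GT');
$aln->add_seq('ACNGT');
print join( '', $aln->symbol_chars ), "\n";
```

Unlike the real method, this toy version does not filter out gap or match characters; it only shows where the work moves to.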

chris

On Nov 25, 2007, at 10:13 AM, Bernd Web wrote:

> Hi,
>
> I am not sure if this is related, but I remember SimpleAlign was
> adapted to cope with more gap symbols that can occur in
> alignments/FastA sequences, as: . _ - =
> Previous versions would throw an error on 'illegal' gap characters,
>
> Regards,
> Bernd
>
> On Nov 25, 2007 4:38 PM, Chris Fields <cjfields at uiuc.edu> wrote:
>> Albert,
>>
>> I was getting a single AlignIO.t fail which appeared to be related to
>> this:
>>
>> ...
>> ok 122 - The object isa Bio::Align::AlignI
>> ok 123 - consensus_string on metafasta
>>
>> not ok 124 - symbol_chars() using metafasta
>> #   Failed test 'symbol_chars() using metafasta'
>> #   in t/AlignIO.t at line 346.
>> #          got: '0'
>> #     expected: '23'
>>
>> It was b/c the symbol hash was initialized in the constructor (so it
>> was present, just empty).  I have changed that in CVS; all tests pass
>> now.
>>
>> chris
>>
>>
>> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote:
>>
>>> cvs commited now. it is calculated anyway when calling symbol_chars
>>> so...
>>>
>>> On Nov 23, 2007 12:49 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>>>> Albert,
>>>>
>>>> Found it:
>>>>
>>>> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
>>>> Bio/
>>>> SimpleAlign.pm.diff?r1=1.36&r2=1.37
>>>>
>>>> If it slows performance that dramatically, maybe we can move  
>>>> this to
>>>> a separate AlignUtils method instead.  Maybe something to ask Jason
>>>> about?
>>>>
>>>> chris
>>>>
>>>> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Am I right in thinking that the '_symbols' hash in SimpleAlign is
>>>>> only
>>>>> used if one calls the symbol_chars method?
>>>>>
>>>>> When I comment out this line:
>>>>>
>>>>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
>>>>> $seq->seq; # line 257
>>>>>
>>>>> I get a nice speed boost on loading alignments.
>>>>>
>>>>> Can I comment this line out in the CVS HEAD?
>>>>>
>>>>> Cheers,
>>>>>
>>>>>     Albert.
>>>>>
>>>>> [init] 5.96046447753906e-06 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162399.chr1.fasta]
>>>>> 0.0022270679473877 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000158022.chr1.fasta]
>>>>> 2.14348912239075 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162585.chr1.fasta]
>>>>> 6.91910791397095 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000121957.chr1.fasta]
>>>>> 15.8402290344238 secs...
>>>>>
>>>>> avilella at magneto:~$ perl
>>>>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/
>>>>> ancestral_alleles.pl
>>>>> -dir /home/avilella/ensembl/exoseq/test -verbose
>>>>> [init] 1.21593475341797e-05 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162399.chr1.fasta]
>>>>> 0.00294303894042969 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000158022.chr1.fasta]
>>>>> 0.510555982589722 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000162585.chr1.fasta]
>>>>> 1.6192569732666 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000121957.chr1.fasta]
>>>>> 3.86473417282104 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000203717.chr1.fasta]
>>>>> 6.99602198600769 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000196188.chr1.fasta]
>>>>> 7.26704716682434 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000025800.chr1.fasta]
>>>>> 8.44332504272461 secs...
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/
>>>>> ENSG00000117475.chr1.fasta]
>>>>> 12.103296995163 secs...
>>>>
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Nov 25 18:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to  
Bio::SeqIO::staden::read over the last few months which incorporate  
code from Bugzilla (bugs 2074 and 2329, very kindly donated from  
Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for  
optionally getting trace data (i.e. for drawing graphs).

The method code is now implemented in Bio::SeqIO::abi, with example  
code in examples/quality/svgtrace.pl.  These changes should allow  
newer versions of Staden io_lib as well (the code is tested with  
io_lib 1.9.2), though they haven't been tested extensively as I am  
having problems compiling newer io_lib versions on my MacBook.  It's  
very likely more changes will need to be made along the way; some  
issues were found with XS compilation which appear harmless but need  
to be investigated, and trace data from other formats need to be  
evaluated.  The possibility exists that many of these changes break  
backward compatibility with older bioperl releases, though tests  
passed with bioperl 1.5.2.

Any feedback re: platform issues, test results using newer io_lib  
versions, older bioperl-versions, etc would be greatly appreciated.   
I'm hoping this will stimulate more interest in getting other bioperl- 
ext modules up-to-date with bioperl-live.

chris


From cjfields at uiuc.edu  Mon Nov 26 18:59:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 12:59:23 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
	<47025AD9.1090105@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>

Steve, Bernd, (and Jason, since you may have some input on this as  
well),

I am now looking into the bug Bernd submitted and it seems there is a  
serious discrepancy with the way the hit raw_score, bits, and  
significance are determined for Hit objects.  Unless I am mistaken  
these should always come from the best HSP when they are present,  
falling back to the hit table data only when no HSP alignments are  
present.  Under the latter conditions a minimal Hit object is made  
from data in the hit table, which reports the rounded bit score, not  
the raw score, so in those cases the raw score would be undefined  
(and you probably should get a nasty warning indicating there are no  
HSPs present to get the data from).
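
To make the intended behavior concrete, here is a rough sketch using plain hashes rather than the actual Hit/HSP classes. The function name hit_scores, the hash keys, and the choice of "best HSP = highest bit score" are assumptions for illustration only; the real parser and GenericHit code are organized differently.

```perl
use strict;
use warnings;

# $hit is a plain hash standing in for a Hit object, e.g.
#   { table_bits => 120, table_evalue => 3e-27,
#     hsps => [ { raw_score => 301, bits => 120.2, evalue => 3e-27 } ] }
sub hit_scores {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };

    if (@hsps) {
        # Assumption: the "best" HSP is the one with the highest bit score
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return {
            raw_score    => $best->{raw_score},
            bits         => $best->{bits},
            significance => $best->{evalue},
        };
    }

    # No alignments: fall back to the hit table, which (for NCBI BLAST)
    # carries only the rounded bit score, so the raw score stays undefined.
    warn "No HSPs present; raw score unavailable, using hit table data\n";
    return {
        raw_score    => undef,
        bits         => $hit->{table_bits},
        significance => $hit->{table_evalue},
    };
}
```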

What is occurring now, though, is that the raw_score and significance are  
explicitly set from the hit table in the BLAST parser for the Hit  
object at all times, while the bits are correctly derived from the  
best HSP (no fallback to the hit table).  Changing to the behavior  
above results in several tests failing via SearchIO.t, with each  
failed test reporting the expected (read:correct) raw score.

I'll look through the tests just in case, but I am planning on  
committing changes to the BLAST parsers, GenericHit, and SearchIO.t  
(to reflect the correct expected data) in the next day or two unless  
there are any objections.

chris

On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:

> On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
>> [snip]
>>
>> Further is is possible to get the raw_score of a hit. $hit->raw_score
>> actually gets the bitscore (w/o decimal point).
>
> Hmmm. raw_score should not be the same as bit score. So given an
> example blast hit line such as:
>
>        Score = 60.0 bits (30), Expect = 1e-06
>
> $hit->raw_score() should return 30, not 60, as you seem to be getting.
>
> Could you submit a bug report for this?  http://www.bioperl.org/ 
> wiki/Bugs
>
> Thanks,
> Steve
>
>>
>> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
>>> Hi Russell,
>>>
>>> I came across your question. At first I thought all was well on my
>>> system, but indeed I also have these colouring problems.
>>> I noted that scrore in the bgcolor callback gets a different value!
>>> Printing score during hit parsing($hit->raw_score) gives the same
>>> score as -description
>>> my $score = $feature->score; However, printing score in the bgcolor
>>> sub gives 2573!
>>> All scores in the bgcolor routine all different and higher than the
>>> real scores. Were you able to solve this colouring issue?
>>>
>>> Regards,
>>> Bernd
>>>
>>>
>>>> Hi all,
>>>> I'm using a modified version of Lincoln's tutorial
>>>> (http://www.bioperl.org/wiki/ 
>>>> HOWTO:Graphics#Parsing_Real_BLAST_Output)
>>>> and I'm colouring the HSPs by setting the -bgcolor by score with  
>>>> a sub
>>>> to give a similar image to that from NCBI but for some reason, my
>>>> colours are coming out wrong (see attached example)
>>>> They seem to be off by one but I can't see why.
>>>>
>>>> Any ideas?
>>>>
>>>> I can't be certain but I think it's only started doing this  
>>>> since our
>>>> BLAST upgrade to 2.2.17 a few weeks ago.
>>>>
>>>> Here's the colouring code:
>>>> ------------------------------------------------------------------- 
>>>> -----
>>>> -------
>>>> my $track = $panel->add_track(
>>>>                               -glyph       => 'segments',
>>>>                               -label       => 1,
>>>>                               -connector   => 'dashed',
>>>>                               -bgcolor     => sub {
>>>>                                 my $feature = shift;
>>>>                                 my $score = $feature->score;
>>>>                         return 'red'       if $score >= 200;
>>>>                                     return 'fuchsia' if $score  
>>>> >= 80;
>>>>                                     return 'lime'      if $score  
>>>> >= 50;
>>>>                         return 'blue'      if $score >= 40;
>>>>                                     return 'black';
>>>>                                },
>>>>                               -font2color  => 'gray',
>>>>                               -sort_order  => 'high_score',
>>>>                               -description => sub {
>>>>                                 my $feature = shift;
>>>>                                 return unless
>>>> $feature->has_tag('description');
>>>>                                 my ($description) =
>>>> $feature->each_tag_value('description');
>>>>                                 my $score = $feature->score;
>>>>                                 "$description, score=$score";
>>>>                                },
>>>>                              );
>>>> ------------------------------------------------------------------- 
>>>> -----
>>>> ---------
>>>>
>>>>
>>>> Thanx,
>>>>
>>>> Russell Smithies
>>>>
>>>>
>>>>
>>>>
>>>> =================================================================== 
>>>> ====
>>>> Attention: The information contained in this message and/or  
>>>> attachments
>>>> from AgResearch Limited is intended only for the persons or  
>>>> entities
>>>> to which it is addressed and may contain confidential and/or  
>>>> privileged
>>>> material. Any review, retransmission, dissemination or other use  
>>>> of, or
>>>> taking of any action in reliance upon, this information by  
>>>> persons or
>>>> entities other than the intended recipients is prohibited by  
>>>> AgResearch
>>>> Limited. If you have received this message in error, please  
>>>> notify the
>>>> sender immediately.
>>>> =================================================================== 
>>>> ====
>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arvindvanam at gmail.com  Mon Nov 26 19:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>


I searched for the EMBASSY version of RNAfold (I guess it's vrnafold), but I
am unable to find a downloadable version. All there is is a web interface for
it. Can you tell me where to download it from?

Or can you just tell me how to set the location in PiseApplication to
localhost, and what to enter in the $email variable?
-- 
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Nov 26 20:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
	<214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
	<13922740.post@talk.nabble.com>
	<1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
	<13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>


On Nov 26, 2007, at 1:08 PM, vanam wrote:

> i searches for the embassy version of RNAFOLD (i guess its  
> vrnafold) but i m
> unable to find a downloadable version.all ther is a web interface  
> for it.
> can u tell frm wher to fdownload it????

You will need to install EMBOSS as well as the EMBASSY version of  
VIENNA (something which is documented in the docs that come along  
with the distributions and I will not go into detail on):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet.  Understand that there is no specific  
class framework for dealing with RNA secondary structure in BioPerl,  
so you will be on your own for now.

My suggestion for using Pise had the very important caveats that (1)  
it very well may not work, (2) I have no experience with Pise, let  
alone setting it up locally, therefore (3) I haven't tested it (and  
don't intend to, as I don't have the time).

> or can you just tell me how to set the location in piseapplication to
> localhost n wat to enter in the $email variable????

I have pointed out documentation previously which comes with the  
modules in question.  Remember perldoc is your friend; consulting it  
saves me (and everyone else) time.

 From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------

DESCRIPTION

Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise appli-
cation objects, that let you submit jobs on a Pise server.

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                             -email => 'me at myhome');

The email is optional (there is default one). It can be useful, though.
Your program might enter infinite loops, or just run many jobs: the
Pise server maintainer needs a contact (s/he could of course cancel any
requests from your address...). And if you plan to run a lot of heavy
jobs, or to do a course with many students, please ask the maintainer
before.

The location parameter stands for the actual CGI location, except when
set at the factory creation step, where it is rather the root of all
CGI.  There are default values for most of Pise programs.

You can either set location at:

1 factory creation:
      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                     -location => 'http://somewhere/Pise/cgi-bin',
                                     -email => 'me at myhome');

2 program creation:
      my $program = $factory->program('water',
                               -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                                      );

3 any time before running:
      $program->location('http://somewhere/Pise/cgi-bin/water.pl');
      $job = $program->run();

4 when running:
      $job = $program->run(-location => 'http://somewhere/Pise/cgi-bin/water.pl');

You can also retrieve a previous job results by providing its url:

   $job = $factory->job($url);

You get the url of a job by:

   $job->jobid;

----------------------------------------------

chris


From sac at bioperl.org  Tue Nov 27 01:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch. You're on track here with one exception: WU blast and NCBI
blast behave differently in what they report in the hit table: WU
blast puts the raw score in the table not the bit score as NCBI blast
does (see example below for reference). WU blast also swaps their
location in the HSP header relative to how NCBI reports it. It would
be good to verify that the blast parser isn't befuddled by this. A
quick look at SearchIO::blast and it appears that data from the hit
table is always getting stored as score, not bits for WU blast. Not
sure if the HSP section data are parsed correctly. I'd recommend
looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite. The downside to this arrangement
is that code that works for NCBI blast hits will need modification to
work for WU blast hits, but that is just the nature of the data. It
shouldn't be an issue for the majority of users that stick with one
flavor of blast and don't switch back and forth, or for users that get
their HSP scoring data from HSP sections rather than relying on the
hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of
the algorithm name for WU blast reports, so it would then be available
for the contained hit/hsp objects. This could break anything dependent
on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:
                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
        Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)


Example NCBI blast table and HSP header:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:ENSG00000137397
           transcript:ENST00000357569
          Length = 425

 Score =  120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)
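
As a rough illustration of the difference in the two HSP header lines above, the following sketch pulls the raw score and bit score out of either flavor. It is not the SearchIO::blast code; the function name parse_hsp_score_line and the returned keys are made up for this example, and the patterns are only checked against the header lines quoted in this message.

```perl
use strict;
use warnings;

# Returns a key/value list (raw_score, bits, flavor) for an HSP score
# line, or an empty list if the line matches neither layout.
sub parse_hsp_score_line {
    my ($line) = @_;

    # NCBI layout: "Score =  120 bits (301), Expect = 3e-27"
    if ( $line =~ /Score\s*=\s*([\d.eE+-]+)\s+bits\s+\((\d+)\)/ ) {
        return ( raw_score => $2, bits => $1, flavor => 'NCBI' );
    }

    # WU layout: "Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0"
    if ( $line =~ /Score\s*=\s*(\d+)\s+\(([\d.eE+-]+)\s+bits\)/ ) {
        return ( raw_score => $1, bits => $2, flavor => 'WU' );
    }

    return;
}

for my $line (
    ' Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0',
    ' Score =  120 bits (301), Expect = 3e-27',
  )
{
    my %s = parse_hsp_score_line($line);
    print "$s{flavor}: raw=$s{raw_score} bits=$s{bits}\n";
}
```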


On Nov 26, 2007 10:59 AM, Chris Fields <cjfields at uiuc.edu> wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance is determined for Hit objects.  Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present.  Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance is
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).  Changing to the behavior
> above results in several tests failing via SearchIO.t, with each
> failed test reporting the expected (read:correct) raw score.
>
> I'll look through the tests just in case, but I am planning on
> committing changes to the BLAST parsers, GenericHit, and SearchIO.t
> (to reflect the correct expected data) in the next day or two unless
> there are any objections.
>
> chris
>
>
> On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:
>
> > On Nov 21, 2007 9:38 AM, Bernd Web <bernd.web at gmail.com> wrote:
> >> [snip]
> >>
> >> Further is is possible to get the raw_score of a hit. $hit->raw_score
> >> actually gets the bitscore (w/o decimal point).
> >
> > Hmmm. raw_score should not be the same as bit score. So given an
> > example blast hit line such as:
> >
> >        Score = 60.0 bits (30), Expect = 1e-06
> >
> > $hit->raw_score() should return 30, not 60, as you seem to be getting.
> >
> > Could you submit a bug report for this?  http://www.bioperl.org/
> > wiki/Bugs
> >
> > Thanks,
> > Steve
> >
> >>
> >> On Nov 21, 2007 5:42 PM, Bernd Web <bernd.web at gmail.com> wrote:
> >>> Hi Russell,
> >>>
> >>> I came across your question. At first I thought all was well on my
> >>> system, but indeed I also have these colouring problems.
> >>> I noted that scrore in the bgcolor callback gets a different value!
> >>> Printing score during hit parsing($hit->raw_score) gives the same
> >>> score as -description
> >>> my $score = $feature->score; However, printing score in the bgcolor
> >>> sub gives 2573!
> >>> All scores in the bgcolor routine all different and higher than the
> >>> real scores. Were you able to solve this colouring issue?
> >>>
> >>> Regards,
> >>> Bernd
> >>>
> >>>
> >>>> Hi all,
> >>>> I'm using a modified version of Lincoln's tutorial
> >>>> (http://www.bioperl.org/wiki/
> >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output)
> >>>> and I'm colouring the HSPs by setting the -bgcolor by score with
> >>>> a sub
> >>>> to give a similar image to that from NCBI but for some reason, my
> >>>> colours are coming out wrong (see attached example)
> >>>> They seem to be off by one but I can't see why.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> I can't be certain but I think it's only started doing this
> >>>> since our
> >>>> BLAST upgrade to 2.2.17 a few weeks ago.
> >>>>
> >>>> Here's the colouring code:
> >>>> -------------------------------------------------------------------
> >>>> -----
> >>>> -------
> >>>> my $track = $panel->add_track(
> >>>>                               -glyph       => 'segments',
> >>>>                               -label       => 1,
> >>>>                               -connector   => 'dashed',
> >>>>                               -bgcolor     => sub {
> >>>>                                 my $feature = shift;
> >>>>                                 my $score = $feature->score;
> >>>>                         return 'red'       if $score >= 200;
> >>>>                                     return 'fuchsia' if $score
> >>>> >= 80;
> >>>>                                     return 'lime'      if $score
> >>>> >= 50;
> >>>>                         return 'blue'      if $score >= 40;
> >>>>                                     return 'black';
> >>>>                                },
> >>>>                               -font2color  => 'gray',
> >>>>                               -sort_order  => 'high_score',
> >>>>                               -description => sub {
> >>>>                                 my $feature = shift;
> >>>>                                 return unless
> >>>> $feature->has_tag('description');
> >>>>                                 my ($description) =
> >>>> $feature->each_tag_value('description');
> >>>>                                 my $score = $feature->score;
> >>>>                                 "$description, score=$score";
> >>>>                                },
> >>>>                              );
> >>>> -------------------------------------------------------------------
> >>>> -----
> >>>> ---------
> >>>>
> >>>>
> >>>> Thanx,
> >>>>
> >>>> Russell Smithies
> >>>>
> >>>>
> >>>>
> >>>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From sac at bioperl.org  Tue Nov 27 03:27:09 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 19:27:09 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com>

Hi Jon,

I'd recommend downloading it into a separate location of your choosing
(~/lib/bioperl-ext for example) and running the installer as specified
in the docs that come with the download. Then you can include the
location you installed it into via a
'use lib "$ENV{HOME}/lib/bioperl-ext"' statement at the top of your
script (Perl does not expand "~" in paths, so spell out the home
directory). It may be handy to install it as a non-root user so that
you don't alter the main perl installation.

This way your ext install will stay separate from your main bioperl
and perl installations.
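
For what it's worth, here is a minimal sketch of what the top of such a
script could look like. The directory name is just the example location
from above, so adjust it to wherever you actually installed the package.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Perl does not expand '~' in paths, so build the directory from $ENV{HOME}.
# "lib/bioperl-ext" is only the example location used in this message.
use lib "$ENV{HOME}/lib/bioperl-ext";

# The private directory is now part of the module search path.
print "$_\n" for @INC;
```

Running it prints the module search path, which is a quick way to check
that the private directory is really being picked up.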

There are some docs about the ext packages you might want to check out
at http://www.bioperl.org/wiki/Ext_package.

Steve

On Nov 21, 2007 4:35 PM, Jonathan Binkley <binkley at genome.stanford.edu> wrote:
> Hi,
>
> I installed bioperl on a Mac (OS 10.4, Intel) via fink,
> which put it here:
>
> /sw/lib/perl5/5.8.6/Bio/
>
> It seems to work fine, but I need bioperl-ext for
> Smith-Waterman alignments.
>
> So, into which directory should I download bioperl-ext and
> run the Makefile?
>
> Thanks.
>


From a_arya2000 at yahoo.com  Tue Nov 27 14:51:41 2007
From: a_arya2000 at yahoo.com (a_arya2000)
Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST)
Subject: [Bioperl-l] Bioperl-ext test fails
Message-ID: <615478.1036.qm@web60113.mail.yahoo.com>

Hello,
I downloaded the latest bioperl-ext from the bioperl website,
and I have io_lib v1.8.11 installed. I was trying
to install Bio::SeqIO::staden::read (part of bioperl-ext).
It compiled fine without any errors, but when I ran make
test I got the following output.


PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94# Test 7 got: "0" (t/staden_read.t at line 110 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
#  t/staden_read.t line 110 is:         ok(0, undef, "We don't have the ability to write files for $format format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3 *TODO*)
#   Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail #4 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail #5 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail #6 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail #7 *TODO*)
#    Expected: <UNDEF> (We don't have the ability to write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
#  t/staden_read.t line 62 is:  ok(0, undef, "Still missing test files for $format format") for (1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 20 got: "0" (t/staden_read.t at line 62 fail #7 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 21 got: "0" (t/staden_read.t at line 62 fail #8 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 22 got: "0" (t/staden_read.t at line 62 fail #9 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 23 got: "0" (t/staden_read.t at line 62 fail #10 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 24 got: "0" (t/staden_read.t at line 62 fail #11 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 25 got: "0" (t/staden_read.t at line 62 fail #12 *TODO*)
#    Expected: <UNDEF> (Still missing test files for alf format)
# Test 31 got: "0" (t/staden_read.t at line 107 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
#  t/staden_read.t line 107 is:             ok(0, undef, "Can't write valid ctf files until we have a trace object") for 1..7;
# Test 32 got: "0" (t/staden_read.t at line 107 fail #2 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 33 got: "0" (t/staden_read.t at line 107 fail #3 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 34 got: "0" (t/staden_read.t at line 107 fail #4 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 35 got: "0" (t/staden_read.t at line 107 fail #5 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 36 got: "0" (t/staden_read.t at line 107 fail #6 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
# Test 37 got: "0" (t/staden_read.t at line 107 fail #7 *TODO*)
#    Expected: <UNDEF> (Can't write valid ctf files until we have a trace object)
t/staden_read....ok
All tests successful.
Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr +  0.15 csys =  1.71 CPU)


Does anyone have any idea what might be going wrong here? By
the way, my OS is Linux. Thank you very much.

Arya




From bix at sendu.me.uk  Tue Nov 27 15:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded latest bioperl-ext from bioperl website,
> and I have io_lib v1.8.11 installed, and I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any error but when I run make
> test I got following output. 
[...]
> All tests successful.
> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr + 
> 0.15 csys =  1.71 CPU)
> 
> 
> Anyone has any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that 
nothing actually went wrong: 'All tests successful'. Apparently there 
are some things in the test script that are known by the author to not 
work quite right, so he marked them as 'todo'. The problems seem 
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going 
on, all is well.
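
To illustrate why 'todo' failures are harmless, here is a minimal sketch
using Test::More. That choice is an assumption on my part: the output
above looks like it comes from the older Test module, whose todo syntax
differs, and the test names below are made up.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# An ordinary test: a failure here would make "make test" fail.
ok( 1 + 1 == 2, 'arithmetic works' );

TODO: {
    # Everything in this block is expected to fail for now. Failures are
    # reported as "not ok ... # TODO" but do not count against the run.
    local $TODO = 'writing abi files is not implemented (hypothetical)';

    ok( 0, 'can write an abi file' );
    is( 0, undef, 'writer returns undef' );
}
```

The two tests inside the TODO block fail and print diagnostics, yet the
script as a whole still counts as passing, which matches the situation
in the report above.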


From alison.waller at utoronto.ca  Mon Nov 26 21:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

It's the usual story: I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.

I am trying to write a script that will parse large BLAST files (usually
blastx). I also want to be able to specify how many hits to report
information on.

Most of the time I will only want information on the top hit, but I want
the flexibility to obtain information on, say, the top 5.  I am pretty
sure I have done this wrong; any advice on how to correct my script to do
this would be great.

Thanks so much,

Alison

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits
                       # to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
      if ($top_hit=$result->next_hit) # this might be wrong - I want to
                                      # specify how many hits to print results for
            # Print some tab-delimited data about this hit
           {
            print $result->query_name, "\t";
            print $hit->description, "\t";
            print $hit->significance, "\t";
            print $hit->bits,"\t";
            print $hsp->evalue, "\t";
            print $hsp->percent_identity, "\t";
            print $hsp->length('total'),"\t";
            print $hsp->num_identical,"\t";
            print $hsp->gaps,"\t";
            print $hsp->strand('query'),"\t";
            print $hsp->strand('hit'), "\n";
          }
      else print "no hits\n";
   }

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From bix at sendu.me.uk  Tue Nov 27 17:01:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 17:01:36 +0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <474C4D70.2010206@sendu.me.uk>

alison waller wrote:
> I am trying to write a script that will parse large blast files (usually
> blastx) I also want to be able to specify how many hits I want to report
> information on.
> 
> Most of the time I will only want information on the top hit, but I want to
> have the flexibility to obtain information on say the top5.  I am pretty
> sure I have done this wrong, any advice on how to correct my script to do
> this, would be great.

[snip]

>       if ($top_hit=$result->next_hit) # this might be wrong - I want to
> specify how many hits to print results for

I didn't really pay attention to the rest of your code, but assuming it 
all works except for only ever giving you info for the top hit, you just 
need to change this 'if' to a loop of some kind.

# ...
my $hits = 0;

while (my $hit = $result->next_hit) {
  $hits++;
  last if $hits > $tophit;
  # ...
}


From David.Messina at sbc.su.se  Tue Nov 27 17:55:44 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 27 Nov 2007 18:55:44 +0100
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <474C4D70.2010206@sendu.me.uk>
References: <005a01c83070$3a814580$d81efea9@AWALL>
	<474C4D70.2010206@sendu.me.uk>
Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>

Hi Alison,
As Sendu mentioned, the key bit is adding a condition to the hit loop to
limit the number of hits that are printed. I didn't test the below
extensively, but give it a try...


Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query

#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);

print OUT
  "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $i = 0;    # number of hits reported so far for this query
    while ( my $hit = $result->next_hit ) {
        last if $i >= $tophit;    # stop once the requested number is reached
        $i++;
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t";
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t";
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t";
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n";
        }
    }

    # $i is only incremented when a hit was actually returned, so it is
    # still 0 here when the query had no hits (assuming $tophit >= 1)
    if ($i == 0) { print OUT "no hits\n"; }
}


From Russell.Smithies at agresearch.co.nz  Tue Nov 27 19:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>

Do the hits need to be sorted first or is this done automagically?
I ask this as I know Megablast doesn't provide sorted output for most of
its formats.

Russell



From cjfields at uiuc.edu  Tue Nov 27 21:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com>
	<474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read.  In general,
though, if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94,  2 wallclock secs ( 1.56 cusr +
>> 0.15 csys =  1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say  
> that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant  
> going
> on, all is well.


From cjfields at uiuc.edu  Tue Nov 27 21:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C06249FA1@imail.agresearch.co.nz>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also last
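
If you only want one line per hit, something along these lines might
work in place of the inner next_hsp loop. It is an untested sketch: the
helper name best_hsp_rows and the choice of columns are mine, and it
relies only on the next_hit, hsp('best'), name, evalue and
percent_identity calls already used in this thread.

```perl
use strict;
use warnings;

# Return one tab-delimited row per hit, built from the best HSP only.
# At most $max_hits rows are returned for the given result object.
sub best_hsp_rows {
    my ( $result, $max_hits ) = @_;
    my @rows;
    while ( my $hit = $result->next_hit ) {
        last if @rows >= $max_hits;
        my $hsp = $hit->hsp('best') or next;    # skip hits without HSPs
        push @rows, join "\t",
            $result->query_name, $hit->name,
            $hsp->evalue,        $hsp->percent_identity;
    }
    return @rows;
}

# Only runs when called as: script.pl <blast report> <# of hits>
if ( @ARGV == 2 ) {
    require Bio::SearchIO;
    my ( $infile, $max_hits ) = @ARGV;
    my $report = Bio::SearchIO->new( -file => $infile, -format => 'blast' );
    while ( my $result = $report->next_result ) {
        print "$_\n" for best_hsp_rows( $result, $max_hits );
    }
}
```

Keeping the hit counter separate from the HSP lookup avoids counting
HSPs against the hit limit.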

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     this criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit.pm object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { my $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable
                     value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.
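
As a rough sketch of how a couple of these options could be passed to
the constructor (untested here; the 1e-5 cutoff and the no_gaps name are
just assumptions for the example):

```perl
use strict;
use warnings;

# Hit filter: keep a hit only if its alignment contains no gaps.
sub no_gaps {
    my $hit = shift;
    return $hit->gaps == 0;
}

# Only runs when a BLAST report is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $report = Bio::SearchIO->new(
        -file       => $ARGV[0],
        -format     => 'blast',
        -signif     => 1e-5,         # assumed cutoff, pick your own
        -hit_filter => \&no_gaps,
    );
    while ( my $result = $report->next_result ) {
        while ( my $hit = $result->next_hit ) {
            print join( "\t",
                $result->query_name, $hit->name, $hit->significance ), "\n";
        }
    }
}
```

Filtering in the parser means hits that fail the test never reach your
loops.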

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for  
> most of
> it's formats.
>
> Russell
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Wed Nov 28 01:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head off a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
  my $res = $p3->run;
  ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences then I got a "can't place
primer on sequence"  error.

After a bit of faffing about, I think the problem occurs when no primers
are found. In which case $p3 still has the primers from the previous run,
which don't come from the current sequence, so can't be placed on it. I
tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine.
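
For the record, the version that works for me looks roughly like the
sketch below. The design_primers helper and the factory code ref are
only there for illustration, the PRIMER_MIN_GC value of 40 is made up,
and I'm assuming -seq is enough for the constructor on a machine where
primer3 can be found.

```perl
use strict;
use warnings;

# Run primer design once per sequence, building a fresh object each time so
# that no results from a previous sequence can be carried over.
# $new_primer3 is a code ref that returns a new Primer3 wrapper for a sequence.
sub design_primers {
    my ( $new_primer3, @seqs ) = @_;
    my %results;
    for my $seq (@seqs) {
        my $p3 = $new_primer3->($seq);    # new object for every sequence
        $results{ $seq->display_id } = $p3->run;
    }
    return \%results;
}

# Only runs when a FASTA file is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    require Bio::Tools::Run::Primer3;
    my $in = Bio::SeqIO->new( -file => $ARGV[0], -format => 'fasta' );
    my @seqs;
    while ( my $seq = $in->next_seq ) { push @seqs, $seq }
    my $res = design_primers(
        sub {
            my $p3 = Bio::Tools::Run::Primer3->new( -seq => $_[0] );
            $p3->add_targets( PRIMER_MIN_GC => 40 );    # made-up value
            return $p3;
        },
        @seqs,
    );
    printf "%s\t%d\n", $_, $res->{$_}->number_of_results for sort keys %$res;
}
```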

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).

Cheers,
Cass xx


From alison.waller at utoronto.ca  Tue Nov 27 21:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked, Dave; however, after looking at the output I realized
that I only want information on the top HSP per query.  For example, for
some of the queries the top hit has two HSPs, so it returned both.

I tried to edit it further, but all 3 attempts are failing, I
think because I am using the loops wrong.

I also have another problem: I want to retrieve the gi as well, and this
doesn't seem to be as straightforward as it should be.  I found another
method, _get_seq_identifiers, but it looks awkward; isn't there an object
for the gi?

I've pasted my non-working script below; if there are any suggestions on
how to get it to print out just the first HSP per hit, that would be great.

Thanks,


#!/usr/local/bin/perl -w


# Parsing BLAST reports with BioPerl's Bio::SearchIO module 
# alison waller November 2007


use strict;
use warnings;
use Bio::SearchIO;


my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }


my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1]; # to specify in the command line how many hits
                        # to report for each query


#open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";


my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);


print OUT
 
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\t
strand\tHstrand\n";


# Go through BLAST reports one by one
while (my $result = $report->next_result) {
	my $i=0;
	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
        

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,     "\t";
            print OUT $hit->name,              "\t"; 
            print OUT $hit->significance,      "\t";
            print OUT $hit->bits,              "\t";
            print OUT $hsp->evalue,            "\t"; 
            print OUT $hsp->percent_identity,  "\t";
            print OUT $hsp->length('total'),   "\t";
            print OUT $hsp->num_identical,     "\t"; 
            print OUT $hsp->gaps,              "\t";
            print OUT $hsp->strand('query'),   "\t";
            print OUT $hsp->strand('hit'),     "\n"; 
        }
}
    if ($i == 0) { print OUT "no hits\n"; } 

}

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Tuesday, November 27, 2007 4:01 PM
To: Smithies, Russell
Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also last

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     this criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit.pm object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable  
value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagically?
> I ask this as I know Megablast doesn't provide sorted output for most
> of its formats.
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-
>> bio.org] On Behalf Of Dave Messina
>> Sent: Wednesday, 28 November 2007 6:56 a.m.
>> To: alison waller
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>
>> Hi Alison,
>> As Sendu mentioned, the key bit is adding a condition to the hit loop
>> to limit the number of hits that are printed. I didn't test the below
>> extensively, but give it a try...
>>
>>
>> Dave
>>
>>
>>
>> #!/usr/local/bin/perl -w
>>
>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>> # alison waller November 2007
>>
>> use strict;
>> use warnings;
>> use Bio::SearchIO;
>>
>> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
>> if (@ARGV != 2) { die $usage; }
>>
>> my $infile  = $ARGV[0];
>> my $outfile = $infile . '.parsed';
>> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>>                        # to report for each query
>>
>> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>>
>> my $report = new Bio::SearchIO(
>>    -file   => "$infile",
>>    -format => "blast"
>> );
>>
>> print OUT
>>   "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";
>>
>> # Go through BLAST reports one by one
>> while ( my $result = $report->next_result ) {
>>    my $i = 0;
>>    while (  ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
>>        while ( my $hsp = $hit->next_hsp ) {
>>
>>            # Print some tab-delimited data about this hit
>>            print OUT $result->query_name,     "\t";
>>            print OUT $hit->name,              "\t";
>>            print OUT $hit->significance,      "\t";
>>            print OUT $hit->bits,              "\t";
>>            print OUT $hsp->evalue,            "\t";
>>            print OUT $hsp->percent_identity,  "\t";
>>            print OUT $hsp->length('total'),   "\t";
>>            print OUT $hsp->num_identical,     "\t";
>>            print OUT $hsp->gaps,              "\t";
>>            print OUT $hsp->strand('query'),   "\t";
>>            print OUT $hsp->strand('hit'),     "\n";
>>        }
>>    }
>>
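>>    # $i is incremented before next_hit is tried, so it is never 0 here;
>>    # ask the result for its hit count instead
>>    if ($result->num_hits == 0) { print OUT "no hits\n"; }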
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> = 
> ======================================================================
> Attention: The information contained in this message and/or  
> attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or  
> privileged
> material. Any review, retransmission, dissemination or other use of,  
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by  
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> = 
> ======================================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dennis.prickett at bbsrc.ac.uk  Wed Nov 28 10:18:26 2007
From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C))
Date: Wed, 28 Nov 2007 10:18:26 -0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk>

Dear Alison
 
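Or, if you are only interested in the top hit, you could limit the report
to that in the blast command by adding the parameter " -b 1 ".

This will truncate the alignments in the report to 1 hit (database
sequence) per query (or -b 5 for 5 hits, etc.).  Note that -b counts hits
rather than HSPs, so a hit can still show several HSPs, and the one-line
descriptions are limited separately with -v.  Your blasts run faster and
then you won't have to worry about how to parse out the top blast hit(s).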

However, if there are any caveats for using this parameter that I am not
aware of please let us know. 

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,

 
It's the usual story, I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to finish
my thesis.  

 
I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want
to have the flexibility to obtain information on say the top5.  I am
pretty sure I have done this wrong, any advice on how to correct my
script to do this, would be great.

 
Thanks so much,

 
Alison

 
#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
      if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for
            # Print some tab-delimited data about this hit
           {
            print $result->query_name, "\t";
            print $hit->description, "\t";
            print $hit->significance, "\t";
            print $hit->bits,"\t";
            print $hsp->evalue, "\t";
            print $hsp->percent_identity, "\t";
            print $hsp->length('total'),"\t";
            print $hsp->num_identical,"\t";
            print $hsp->gaps,"\t";
            print $hsp->strand('query'),"\t";
            print $hsp->strand('hit'), "\n";
          }
      else print "no hits\n";
   }

 
******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  


From t.nugent at cs.ucl.ac.uk  Wed Nov 28 13:10:41 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Wed, 28 Nov 2007 13:10:41 +0000
Subject: [Bioperl-l] Helical Wheel module
Message-ID: <474D68D1.3080602@cs.ucl.ac.uk>

Hi everyone,

I've been drawing a lot of helical wheels recently so put all my code 
into a module. I don't think there's anything in bioperl to do this yet 
though there are a few programs written in perl and flash on the web to 
do the same thing. I was thinking this could fit into biographics. It has
lots of options to adjust labels, colours and TTF fonts, and can output to
PNG and SVG.

Tim

...

Here's the output, converted to jpg from svg:
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg

Module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz

Works like this:

use DrawHelicalWheel;

my $im = DrawHelicalWheel->new(-title=>$title,
                               -sequence=>$sequence,
                               -helices=>\@helices,
                               -ttf_font=>$font);
open(OUTPUT, ">$svg") || die "Can't open $svg: $!\n";
binmode OUTPUT;
print OUTPUT $im->svg;
close OUTPUT;

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent


From tristan.lefebure at gmail.com  Wed Nov 28 15:46:11 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:46:11 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281046.11146.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an 
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is 
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

I could recycle the '_remove_col' sub-function of 'remove_columns' to do so 
(it splits the alignment into sequence objects, removes the sites, and then 
regenerates an alignment object), but I would be surprised if there was 
nothing already doing the job...

Thanks

-Tristan


From bix at sendu.me.uk  Wed Nov 28 16:19:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Nov 2007 16:19:36 +0000
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <200711281046.11146.tnl7@cornell.edu>
References: <200711281046.11146.tnl7@cornell.edu>
Message-ID: <474D9518.7010201@sendu.me.uk>

Tristan Lefebure wrote:
> Hello!
> 
> I was wondering if there was a function to remove sites/columns of an 
> alignment. Something like: $aln->remove_sites(@sites_to_remove)
> I looked around Bio::SimpleAlign but did not find exactly that. There is 
> remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

You might want to take a second look at the docs. You can supply column 
number ranges to remove_columns(), so it does exactly what you want.
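
For a list of individual sites, a minimal sketch along these lines might
do. The sites_to_ranges helper is hypothetical (not part of BioPerl); it
only turns column numbers into the [start, end] pairs that
remove_columns() accepts. In the versions of the docs I know, range
columns are numbered from 0, so please check that against your install.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse a list of column numbers into [start, end] pairs,
# merging consecutive (or repeated) columns into a single range.
sub sites_to_ranges {
    my @sites = sort { $a <=> $b } @_;
    my @ranges;
    for my $site (@sites) {
        if ( @ranges && $site <= $ranges[-1][1] + 1 ) {
            $ranges[-1][1] = $site if $site > $ranges[-1][1];
        }
        else {
            push @ranges, [ $site, $site ];
        }
    }
    return @ranges;
}

# Usage: script.pl <alignment file> <format> <col> [<col> ...]
if ( @ARGV >= 3 ) {
    require Bio::AlignIO;
    my ( $file, $format, @sites_to_remove ) = @ARGV;
    my $in  = Bio::AlignIO->new( -file => $file, -format => $format );
    my $aln = $in->next_aln;

    # remove_columns() returns a new alignment without those columns
    my $trimmed = $aln->remove_columns( sites_to_ranges(@sites_to_remove) );

    my $out = Bio::AlignIO->new( -fh => \*STDOUT, -format => $format );
    $out->write_aln($trimmed);
}
```

Merging neighbouring sites first just keeps the argument list short; one
[n, n] pair per site should work as well.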


From tnl7 at cornell.edu  Wed Nov 28 15:44:17 2007
From: tnl7 at cornell.edu (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:44:17 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281044.17770.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an 
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is 
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.

I could recycle the '_remove_col' sub-function of 'remove_columns' to do so 
(it splits the alignment into sequence objects, removes the sites, and then 
regenerates an alignment object), but I would be surprised if there was 
nothing already doing the job...

Thanks

-Tristan


From cjfields at uiuc.edu  Wed Nov 28 13:57:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:57:27 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>
Message-ID: <B3E0F9EA-9452-483E-AA17-5174B743B164@uiuc.edu>

I had some code which does this which I committed yesterday to CVS; it  
catches the GI for the query and the hits:

$result->query_gi;
$hit->ncbi_gi;

I am in the midst of fixing additional problems with WU-BLAST parsing  
but you are more than welcome to try it.
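
For the "one HSP per hit" part, something like the following sketch
might be a starting point. It assumes a BioPerl recent enough to have
query_gi and ncbi_gi; the top_hsp_lines name and the choice of columns
are just for illustration, and I have not run it against real reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one tab-delimited line per hit (at most $max_hits per query),
# describing only the best HSP of each hit.
sub top_hsp_lines {
    my ( $result, $max_hits ) = @_;
    my $query_gi = $result->query_gi;
    $query_gi = '' unless defined $query_gi;    # query ID may lack a GI
    my @lines;
    while ( @lines < $max_hits and my $hit = $result->next_hit ) {
        my $hsp = $hit->hsp('best') or next;    # skip hits without an HSP
        my $hit_gi = $hit->ncbi_gi;
        $hit_gi = '' unless defined $hit_gi;    # hit ID may lack a GI
        push @lines, join "\t",
            $result->query_name, $query_gi, $hit->name, $hit_gi,
            $hsp->evalue, $hsp->percent_identity;
    }
    return @lines;
}

# Usage: script.pl <blast report> <# of hits>
if ( @ARGV == 2 ) {
    require Bio::SearchIO;
    my ( $infile, $tophit ) = @ARGV;
    my $report = Bio::SearchIO->new( -file => $infile, -format => 'blast' );
    while ( my $result = $report->next_result ) {
        my @lines = top_hsp_lines( $result, $tophit );
        if (@lines) { print "$_\n" for @lines }
        else        { print $result->query_name, "\tno hits\n" }
    }
}
```

Counting the lines actually collected, rather than incrementing a
counter inside the loop condition, also keeps the "no hits" check
simple.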

chris

On Nov 27, 2007, at 3:32 PM, alison waller wrote:

> Thanks Everyone,
>
> Your edits worked Dave, however after looking at the output I realized
> that I only want information on the top HSP per query returned.  For
> example, for some of the queries the top hit has two HSPs, so it
> returned both.
>
> I tried to further edit it, but after 3 attempts they are all failing,
> I think due to me using the loops wrong.
>
> I also have another problem: I also want to retrieve the GI, and this
> doesn't seem to be as straightforward as it should be.  I found another
> method, _get_seq_identifiers, but this looks awkward; isn't there an
> object for the GI?
>
> I've pasted my non-working script below; if there are any suggestions
> on how to get it to print out just the first HSP per hit, that would be
> great.
>
> Thanks,
>
>
> #!/usr/local/bin/perl -w
>
>
> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
> # alison waller November 2007
>
>
> use strict;
> use warnings;
> use Bio::SearchIO;
>
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
> if (@ARGV != 2) { die $usage; }
>
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                        # to report for each query
>
>
> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>
>
> my $report = new Bio::SearchIO(
>    -file   => "$infile",
>    -format => "blast"
> );
>
>
> print OUT
>   "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tstrand\tHstrand\n";
>
>
> # Go through BLAST reports one by one
> while (my $result = $report->next_result) {
> 	my $i=0;
> 	while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
>    	while (  ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {
>
>
>            # Print some tab-delimited data about this hit
>            print OUT $result->query_name,     "\t";
>            print OUT $hit->name,              "\t";
>            print OUT $hit->significance,      "\t";
>            print OUT $hit->bits,              "\t";
>            print OUT $hsp->evalue,            "\t";
>            print OUT $hsp->percent_identity,  "\t";
>            print OUT $hsp->length('total'),   "\t";
>            print OUT $hsp->num_identical,     "\t";
>            print OUT $hsp->gaps,              "\t";
>            print OUT $hsp->strand('query'),   "\t";
>            print OUT $hsp->strand('hit'),     "\n";
>        }
> }
>    if ($i == 0) { print OUT "no hits\n"; }
>
> }
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 13:54:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 07:54:39 -0600
Subject: [Bioperl-l] Helical Wheel module
In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk>
References: <474D68D1.3080602@cs.ucl.ac.uk>
Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu>

Looks good!  We recently added in your transmembrane module, so we  
could definitely add this in.

chris

On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've been drawing a lot of helical wheels recently so put all my code
> into a module. I don't think there's anything in bioperl to do this  
> yet
> though there are a few programs written in perl and flash on the web  
> to
> do the same thing. I was thinking this could fit into biographics. Has
> lots of options to adjust labels, colours, ttf fonts and can output to
> png & svg.
>
> Tim
>
> ...
>
> Here's the output, converted to jpg from svg:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg
>
> Module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz
>
> Works like this:
>
> use DrawHelicalWheel;
>
> my $im = DrawHelicalWheel->new(-title=>$title,
>                               -sequence=>$sequence,
>                               -helices=>\@helices,
>                               -ttf_font=>$font);
> open(OUTPUT, ">$svg");
> binmode OUTPUT;
> print OUTPUT $im->svg;
> close OUTPUT;
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk
> http://www.cs.ucl.ac.uk/staff/T.Nugent
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Nov 28 18:43:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 12:43:58 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
References: <4701AEE6.6070506@web.de>
	<D5DBA313349A4B458528BE63B387F36C05DCDB97@imail.agresearch.co.nz>
	<4702BC5B.7040407@web.de>
	<62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C05DCDC73@imail.agresearch.co.nz>
	<e572b3c70711210834t4c70da0ey87ecef6eb92e86fd@mail.gmail.com>
	<716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
	<716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
	<8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
	<C9E60538-FBA6-4E53-8842-0C7D987CE364@uiuc.edu>
	<8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu>


On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote:

> Chris,
>
> Good catch. You're on track here with one exception: WU blast and NCBI
> blast behave differently in what they report in the hit table: WU
> blast puts the raw score in the table not the bit score as NCBI blast
> does (see example below for reference). WU blast also swaps their
> location in the HSP header relative to how NCBI reports it. It would
> be good to verify that the blast parser isn't befuddled by this. A
> quick look at SearchIO::blast and it appears that data from the hit
> table is always getting stored as score, not bits for WU blast. Not
> sure if the HSP section data are parsed correctly. I'd recommend
> looking into these things when you do your fixes.

What I have now after commits is:

GenericHit - use the best HSP when possible for bits, score/raw_score,  
significance.  When there is no HSP, construct a minimal Hit object  
using hit table data (WUBLAST maps the score to raw_score, NCBI BLAST  
maps to bits(), both map evalue/pvalue to significance).  HSP mapping  
seems to be correct.

One issue that has popped up is GenericHit::significance  
preferentially uses the best HSP.  However, GenericHSP::significance  
uses evalues preferentially over pvalues; both Expect and P appear to  
be parsed for WU-BLAST HSPs now (so the evalue is reported); this  
apparently wasn't always the case if I read the GenericHit docs  
correctly.  As NCBI BLAST doesn't report pvalues we could change that
so it preferentially returns a pvalue if present, falling back to an
evalue.  This would match what is found in the hit table more closely
and resembles what is documented for the method (for significance(),
WU-BLAST gets pvalues, NCBI BLAST gets evalues).
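
To make the proposed rule concrete, here is a tiny stand-alone sketch.
The preferred_significance helper and the example numbers are
hypothetical and not the actual GenericHSP code; it only shows "pvalue
if present, else evalue".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl code: pick the value significance()
# would report for an HSP under the proposed rule.
sub preferred_significance {
    my (%hsp) = @_;    # e.g. ( pvalue => ..., evalue => ... )
    return $hsp{pvalue} if defined $hsp{pvalue};    # WU-BLAST reports P
    return $hsp{evalue};                            # NCBI BLAST: Expect only
}

# Made-up numbers, only to show the two cases
printf "WU-BLAST HSP:   %s\n",
    preferred_significance( pvalue => 1.2e-15, evalue => 1.3e-15 );
printf "NCBI BLAST HSP: %s\n",
    preferred_significance( evalue => 3e-20 );
```

Testing with defined() rather than truth matters here, since a pvalue of
0 is a perfectly valid (and very significant) value.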

> So in the end, WU blast HSPs that are built from the hit table should
> report a value for raw_score and punt on bits, but NCBI HSPs so
> constructed should do the opposite. The downside to this arrangement
> is that code that works for NCBI blast hits will need modification to
> work for WU blast hits, but that is just the nature of the data. It
> shouldn't be an issue for the majority of users that stick with one
> flavor of blast and don't switch back and forth, or for users that get
> their HSP scoring data from HSP sections rather than relying on the
> hit table.

In general I get my data from the HSPs, so this shouldn't be a  
significant issue (bad pun).  I did find that changing it so that Hit  
objects use HSP data pointed out issues with test data; hit table raw/ 
bit scores were rounded from the HSP score data or vice versa since  
all data came from the hit table, so tests flunked.

I think changing the way minimal hit objects report data (particularly  
for NCBI BLAST) will lead to a lot of confusion unless we clarify  
warnings when one or the other is missing (as you also indicated).   
I'm working on that now.

> Ideally, the HSP object would know whether it was NCBI or WU-based and
> issue an informative warning when attempting to access data it doesn't
> have. One solution might be for the parser to put a 'WU-' in front of
> the algorithm name for WU blast reports, so it would then be available
> for the contained hit/hsp objects. This could break anything dependent
> on algorithm name, so it would need some testing.
>
> Steve


I can probably work around that as noted above, unless you think it's
warranted to add a 'WU' designation (the version info in the Result
object has 'WashU' attached, so one could feasibly use that for
distinguishing the two report types).
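
A minimal sketch of that check, assuming the 'WashU' tag shows up in
what $result->algorithm_version returns; the is_wu_blast name is made up
for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the result's version string carries the 'WashU' tag,
# 0 otherwise (including when no version string is available).
sub is_wu_blast {
    my $result  = shift;
    my $version = $result->algorithm_version;
    return ( defined $version && $version =~ /WashU/ ) ? 1 : 0;
}

# Usage: script.pl <blast report>
if (@ARGV) {
    require Bio::SearchIO;
    my $report = Bio::SearchIO->new( -file => $ARGV[0], -format => 'blast' );
    while ( my $result = $report->next_result ) {
        print join( "\t", $result->query_name, $result->algorithm,
                    is_wu_blast($result) ? 'WU-BLAST' : 'NCBI BLAST' ), "\n";
    }
}
```

Code that needs raw_score for WU-BLAST and bits for NCBI BLAST could
branch on this once per result instead of guessing per hit.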

Anyway, I'm committing my first batch of fixes, the significance test  
will fail for at least a day until I can look into it more.

chris


From tristan.lefebure at gmail.com  Wed Nov 28 19:03:44 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 14:03:44 -0500
Subject: [Bioperl-l] Remove sites of an alignment
In-Reply-To: <474D9518.7010201@sendu.me.uk>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
Message-ID: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>

Oops. I was reading the BioPerl 1.4 documentation. Actually,
http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to
http://search.cpan.org/perldoc?Bio::SimpleAlign, which turns out to be
the 1.4 documentation...

Thank you, it works great.


On Nov 28, 2007 11:19 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Tristan Lefebure wrote:
> > Hello!
> >
> > I was wondering if there was a function to remove sites/columns of an
> > alignment. Something like: $aln->remove_sites(@sites_to_remove)
> > I looked around Bio::SimpleAlign but did not find exactly that. There is
> > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.
>
> You might want to take a second look at the docs. You can supply column
> number ranges to remove_columns(), so it does exactly what you want.
>
>


From Russell.Smithies at agresearch.co.nz  Wed Nov 28 21:57:14 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 29 Nov 2007 10:57:14 +1300
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>

Has anyone got a good example of parsing ASN.1 with
Bio::SeqIO::entrezgene?
I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
and my Perl isn't that good  :-(

Russell


From stefan.kirov at bms.com  Wed Nov 28 22:16:18 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time)
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
References: <200711281046.11146.tnl7@cornell.edu>
	<474D9518.7010201@sendu.me.uk>
	<d31f7c40711281103g65f55fcfxa7e5d308a5948e4a@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C0624A3CC@imail.agresearch.co.nz>
Message-ID: <Pine.WNT.4.64.0711281708590.21768@A103728.hpw.stf.bms.com>

Here is an example for GO, will send the one for KEGG later:
use Bio::SeqIO;

# $file is the path to the Entrez Gene ASN.1 file
my $eio = Bio::SeqIO->new(-file => $file, -format => 'entrezgene',
                          -service_record => 'yes'); #, -locuslink=>'convert');
while (my $seq = $eio->next_seq) {
    my $gid = $seq->accession_number;
    my $ann = $seq->annotation;   # annotation collection holding the ontology terms
    foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
        next if ($ot->term->authority eq 'STS marker'); # Do not need STS markers
        my $evid = $ot->comment;
        $evid =~ s/evidence: //i;
        my @ref = $ot->term->get_references; # Really there should be just one?
        my $id  = $ot->identifier;
        my $fid = 'GO:' . sprintf("%07u", $id);
        print join("\t", $gid, $ot->ontology->name, $ot->name, $evid, $fid,
                   @ref ? $ref[0]->medline : ''), "\n";
    }
}
Please note there is a bug in the parser that makes it suck a lot of RAM. 
I am fixing this and will commit probably by the week's end- you will have 
to update at that point. If you work with few records this should not 
matter.
Stefan


On Thu, 29 Nov 2007, Smithies, Russell wrote:

> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good  :-(
>
> Russell


From cjfields at uiuc.edu  Thu Nov 29 23:06:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 17:06:42 -0600
Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml
Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu>

For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST
parsing in Bio::SearchIO::blastxml (though it appears to be pretty
stable!).  Since recent changes to BLAST at NCBI leave no easy way to
distinguish normal BLAST reports from PSI-BLAST reports, you have to
indicate how the report is to be parsed by passing in a '-blasttype'
parameter:

$searchio = Bio::SearchIO->new('-tempfile' => 1,
        '-format' => 'blastxml',
        '-file'   => 'psiblast.xml',
        '-blasttype' => 'psiblast');

Otherwise it chunks the individual iterations out as separate BLAST  
reports and parses them as separate reports.

Tests have also been added to SearchIO.t.  I will update the HOWTO and  
blastxml docs soon.

chris


From cjfields at uiuc.edu  Fri Nov 30 02:41:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 20:41:49 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
In-Reply-To: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
References: <Pine.LNX.4.58.0711280035120.17343@localhost.localdomain>
Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu>

It's probably safer to create a new instance each time but it really  
shouldn't be necessary for a wrapper module; this sounds like a bug to  
me.  Could you file it in Bugzilla?

On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote:

> Hello,
>
> I was playing around with Primer3, and I hit a problem. Not sure if  
> it's a
> bug or if I was doing something I wasn't supposed to, but if it's the
> latter, I thought it might save someone else half an hour of banging  
> their
> head of a keyboard if I mentioned it:
>
> What I was doing was roughly:
>
> # create a primer3 obj
> my $p3 = ...Primer3->new();
>
> # loop through some sequences generating primers for
> # each of them using the same primer3 obj
> while (@some_bio_seqs){
>  my $res = $p3->run;
>  ...
> }
>
> This worked fine for a while, but broke when I tried to set  
> PRIMER_MIN_GC,
> at which point it worked for a few sequences then I got a "can't place
> primer on sequence"  error.
>
> After a bit of faffing about, I think the problem occurs when no  
> primers
> are found. In which case $p3 still has the primers from the previous  
> run,
> which don't come from the current sequence, so can't be placed on  
> it. I
> tried calling $p3->cleanup in the loop, but that didn't work either.
> Creating a new $p3 every time works fine.
>
> Are you supposed to create a new Primer3 object for every sequence?
> (Apologies if I missed the relevant bit of the docs).
>
> Cheers,
> Cass xx

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From paulhengen at coh.org  Thu Nov 29 01:20:42 2007
From: paulhengen at coh.org (Paul N. Hengen)
Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST)
Subject: [Bioperl-l]  Collecting genomic DNA sequences using Entrez IDs
Message-ID: <14017289.post@talk.nabble.com>


Hi.

I have a number of gene IDs from Entrez and I want to find the
start and end locations in the human genome. This seemed simple
enough, so I started working through some of the examples for
using the EntrezGene module at www.bioperl.org  Of course this
did not work because the core installation does not include this
module. So, I think I have two choices (1) install the module (how?),
or (2) find an easier way to get the locations in the human genome.
I want to use the locations to grab sequences out of the genome.
Can anyone offer advice on this? Thanks.

-Paul.

--
Paul N. Hengen, Ph.D.
Hematopoietic Stem Cell and Leukemia Research
City of Hope National Medical Center
1500 E. Duarte Road, Duarte, CA 91010 USA
mailto:paulhengen at coh.org



From Viktor.Martyanov at Dartmouth.EDU  Thu Nov 29 20:20:19 2007
From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov)
Date: 29 Nov 2007 15:20:19 -0500
Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases
Message-ID: <193573097@newdonner.Dartmouth.EDU>

A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 445 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071129/a6380324/attachment-0004.bin>

From alison.waller at utoronto.ca  Thu Nov 29 16:20:59 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Thu, 29 Nov 2007 11:20:59 -0500
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from
	CVS)
Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL>

Hi all,

I would like to install the CVS version of bioperl as I know of some code
changes that will be useful to me.  However, I am having problems installing
it.

I am trying to install bioperl in my home directory on a linux cluster.

I used

> cd bioperl-live
> perl Build.PL -install /home/awaller

However, after the build command I got a lot of errors.  Do I also have to
have perl installed in my home directory?  There is perl installed on the
cluster in /usr/bin.  Do I need to point to this or does Build.PL
automatically look there?  I noticed a few errors about not having
permission and a few about not being able to connect. I've copied a portion
of the messages after my Build.PL command.

Any help would be appreciated,

alison

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/02packages.details.txt.gz

Trying to get away with old file:

3604718  584 -rw-r--r--  1 0        0          592967 Nov 12 22:53
/root/.cpan/sources/modules/02packages.details.txt.gz

Going to read /root/.cpan/sources/modules/02packages.details.txt.gz

  Database was generated on Sat, 10 Nov 2007 22:36:34 GMT

 
  There's a new CPAN.pm version (v1.9204) available!

  [Current version is v1.7601]

  You might want to try

    install Bundle::CPAN

    reload cpan

  without quitting the current session. It should be a seamless upgrade

  while we are running...

 
Warning: You are not allowed to write into directory
"/root/.cpan/sources/modules".

    I'll continue, but if you encounter problems, they may be due

    to insufficient permissions.

Fetching with LWP:

  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[Cannot write to
'/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied]

Fetching with Net::FTP:

  ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from ftp.nrc.ca

Fetching with LWP:

  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[FTP close response: 500 Network seems to
have barfed - Let's all phone our ISP and go postal!

Unknown command.

]

Fetching with Net::FTP:

  ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz:
Permission denied

 at /usr/share/perl/5.8/CPAN.pm line 2265

Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca

Fetching with LWP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'cpan.mirror.cygnal.ca']

Fetching with Net::FTP:

  ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Fetching with LWP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname
'mirror.isurf.ca']

Fetching with Net::FTP:

  ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

 
Trying with "/usr/bin/lynx -source" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/lynx -source" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/lynx -source
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/ncftp" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

Use ncftpget or ncftpput to handle file URLs.

 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" "

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

 
Trying with "/usr/bin/wget -O -" to get

    ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz

sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

 
System call "/usr/bin/wget -O -
"ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz"  >
/root/.cpan/sources/modules/03modlist.data"

returned status 1 (wstat 256), left

/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

Local directory now /root/.cpan/sources/modules

local: 03modlist.data.gz: Permission denied

Bad luck... Still failed!

Can't access URL
ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: cpan.mirror.cygnal.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.

 
Issuing "/usr/bin/ftp -n"

ftp: mirror.isurf.ca: Unknown host

Not connected.

Local directory now /root/.cpan/sources/modules

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Not connected.

Bad luck... Still failed!

Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz.

 
Please check, if the URLs I found in your configuration file

(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,

ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are

valid. The urllist can be edited. E.g. with 'o conf urllist push

ftp://myurl/'

 
Could not fetch modules/03modlist.data.gz

Trying to get away with old file:

3604719  144 -rw-r--r--  1 0        0          141973 Nov 12 22:53
/root/.cpan/sources/modules/03modlist.data.gz

Going to read /root/.cpan/sources/modules/03modlist.data.gz

Going to write /root/.cpan/Metadata

can't create /root/.cpan/Metadata: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 3432

Running install for module Test::Harness

Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz

mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
/usr/share/perl/5.8/CPAN.pm line 2342

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  
From cjfields at uiuc.edu  Fri Nov 30 04:53:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:53:09 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <D344C28E-BC9B-4226-AD15-149EA001FCAB@uiuc.edu>

Alison,

There are directions on how to do this here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

(TinyURL link)
http://tinyurl.com/3263dd

Note the additional configuration for CPAN in that section; you'll  
need to set up CPAN so it installs everything locally.
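
For reference, a rough sketch of what a home-directory build looks like with Module::Build, which is what Build.PL drives. The --install_base option is a standard Module::Build option; the /home/awaller path is taken from your message, and the function name build_bioperl_into is made up for illustration. It does not cover the CPAN configuration described on the wiki page.

```shell
#!/bin/sh
# Hypothetical wrapper around the usual Module::Build steps.
# Run it from inside the bioperl-live checkout.
build_bioperl_into() {
    base="$1"
    if [ -z "$base" ]; then
        echo "usage: build_bioperl_into INSTALL_BASE" >&2
        return 2
    fi
    if [ ! -f Build.PL ]; then
        echo "no Build.PL here; cd into bioperl-live first" >&2
        return 1
    fi
    # With --install_base, modules end up under $base/lib/perl5
    perl Build.PL --install_base "$base" &&
        ./Build &&
        ./Build test &&
        ./Build install
}

# Example (assumed path): build_bioperl_into /home/awaller
```

After an install like this, PERL5LIB has to include /home/awaller/lib/perl5 so that scripts can find the modules.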

chris

On Nov 29, 2007, at 10:20 AM, alison waller wrote:

> Hi all,
>
> I would like to install the CVS version of bioperl as I know of some
> code changes that will be useful to me.  However, I am having problems
> installing it.
>
> I am trying to install bioperl in my home directory on a linux cluster.
>
> I used
>
>> cd bioperl-live
>> perl Build.PL -install /home/awaller
>
> However, after the build command I got a lot of errors.  Do I also have
> to have perl installed in my home directory?  There is perl installed on
> the cluster in /usr/bin.  Do I need to point to this or does Build.PL
> automatically look there?  I noticed a few errors about not having
> permission and a few about not being able to connect. I've copied a
> portion of the messages after my Build.PL command.
>
> Any help would be appreciated,
>
> alison

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Nov 30 04:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core
(I think they were added prior to the 1.5.1 release, but I'm not
positive).  If possible you should try installing bioperl 1.5.2 or the
latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

To get the code from CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Nov 30 08:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
 from	CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl  as I know of some code
> changes that will be useful to me.  However, I am having problems installing
> it.  
> 
> I am trying to install bioperl in my home directory on a linux cluster.  
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these urls are invalid as suggested (try setting the urllist to 
nothing), or your linux cluster doesn't have internet access. You can't 
do a 'proper' install of BioPerl and its dependencies without internet 
access.

However, for most purposes simply downloading the BioPerl modules (i.e. 
from a different machine with internet access) and pointing your 
PERL5LIB to their location is sufficient. You can download CVS modules 
from the BioPerl website individually, or as a tarball of everything.
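
As a rough sketch of the "point PERL5LIB at the modules" approach: the
script below adds a directory to the module search path with "use lib"
(setting PERL5LIB in the shell has the same effect) and reports where
Perl would find a couple of BioPerl modules. The directory
~/src/bioperl-live is only an assumed location for the unpacked tarball,
and locate_module is a helper made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

my $bioperl_dir;
BEGIN {
    # ASSUMPTION: hypothetical directory holding the unpacked bioperl-live
    # tarball (the one that contains the Bio/ subdirectory).
    $bioperl_dir = File::Spec->catdir($ENV{HOME} // '.', 'src', 'bioperl-live');
}
use lib $bioperl_dir;    # same effect as PERL5LIB=$bioperl_dir in the shell

# Return the first file on @INC that would satisfy "use $module", or undef.
sub locate_module {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (grep { !ref } @INC) {    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, @parts);
        return $path if -f $path;
    }
    return;
}

# Report whether the core BioPerl modules are visible from this script.
for my $module (qw(Bio::Seq Bio::SeqIO)) {
    my $path = locate_module($module);
    print "$module: ", defined $path ? $path : 'not found on @INC', "\n";
}
```

If both modules are reported as found under the directory you copied
over, scripts run with the same setting should be able to load BioPerl
without a CPAN install.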


From MEC at stowers-institute.org  Fri Nov 30 14:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>

How many, how often?

Use ensembl biomart!

First time interactively.

Then if you want to pipeline it, take the perl code it generates for you
and run it - of course you'll have to install the Ensembl Perl API....


Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
> 
> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find 
> the start and end locations in the human genome. This seemed 
> simple enough, so I started working through some of the 
> examples for using the EntrezGene module at www.bioperl.org  
> Of course this did not work because the core installation 
> does not include this module. So, I think I have two choices 
> (1) install the module (how?), or (2) find an easier way to 
> get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research City of Hope 
> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 
> USA mailto:paulhengen at coh.org
> 
> --
> View this message in context: 
> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E
> ntrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bosborne11 at verizon.net  Fri Nov 30 14:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID: <C3758AB2.10609%bosborne11@verizon.net>

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.


Brian O.


On 11/28/07 8:20 PM, "Paul N. Hengen" <paulhengen at coh.org> wrote:

> 
> Hi.
> 
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org  Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
> 
> -Paul.
> 
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org


From cjfields at uiuc.edu  Fri Nov 30 15:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad.  I always forget about Bio::ASN1::EntrezGene.  We should ask  
Mingyi Liu if he would like to include this parser with BioPerl.  Since  
Bio::SeqIO::entrezgene requires it, that makes sense to me, and it  
would avoid the circular dependency that has plagued these modules.

chris

On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:

> Chris Fields wrote:
> Chris,
> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
> low-level parser and is not part of bioperl. There is a circular
> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
> Paul, you can get it from CPAN and this should make
> Bio::SeqIO::entrezgene functional for you.
> Stefan

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Fri Nov 30 16:12:22 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 11:12:22 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
	<47502C75.60809@bms.com>
	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
Message-ID: <47503666.8090004@bms.com>

Chris Fields wrote:
> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
> Mingyi Liu if he would like to include this parser with BioPerl (since  
> it requires it, makes sense to me, and it avoids the circular  
> dependency that has plagued these modules).
>   
Yes, I think this would be a good step.
Stefan
> chris


From stefan.kirov at bms.com  Fri Nov 30 15:29:57 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 10:29:57 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
References: <14017289.post@talk.nabble.com>
	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>
Message-ID: <47502C75.60809@bms.com>

Chris Fields wrote:
Chris,
Bio::SeqIO::entrezgene requires Bio::ASN1::EntrezGene, which is the
low-level parser and is not part of bioperl. There is a circular
dependency: Bio::ASN1::EntrezGene depends on Bio::SeqIO (I think).
Paul, you can get it from CPAN and this should make
Bio::SeqIO::entrezgene functional for you.
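
A minimal sketch of how that dependency could be checked before fetching
anything: the script tests whether both modules load and only then asks
Bio::DB::EntrezGene for the records whose gene IDs are given on the
command line. The helper missing_modules is made up for this example,
and the fetch itself needs network access to NCBI, so treat it as an
illustration rather than a tested recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The second module comes from CPAN separately; it is not shipped with BioPerl.
my @NEEDED = qw(Bio::DB::EntrezGene Bio::ASN1::EntrezGene);

# Return the names of the given modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @ids = @ARGV;    # Entrez Gene IDs given on the command line
if (!@ids) {
    print "usage: $0 GENE_ID [GENE_ID ...]\n";
}
elsif (my @missing = missing_modules(@NEEDED)) {
    print "Install from CPAN first: @missing\n";
}
else {
    # Network access to NCBI is required from here on.
    my $db     = Bio::DB::EntrezGene->new;
    my $stream = $db->get_Stream_by_id(\@ids);
    while (my $seq = $stream->next_seq) {
        print "Fetched Entrez Gene record ", $seq->display_id, "\n";
    }
}
```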
Stefan


> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- 
> core (I think they were added prior to the 1.5.1 release, but I'm not  
> positive).  If possible you should try installing bioperl 1.5.2 or the  
> latest code from CVS.
>
> For directions on installing Bioperl for most OS's go here:
>
> http://www.bioperl.org/wiki/Installing_BioPerl
>
>  From CVS:
>
> http://www.bioperl.org/wiki/Using_CVS
>
> chris
>
> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:
>
>   
>> Hi.
>>
>> I have a number of gene IDs from Entrez and I want to find the
>> start and end locations in the human genome. This seemed simple
>> enough, so I started working through some of the examples for
>> using the EntrezGene module at www.bioperl.org  Of course this
>> did not work because the core installation does not include this
>> module. So, I think I have two choices (1) install the module (how?),
>> or (2) find an easier way to get the locations in the human genome.
>> I want to use the locations to grab sequences out of the genome.
>> Can anyone offer advice on this? Thanks.
>>
>> -Paul.
>>
>> --
>> Paul N. Hengen, Ph.D.
>> Hematopoietic Stem Cell and Leukemia Research
>> City of Hope National Medical Center
>> 1500 E. Duarte Road, Duarte, CA 91010 USA
>> mailto:paulhengen at coh.org
>>
>> -- 
>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From arareko at campus.iztacala.unam.mx  Fri Nov 30 17:01:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 30 Nov 2007 11:01:29 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47503666.8090004@bms.com>
References: <14017289.post@talk.nabble.com>	<B1AB4B92-2B22-47BE-A0B3-72BC1F27587A@uiuc.edu>	<47502C75.60809@bms.com>	<9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
	<47503666.8090004@bms.com>
Message-ID: <475041E9.8050909@campus.iztacala.unam.mx>

I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the 
past, he mentioned he doesn't track the list closely).

Mauricio.

Stefan Kirov wrote:
> Chris Fields wrote:
>> My bad.  I always forget about Bio::ASN1::Entrezgene.  We should ask  
>> Mingyi Liu if he would like to include this parser with BioPerl (since  
>> it requires it, makes sense to me, and it avoids the circular  
>> dependency that has plagued these modules).
>>   
> Yes, I think this would be a good step.
> Stefan
>> chris

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Fri Nov 30 20:21:13 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Nov 2007 12:21:13 -0800
Subject: [Bioperl-l] Trying to find multiple homologs in multiple
	databases
In-Reply-To: <193573097@newdonner.Dartmouth.EDU>
References: <193573097@newdonner.Dartmouth.EDU>
Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org>

Viktor -
Bio::SearchIO helps you parse BLAST reports, but don't underestimate  
the power of going as low-tech as possible and outputting scores with  
the -m 8 option in NCBI-BLAST or -mformat 3 in WU-BLAST.  Both give you  
a tabular format that is parseable with the 'split' function in Perl.
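
A small sketch of the low-tech route, assuming the usual 12
tab-separated columns of NCBI-BLAST -m 8 output (query, subject,
% identity, alignment length, mismatches, gap openings, query start/end,
subject start/end, e-value, bit score). The hit lines are made-up example
data, and the column names and helper functions are my own choices, not
anything BLAST or BioPerl defines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Names for the 12 tab-separated fields of -m 8 output (labels are mine).
my @COLUMNS = qw(query subject pct_identity aln_length mismatches gap_opens
                 q_start q_end s_start s_end evalue bit_score);

# Parse one tabular line into a hash reference keyed by column name.
# Returns undef for lines that do not have exactly 12 fields.
sub parse_hit {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return unless @fields == @COLUMNS;
    my %hit;
    @hit{@COLUMNS} = @fields;
    return \%hit;
}

# Keep the highest bit-score hit for each query.
sub best_hits {
    my (@lines) = @_;
    my %best;
    for my $line (@lines) {
        next if $line =~ /^#/;              # comment lines (e.g. from -m 9)
        my $hit = parse_hit($line) or next;
        my $q = $hit->{query};
        $best{$q} = $hit
            if !$best{$q} || $hit->{bit_score} > $best{$q}{bit_score};
    }
    return \%best;
}

# Made-up example rows standing in for a real BLAST report.
my @example = (
    "YAL001C\tspA\t45.20\t300\t150\t4\t1\t298\t5\t301\t1e-60\t230",
    "YAL001C\tspB\t38.00\t280\t160\t6\t10\t285\t12\t290\t2e-40\t165",
    "YAL002W\tspC\t52.10\t410\t190\t3\t1\t410\t1\t408\t1e-110\t395",
);

my $best = best_hits(@example);
for my $q (sort keys %$best) {
    print join("\t", $q, $best->{$q}{subject}, $best->{$q}{evalue}), "\n";
}
```

For a real report you would read the lines from the BLAST output file
instead of the @example array; the rest stays the same.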

See the wiki http://bioperl.org/wiki for HOWTOs and examples of using  
the parsers.

You might also consider already-written tools like OrthoMCL,  
InParanoid, and others that help you define relationships like  
orthologs and paralogs among species. There also exist a few  
published web resources that have pre-computed homologs for you, so  
you might take a look around first, unless the point of the project  
is to learn how to run these kinds of searches.

For general Perl help consider Perlmonks.org and some of  the  
introductory books that are available.
-jason
--
Jason Stajich
jason at bioperl.org

On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote:

> Hello,
>
> My name is Viktor Martyanov and I am a Ph.D. student in biology at  
> Dartmouth.
>
> I need to be able to use a set of genes or FASTA sequences from S.  
> cerevisiae and retrieve a set of corresponding homologs from other  
> fungal species via BLASTP searches.
>
> I would like to find out if there are Perl scripts that approach  
> this problem. By the way, is there a Perl community or forum where  
> I could post this question?
>
> Thanks very much.  _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From barry.moore at genetics.utah.edu  Fri Nov 30 22:03:23 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 30 Nov 2007 15:03:23 -0700
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
References: <14017289.post@talk.nabble.com>
	<CED81D34E37D5043A1211565277A51E50A2ED927@exchkc02.stowers-institute.org>
Message-ID: <B839F4C3-C225-40B2-B7B0-C2940A35B964@genetics.utah.edu>

Paul,

One other alternative is to use the UCSC table browser  
(http://genome.ucsc.edu/cgi-bin/hgTables?command=start).  Select your  
organism and upload your ID list, then select your output options.  
You can download the coordinates or the fasta directly.  You have  
options for including or excluding various parts of the gene, and  
upstream/downstream sequences.  This is similar to the solution that  
Malcolm suggested, except that the Ensembl option can be run  
repeatedly as perl code, as he pointed out.  UCSC also allows remote  
connections to their MySQL server, so you could set up a repeated task  
and more complex queries that way with the UCSC method.
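
A hedged sketch of what the MySQL route might look like from Perl with
DBI. Everything specific here is an assumption to check against UCSC's
own documentation before relying on it: the host name
genome-mysql.soe.ucsc.edu, the user "genome" with an empty password, the
database name hg19, and the refGene/refLink tables with their column
names. The helper coordinate_query is made up for this example. As far
as I know, txStart in these tables is 0-based.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a parameterised query joining RefSeq gene coordinates to Entrez
# Gene IDs. ASSUMPTION: table and column names follow the UCSC schema
# (refGene: name, chrom, strand, txStart, txEnd; refLink: mrnaAcc,
# locusLinkId); verify them before use.
sub coordinate_query {
    my ($n_ids) = @_;
    die "need at least one gene ID\n" unless $n_ids && $n_ids > 0;
    my $placeholders = join ', ', ('?') x $n_ids;
    return "SELECT l.locusLinkId, g.name, g.chrom, g.strand, g.txStart, g.txEnd "
         . "FROM refGene g JOIN refLink l ON g.name = l.mrnaAcc "
         . "WHERE l.locusLinkId IN ($placeholders)";
}

my @ids = @ARGV;    # Entrez Gene IDs given on the command line
if (!@ids) {
    print "usage: $0 GENE_ID [GENE_ID ...]\n";
}
elsif (!eval { require DBI; 1 }) {
    print "DBI (and DBD::mysql) must be installed to run the query\n";
}
else {
    # ASSUMPTION: public server, account and assembly name; check with UCSC.
    my $dbh = DBI->connect(
        'DBI:mysql:database=hg19;host=genome-mysql.soe.ucsc.edu',
        'genome', '', { RaiseError => 1 },
    );
    my $rows = $dbh->selectall_arrayref(
        coordinate_query(scalar @ids), undef, @ids,
    );
    print join("\t", @$_), "\n" for @$rows;
    $dbh->disconnect;
}
```

Using placeholders rather than pasting the IDs into the SQL keeps the
query reusable for any list length and avoids quoting problems.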

Barry

On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote:

> How many, how often?
>
> Use ensembl biomart!
>
> First time interactively.
>
> Then if you want to pipeline it, take the perl code it generates for  
> you and
> run it - of course you'll have to install the Ensembl Perl API....
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Paul N. Hengen
>> Sent: Wednesday, November 28, 2007 7:21 PM
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez  
>> IDs
>>
>>
>> Hi.
>>
>> I have a number of gene IDs from Entrez and I want to find
>> the start and end locations in the human genome. This seemed
>> simple enough, so I started working through some of the
>> examples for using the EntrezGene module at www.bioperl.org
>> Of course this did not work because the core installation
>> does not include this module. So, I think I have two choices
>> (1) install the module (how?), or (2) find an easier way to
>> get the locations in the human genome.
>> I want to use the locations to grab sequences out of the genome.
>> Can anyone offer advice on this? Thanks.
>>
>> -Paul.
>>
>> --
>> Paul N. Hengen, Ph.D.
>> Hematopoietic Stem Cell and Leukemia Research City of Hope
>> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010
>> USA mailto:paulhengen at coh.org
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E
>> ntrez-IDs-tf4894403.html#a14017289
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l